Optimizely’s Site Reliability Engineers work on improving the availability, scalability, performance and reliability of our production data platform. Our distributed event processing and compute platform powers the results and analytics for all of our Experimentation and Personalization products. This platform processes billions of events a day and is relied on by many Fortune 100 global businesses.
We value observability, monitoring, actionable alerting based on SLOs, blameless postmortems and efficient incident response. We work in both the application and systems worlds, instrumenting key parts of core architecture while supporting developers as they do the same.
What you’ll do
You will be part of a small team of SREs in Optimizely’s Austin, TX office. As a member of the Data Infrastructure team, your work will directly impact the reliability and performance of all of Optimizely’s products.
You will dig deep into gnarly operational issues in software deployments, operating systems, network I/O, and Linux processes.
You will also work on projects that reduce operational toil and improve fault tolerance, automation, and SLO-driven prioritization.
Your responsibilities include but are not limited to:
Work closely with distributed systems engineers developing new scalable features and services within our data platform
Build and scale new infrastructure to meet demand
Document system design and procedures
Participate in the production on-call rotation
Contribute to improving infrastructure and application monitoring and alerting