Posted on: 02/01/2026



About the Role:
We are looking for an experienced Staff Platform Engineer with a strong Site Reliability Engineering (SRE) mindset to join our Platform Engineering team. This role is critical to building resilient, scalable, and secure platforms that enable development teams to deliver high-quality software efficiently.
As a Staff Engineer, you will lead initiatives to improve reliability, observability, and operational excellence across our platforms.
You will design and implement solutions that automate infrastructure, optimize performance, and ensure high availability. This position requires a balance of deep technical expertise, leadership skills, and a passion for reliability engineering.
Key Responsibilities:
- Architect and implement highly available, scalable, and secure cloud platforms (AWS, Azure, GCP).
- Drive SRE practices: implement SLIs, SLOs, and error budgets to improve reliability and performance.
- Enhance observability: build advanced monitoring, logging, and alerting systems for proactive issue detection.
- Automate everything: infrastructure provisioning, deployments, and operational tasks using IaC and scripting.
- Lead incident management and postmortems, ensuring root cause analysis and continuous improvement.
- Collaborate with development and operations teams to embed reliability into the software lifecycle.
- Mentor engineers, fostering a culture of operational excellence and innovation.
- Contribute to the technical roadmap, aligning platform capabilities with organizational goals.
Key Requirements:
- 12+ years of experience in Platform Engineering, SRE, or DevOps roles.
- Strong application development background (5+ years in .NET and Java).
- Proven experience as a technical lead, driving design and architecture decisions.
- Expertise in AWS and Azure infrastructure and services.
- Advanced scripting skills (Python preferred).
- Deep knowledge of IaC tools (Terraform, CloudFormation).
- CI/CD pipeline design and implementation (GitHub Actions, ADO, Jenkins, CodePipeline, or similar).
- Containerization and orchestration (Docker, Kubernetes).
- Version control systems (Git, Bitbucket, TFS).
- Configuration management tools (Ansible, Chef, Puppet).
- Hands-on experience with code reviews, design reviews, and technical governance.
Nice to Have:
- AWS or Azure certifications.
- Experience with serverless architectures and automation.
- Familiarity with GitOps workflows and progressive delivery strategies.
- System administration (Windows/Linux) and networking fundamentals (DNS, load balancers, reverse proxies).
- Knowledge of Active Directory and ADFS.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1596136