About the Role:

We're looking for a highly skilled and experienced Senior DevOps/SRE Engineer to join our team.

In this role, you'll be responsible for building and maintaining the infrastructure that powers our large-scale, high-availability systems.

You'll work on everything from CI/CD pipelines to geo-redundant deployments, ensuring our platform is scalable, reliable, and performant.

This is a critical position for someone who thrives on solving complex, production-grade challenges and has a passion for automation and operational excellence.

Key Responsibilities:

- Design, implement, and maintain CI/CD pipelines for global, multi-region deployments.

- Administer and manage our Kubernetes clusters, including multi-region deployments and scaling strategies that handle high request rates (queries per second, QPS).

- Develop and manage Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.

- Manage and optimize our cloud infrastructure on platforms like AWS, GCP, or Azure, with a focus on geo-redundant architecture.

- Proactively monitor, troubleshoot, and resolve issues in large-scale distributed systems.

- Collaborate with development teams to improve application performance, scalability, and reliability.

- Mentor junior team members and provide technical leadership on complex projects.

- Ensure our systems are secure, meet compliance requirements, and follow industry best practices.

Required Qualifications:

- 6 to 10 years of professional experience in a DevOps, SRE, or similar role, with a focus on managing large-scale, high-availability systems.

- Proven, hands-on expertise in Kubernetes administration, including scaling for high QPS and managing multi-region deployments.

- Deep experience with IaC tools, specifically Terraform or CloudFormation.

- Strong background in building and maintaining CI/CD pipelines for complex, multi-region environments.

- Proficiency with cloud platforms such as AWS, GCP, or Azure, and a solid understanding of geo-redundant architecture.

- Strong knowledge of Linux and expertise in scripting languages like Bash and Python.

- Extensive experience troubleshooting and debugging production issues in large-scale distributed systems.

- Demonstrated experience leading teams or projects, and a strong ability to solve challenging technical problems.