
Site Reliability Engineer/Lead - CI/CD Pipeline

SolutionTech HR
Mumbai
6 - 10 Years

Posted on: 23/09/2025

Job Description

Key Responsibilities:


- Lead and mentor a team of SREs/DevOps Engineers, fostering a culture of ownership, reliability, and continuous improvement.

- Own the availability, scalability, and performance of production systems and services.

- Design and manage distributed systems and microservices architectures at scale.

- Develop and implement incident response strategies, conduct root cause analysis, and create actionable postmortems.

- Drive improvements in infrastructure automation, CI/CD pipelines, and deployment strategies.

- Collaborate with cross-functional teams including engineering, product, and QA to embed SRE best practices.

- Implement observability tools (e.g., Prometheus, Grafana, ELK, Datadog) to monitor system performance and proactively detect issues.

- Manage and optimize cloud infrastructure on AWS, including services such as EC2, ELB, Auto Scaling, S3, CloudFront, and CloudWatch.

- Utilize Infrastructure-as-Code tools such as Terraform, CloudFormation, or Pulumi for provisioning and maintaining infrastructure.

- Apply strong Linux, networking, load balancing, and security principles to ensure platform resilience.

- Leverage Docker and Kubernetes for container orchestration and scalable deployments.

- Build internal tools and automation using Python, Go, or Bash scripting.

- Support event-driven architectures leveraging Kafka or RabbitMQ for high-throughput, real-time systems.

- Proactively contribute to reliability-focused architecture and design discussions.


Required Skills & Experience:


- 6 - 10 years of overall experience in backend engineering, infrastructure, DevOps, or SRE roles.

- Minimum 3 years of experience leading SRE, DevOps, or Infrastructure teams.


- Proven track record managing distributed systems and microservices at scale.

- Deep understanding of Linux systems, networking fundamentals, load balancing, and infrastructure security.

- Strong hands-on experience with AWS services: EC2, ELB, Auto Scaling, CloudFront, S3, and CloudWatch.

- Expert-level knowledge of Docker and Kubernetes in production environments.

- Proficient with Infrastructure-as-Code tools: Terraform, CloudFormation, or Pulumi.

- Hands-on experience with monitoring and observability tools: Prometheus, Grafana, ELK Stack, or Datadog.

- Strong scripting or programming skills in Python, Go, Bash, or similar languages.

- Familiarity with Kafka or RabbitMQ for event-driven and messaging architectures.

- Excellent incident management skills, including triage, RCA, and communication.

- Ability to thrive in fast-paced environments and adapt to changing priorities.


Preferred Qualifications:


- Bachelor's degree in Computer Science, Engineering, or a related field.

- Experience in startup or high-growth environments.

- Contributions to open-source DevOps or SRE tools are a plus.

- Certifications in AWS, Kubernetes, or other cloud-native technologies are advantageous.

