Site Reliability Engineer/Lead - CI/CD Pipeline

SolutionTech HR

Mumbai

6 - 10 Years

Site Reliability, AWS Cloud Services, DevOps, CI/CD Pipeline, ELK Stack, Datadog, Docker, Kubernetes

Posted on: 23/09/2025

Job Description

Key Responsibilities:

- Lead and mentor a team of SREs/DevOps Engineers, fostering a culture of ownership, reliability, and continuous improvement.

- Own the availability, scalability, and performance of production systems and services.

- Design and manage distributed systems and microservices architectures at scale.

- Develop and implement incident response strategies, conduct root cause analysis, and create actionable postmortems.

- Drive improvements in infrastructure automation, CI/CD pipelines, and deployment strategies.

- Collaborate with cross-functional teams including engineering, product, and QA to embed SRE best practices.

- Implement observability tools (e.g., Prometheus, Grafana, ELK, Datadog) to monitor system performance and proactively detect issues.

- Manage and optimize cloud infrastructure on AWS, including services such as EC2, ELB, AutoScaling, S3, CloudFront, and CloudWatch.

- Utilize Infrastructure-as-Code tools such as Terraform, CloudFormation, or Pulumi for provisioning and maintaining infrastructure.

- Apply strong Linux, networking, load balancing, and security principles to ensure platform resilience.

- Leverage Docker and Kubernetes for container orchestration and scalable deployments.

- Build internal tools and automation using Python, Go, or Bash scripting.

- Support event-driven architectures leveraging Kafka or RabbitMQ for high-throughput, real-time systems.

- Proactively contribute to reliability-focused architecture and design discussions.

Required Skills & Experience:

- 6 - 10 years of overall experience in backend engineering, infrastructure, DevOps, or SRE roles.

- Minimum 3 years of experience leading SRE, DevOps, or Infrastructure teams.

- Proven track record managing distributed systems and microservices at scale.

- Deep understanding of Linux systems, networking fundamentals, load balancing, and infrastructure security.

- Strong hands-on experience with AWS services: EC2, ELB, AutoScaling, CloudFront, S3, and CloudWatch.

- Expert-level knowledge of Docker and Kubernetes in production environments.

- Proficient with Infrastructure-as-Code tools: Terraform, CloudFormation, or Pulumi.

- Hands-on experience with monitoring and observability tools: Prometheus, Grafana, ELK Stack, or Datadog.

- Strong scripting or programming skills in Python, Go, Bash, or similar languages.

- Familiarity with Kafka or RabbitMQ for event-driven and messaging architectures.

- Excellent incident management skills, including triage, RCA, and communication.

- Ability to thrive in fast-paced environments and adapt to changing priorities.

Preferred Qualifications:

- Bachelor's degree in Computer Science, Engineering, or a related field.

- Experience in startup or high-growth environments.

- Contributions to open-source DevOps or SRE tools are a plus.

- Certifications in AWS, Kubernetes, or other cloud-native technologies are advantageous.

Posted By

Anup

HR at SolutionTech HR

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1550420
