We are looking for an experienced Senior DevOps / Site Reliability Engineer with 5-6 years of hands-on experience to ensure the robustness, scalability, and security of our infrastructure and services. You will lead efforts in automating deployment pipelines, optimizing system reliability, and driving best practices across development and operations teams.

Key Responsibilities:

- Architect, build, and maintain scalable and resilient infrastructure solutions on cloud platforms (AWS, GCP, Azure).

- Lead the design and implementation of CI/CD pipelines, ensuring fast, reliable, and secure deployments.

- Automate infrastructure provisioning and management using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.

- Monitor system health, establish SLIs/SLOs, and respond swiftly to incidents with root cause analysis and remediation.

- Collaborate closely with software engineering teams to improve service reliability, scalability, and performance.

- Implement and enhance monitoring, logging, and alerting solutions using Prometheus, Grafana, ELK stack, or similar tools.

- Drive security best practices and compliance across infrastructure and deployment workflows.

- Mentor junior engineers and contribute to the continuous improvement of DevOps processes and tooling.

- Participate in and help manage on-call rotations, incident response, and post-mortem analysis.

Qualifications:

- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

- 5-6 years of relevant experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.

- Strong expertise with cloud providers (AWS, GCP, Azure), containerization with Docker, and container orchestration with Kubernetes.

- Proven experience with CI/CD tools (Jenkins, GitLab CI, CircleCI) and automation scripting (Python, Bash, Go).

- Hands-on knowledge of Infrastructure as Code (Terraform, Ansible, CloudFormation).

- Proficiency with monitoring and alerting tools (Prometheus, Grafana, ELK stack, Datadog).

- Solid understanding of networking, security principles, and system architecture.

- Excellent analytical and problem-solving skills with a proactive approach.

- Strong communication skills and ability to collaborate effectively across teams.

Posted by

Harish

TA Specialist at Sigmasoft Infotech Private Limited

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1548144
