We are looking for a highly experienced and motivated Senior DevOps / Site Reliability Engineer (SRE) to join our growing infrastructure and platform engineering team. You will design, implement, and manage scalable, secure, and highly available cloud-native infrastructure on AWS or GCP. The role focuses on infrastructure automation, observability, container orchestration, incident management, and performance optimization to keep production systems reliable. You will play a key role in managing production systems and implementing DevOps best practices using modern cloud and container orchestration technologies.

Key Responsibilities:

- Design, implement, and manage scalable, secure, and resilient cloud infrastructure using AWS or GCP.

- Deploy and maintain containerized applications using Docker and Kubernetes.

- Automate infrastructure provisioning and configuration management using tools like Terraform, CloudFormation, or Ansible.

- Develop and manage CI/CD pipelines using tools such as Jenkins, GitLab CI, or similar.

- Monitor system health, application performance, and availability using tools such as Grafana, Prometheus, or Dynatrace.

- Troubleshoot production issues, perform root cause analysis, and implement long-term solutions to improve system reliability.

- Write custom scripts and automation tools using Python, Java, or other relevant languages.

- Collaborate with cross-functional teams including developers, architects, QA, and security to integrate DevOps practices into the SDLC.

- Implement SRE practices, define and monitor SLIs/SLOs, and support incident response and post-mortem processes.

- Ensure cloud environments meet security, compliance, and governance standards.

Required Skills & Qualifications:

- 5+ years of hands-on experience in DevOps or Site Reliability Engineering roles.

- Strong expertise in public cloud platforms (AWS and/or GCP).

- Proven experience with Docker, Kubernetes, and managing containerized microservices at scale.

- Proficiency in at least one programming language (Java or Python).

- Solid understanding of Linux systems, networking, and system security.

- Experience with monitoring, logging, and alerting tools such as Grafana, Prometheus, or Dynatrace.

- Hands-on experience with CI/CD pipelines and tools like Jenkins or GitLab CI.

- Familiarity with version control systems (Git) and GitOps workflows.

Preferred Skills:

- Experience with incident management and SRE best practices.

- Familiarity with log aggregation and analysis tools such as ELK, Fluentd, or Splunk.

- Exposure to database operations (PostgreSQL, MySQL, or NoSQL systems).

- Experience working in Agile/Scrum teams.

Posted By

Snehil

Technical Recruiter at Ampstek

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1522423
