Site Reliability Engineer - Docker/Kubernetes

Tekgence India Private Limited

Pune

7 - 8 Years

41+ Reviews

Kubernetes, Docker, Datadog, Prometheus, Dynatrace, Ansible, Terraform, AWS, Azure, Google Cloud Platform, Site Reliability

Posted on: 27/11/2025

Job Description

Position Overview :

We are seeking a highly skilled Site Reliability Engineer (SRE) with at least 7 years of experience to join our team. The ideal candidate will have extensive expertise in production support, Python/Shell scripting, Kubernetes, Docker, and SRE monitoring tools such as Datadog, Prometheus, and Dynatrace. This role focuses on ensuring the reliability, scalability, and performance of our systems while supporting critical production environments.

Key Responsibilities :

- Provide production support for mission-critical applications, ensuring high availability and rapid issue resolution.

- Develop and maintain automation scripts using Python and Shell scripting to streamline operations and improve system efficiency.

- Manage and deploy containerized applications using Kubernetes and Docker, ensuring seamless orchestration and scalability.

- Implement and manage SRE monitoring tools (Datadog, Prometheus, Dynatrace) to proactively monitor system health, performance, and incidents.

- Collaborate with development and operations teams to design and implement reliable, scalable infrastructure.

- Perform root cause analysis (RCA) for production incidents and implement preventive measures.

- Optimize system performance, reduce latency, and improve fault tolerance.

- Participate in the on-call rotation for 24/7 production support.

Required Skills :

- Minimum 7 years of relevant experience in Site Reliability Engineering, DevOps, or production support roles.

- Proven expertise in production support, including incident management, troubleshooting, and resolution in high-availability environments.

- Strong programming skills in Python and Shell scripting for automation and tooling.

- Hands-on experience with Kubernetes for container orchestration and Docker for containerization.

- Proficiency in SRE monitoring tools such as Datadog, Prometheus, and Dynatrace for observability and performance monitoring.

- Solid understanding of cloud infrastructure (AWS, Azure, or GCP) and CI/CD pipelines.

- Excellent problem-solving skills and ability to work under pressure in fast-paced environments.

- Strong communication skills and ability to collaborate with cross-functional teams.

Preferred Qualifications :

- Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible.

- Familiarity with additional monitoring tools or log management platforms (e.g., ELK Stack, Splunk).

- Certifications in Kubernetes (CKA/CKAD), cloud platforms, or SRE practices.

Posted By

Mani Kanta

RPO at Tekgence India Private Limited

Last Active: 28 Nov 2025

Job Views: 163

Applications: 42

Recruiter Actions: 6

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1581631
