This includes CI/CD systems, Kubernetes clusters, infrastructure automation, and telemetry platforms.
You will work closely with development, QA, and operations teams to build resilient systems and ensure continuous improvement of reliability standards.

Key Responsibilities:

- Own and manage DevOps components and tooling across 100+ production environments.

- Administer, scale, and optimize Kubernetes clusters used for application and infrastructure workloads.

- Implement and maintain observability stacks including Prometheus, OpenTelemetry (OTel), Elasticsearch, and ClickHouse for metrics, tracing, and log analytics.

- Ensure high availability of CI/CD pipelines and automate infrastructure provisioning using Terraform and Ansible.

- Build alerting, monitoring, and dashboarding systems to proactively detect and resolve issues.

- Lead root cause analysis for incidents and drive long-term stability improvements.

- Collaborate with engineering teams to design systems that are reliable, secure, and observable by default.

- Participate in on-call rotations and lead incident response efforts when necessary.

- Advise the cloud platform team on improving the reliability of production systems and scaling them based on need.

- Participate in the development process by supporting new features, services, and releases, and maintain an ownership mindset for the cloud platform technologies.

- Expertise in at least one programming language: Java, Python, or Go.

- Proficient in writing bash scripts.

- Good understanding of SQL and NoSQL systems.

- Good understanding of systems programming (network stack, file system, OS services).

- Good hands-on experience with Ansible.

- Ability to automate day-to-day activities.

Required Skills & Experience:

- 5+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.

- Expertise in Kubernetes: deployment, scaling, troubleshooting, and operations in production.

- Strong Linux systems background and scripting skills (Python, Bash, or Go).

- Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, or similar.

- Infrastructure-as-Code skills with tools like Terraform, Ansible, or equivalent.

- Solid knowledge of observability tools, including:

  - Prometheus for monitoring and alerting

  - OpenTelemetry (OTel) for tracing and telemetry

  - Elasticsearch and ClickHouse for log storage and analytics

  - AppDynamics

- Experience with containerization (Docker) and orchestration at scale.

- Familiarity with cloud platforms (AWS, GCP, or Azure) and hybrid-cloud architecture.

- Ability to debug and tune system performance under production load.

Posted By

Recruiter

HR at QUALYS SECURITY TECHSERVICES PRIVATE LIMITED

Posted in

DevOps / SRE

Functional Area

DevOps / Cloud

Job Code

1541958
