Posted on: 31/01/2026
Job Title : Cloud DevOps Engineer
Location : Hyderabad & Bangalore (work from office)
Experience : 7+ years (mandatory)
Job Description :
Role : Cloud DevOps Engineer (AWS + EKS)
Exp : 7+ years
The candidate should have deep technical expertise in Kubernetes (EKS), ArgoCD, AWS services, CI/CD automation, observability platforms, and container build pipelines.
This role involves managing DevOps processes for microservice development teams, ensuring reliability, scalability, and operational excellence across multiple environments.
Qualifications :
- 7+ years of experience as a Cloud / DevOps Engineer with strong hands-on expertise in AWS and Kubernetes (EKS).
- Proven experience implementing CI/CD and GitOps pipelines using ArgoCD for microservices-based applications.
- Strong knowledge of AWS services including VPC, IAM, ALB/NLB, ECR, S3, and container orchestration best practices.
- Hands-on experience with container build and deployment pipelines, including Dockerless builds using Kaniko.
- Expertise in observability and monitoring tools such as Dynatrace, Prometheus, and Grafana.
- Solid understanding of microservices architecture, scalability, reliability, and operational excellence in production environments.
Responsibilities :
Kubernetes & Cloud Infrastructure :
- Architect, deploy, and manage scalable workloads on Amazon EKS.
- Implement and maintain Kubernetes components including Ingress Controllers, Services, Deployments, StatefulSets, ConfigMaps, and Secrets.
- Design multi-environment deployment topologies and cluster governance.
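The Kubernetes objects listed above are typically expressed declaratively. As a minimal, hypothetical sketch (the service name `payments-api`, image tag, and port are placeholders, not from this posting), a Deployment manifest might be generated in Python like this:

```python
import json

def deployment_manifest(name: str, image: str, replicas: int, env: str) -> dict:
    """Build a minimal Kubernetes Deployment manifest as a dict.

    A production pipeline would also attach probes, resource limits,
    ConfigMap/Secret references, and environment-specific labels.
    """
    labels = {"app": name, "env": env}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                    }]
                },
            },
        },
    }

# Hypothetical microservice in a staging environment.
manifest = deployment_manifest(
    "payments-api", "example.ecr.aws/payments-api:1.4.2", 3, "staging")
print(json.dumps(manifest, indent=2))
```

Generating manifests from one function per environment is one way to keep multi-environment topologies consistent; templating tools such as Helm or Kustomize serve the same purpose.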
CI/CD & GitOps :
- Build and manage automated deployment pipelines using ArgoCD (GitOps).
- Drive GitOps best practices across microservice teams.
- Implement container image build and optimization using Kaniko (Dockerless builds).
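In the GitOps model described above, ArgoCD pulls the desired state from Git and reconciles the cluster against it, rather than having CI push changes in. A minimal, hypothetical ArgoCD `Application` manifest (repository URL, path, and names are placeholders) could be built like this:

```python
def argocd_application(name: str, repo_url: str, path: str,
                       target_namespace: str) -> dict:
    """Build a minimal Argo CD Application manifest as a dict.

    syncPolicy.automated enables the pull-based GitOps loop:
    Argo CD continuously reconciles the cluster against Git,
    pruning removed resources and reverting manual drift.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": "HEAD",
                "path": path,
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": target_namespace,
            },
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
            },
        },
    }

app = argocd_application(
    "payments-api-staging",                     # hypothetical app name
    "https://example.com/org/gitops-repo.git",  # placeholder repo URL
    "apps/payments-api/overlays/staging",       # placeholder overlay path
    "staging",
)
```

One `Application` per service per environment, each pointing at its own overlay path, is a common layout for multi-environment GitOps repositories.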
AWS Cloud Services :
- Manage AWS services related to compute, networking, and storage for DevOps workloads.
- Use VPC, IAM, ALB/NLB, ECR, S3, DynamoDB, Lambda (if required).
Observability & Monitoring :
- Integrate and manage observability tools :
1. Dynatrace : application performance monitoring (APM)
2. Prometheus & Grafana : cluster and application metrics
- Create actionable dashboards and alerting rules.
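One common way to keep alerting rules actionable rather than noisy is to page on error-budget burn rate against an SLO. A hedged sketch of the multi-window check (the 99.9% target and 14.4x threshold are illustrative values, not from this posting):

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being consumed.

    error_ratio: observed fraction of failed requests in a window.
    slo_target:  e.g. 0.999 for a 99.9% availability SLO.
    A burn rate of 1.0 exactly exhausts the budget over the SLO period.
    """
    budget = 1.0 - slo_target
    return error_ratio / budget

def should_page(short_window_ratio: float, long_window_ratio: float,
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Multi-window check: page only when both a short and a long
    window burn fast, which filters out brief transient blips
    (thresholds here are illustrative)."""
    return (burn_rate(short_window_ratio, slo_target) >= threshold and
            burn_rate(long_window_ratio, slo_target) >= threshold)

# A 2% error ratio against a 99.9% SLO burns budget roughly 20x too fast.
rate = burn_rate(0.02, 0.999)
```

In Prometheus this same logic is usually expressed as recording rules over `rate()` of request counters, with Grafana dashboards on top.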
Operational Activity :
- Manage workload allocation.
- Implement standard operating procedures for deployments, incident response, and microservice management.
- Work with developers to assist in microservices onboarding, migration, and environment setup.
- Improve system scalability, availability, and performance.
Important Interview Questions :
Kubernetes / EKS :
- How do you design a production-grade EKS cluster for multi-environment microservices?
- What strategies do you use for zero-downtime deployments in Kubernetes?
- Explain how Ingress Controllers work.
- How would you scale Kubernetes workloads automatically based on custom metrics?
- Describe best practices for Kubernetes multi-tenancy and namespace governance.
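For the custom-metrics scaling question above, one common answer shape is an `autoscaling/v2` HorizontalPodAutoscaler driven by a metrics adapter (for example prometheus-adapter). A hypothetical sketch, with the metric name and target value as placeholders:

```python
def hpa_custom_metric(name: str, deployment: str, metric_name: str,
                      target_avg_value: str, min_replicas: int = 2,
                      max_replicas: int = 10) -> dict:
    """Minimal autoscaling/v2 HPA manifest keyed on a per-pod custom
    metric, which must be exposed through a metrics adapter."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Pods",
                "pods": {
                    "metric": {"name": metric_name},
                    "target": {"type": "AverageValue",
                               "averageValue": target_avg_value},
                },
            }],
        },
    }

# Hypothetical: scale out when queue depth per pod exceeds 100 messages.
hpa = hpa_custom_metric("payments-api-hpa", "payments-api",
                        "queue_messages_per_pod", "100")
```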
GitOps / ArgoCD :
- What is the difference between ArgoCD pull-based deployment vs traditional CI/CD push-based deployment?
- How do you handle ArgoCD sync waves, health checks, and dependency ordering?
- How do you implement security in ArgoCD (RBAC, SSO, repo access, secret handling)?
- How do you troubleshoot ArgoCD applications stuck in OutOfSync or Progressing states?
AWS & Networking :
- How do you design a secure VPC architecture for EKS (public/private subnets, NAT, routing)?
- Explain the networking flow between AWS Load Balancers and Kubernetes pods.
- What are best practices for IAM permissions in a DevOps/Kubernetes environment?
CI/CD & Containers :
- Why would you use Kaniko instead of Docker for image builds?
- How do you secure a container supply chain pipeline end-to-end?
- How do you handle multi-environment deployments in a GitOps workflow?
Observability :
- What is the difference between infrastructure metrics and application metrics?
- How do you design an alerting strategy that avoids noise and focuses on SLO/SLA?
- How does Dynatrace's OneAgent work inside a Kubernetes cluster?
General / Operations :
- How do you standardize microservices onboarding in a large organization?
- Describe a major production incident you handled and the DevOps processes you improved afterward.
- In addition to the above, Python scripting is mandatory.
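Since Python scripting is called out as mandatory, one plausible (hypothetical, not from this posting) warm-up exercise is parsing `kubectl get pods -o json` output to flag pods that are not ready:

```python
def not_ready_pods(pod_list: dict) -> list:
    """Return names of pods whose Ready condition is not True,
    given a dict shaped like `kubectl get pods -o json` output."""
    flagged = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True"
                    for c in conditions)
        if not ready:
            flagged.append(pod["metadata"]["name"])
    return flagged

# Minimal fabricated sample mimicking kubectl's JSON shape.
sample = {
    "items": [
        {"metadata": {"name": "api-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "api-2"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
}
print(not_ready_pods(sample))  # ['api-2']
```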
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1608149