About the Role :

We are seeking an experienced Senior DevOps Engineer / Lead DevOps Engineer with deep expertise in Kubernetes infrastructure design and implementation. In this role, you will be responsible for architecting, building, and managing enterprise-grade Kubernetes clusters from scratch, driving cloud-native infrastructure modernization initiatives, and mentoring a small team of DevOps engineers.

This is an exciting opportunity to work with cutting-edge technologies, optimize application deployment pipelines, and ensure that our cloud infrastructure is robust, secure, and highly available.

Key Responsibilities :

- Design and implement enterprise-grade Kubernetes clusters across multi-cloud environments (AWS, Azure, GCP).

- Ensure high availability, scalability, and resilience of Kubernetes clusters.

- Manage upgrades, patching, and lifecycle management of clusters and nodes.

- Implement and maintain Terraform configurations, Helm charts, and GitOps workflows for infrastructure provisioning and management.

- Automate routine operational tasks, deployments, and scaling operations.

- Build, maintain, and optimize CI/CD pipelines using Jenkins, GitLab CI/CD, or equivalent tools.

- Collaborate with development teams to streamline deployment strategies and reduce lead time for changes.

- Implement logging, monitoring, and observability solutions (Prometheus, Grafana, ELK Stack, or equivalent) for Kubernetes workloads.

- Proactively identify bottlenecks and performance issues.

- Implement security policies including RBAC, network policies, secrets management, and container security scanning.

- Ensure compliance with organizational and regulatory security standards.

- Design and implement disaster recovery strategies for containerized applications and Kubernetes clusters.

- Maintain backup and restore procedures for critical systems.

- Lead a team of 3 to 4 DevOps engineers, providing guidance, mentorship, and technical oversight.

- Conduct technical reviews and enforce code quality and best practices.

- Facilitate knowledge sharing and maintain comprehensive documentation.

Performance Optimization & Cost Management :

- Optimize cluster performance, resource utilization, and cost-efficiency across multi-cloud deployments.

Required Skills & Qualifications :

Education :

- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).

Experience :

- 8+ years of experience in DevOps or Site Reliability Engineering roles, including strong hands-on experience with Kubernetes.

Technical Skills :

Kubernetes :

- Cluster architecture, deployment, scaling, monitoring, and troubleshooting.

Infrastructure as Code :

- Terraform, Helm, GitOps.

Containerization :

- Docker and container best practices.

Cloud Services :

- AWS, Azure, and/or GCP.

Scripting & Automation :

- Python, Bash, or equivalent scripting languages.

CI/CD Tools :

- Jenkins, GitLab CI/CD, or equivalent.

Version Control :

- Git and branching strategies.

Observability :

- Monitoring, logging, and alerting using Prometheus, Grafana, ELK, or similar tools.

Leadership & Soft Skills :

- Proven ability to lead and mentor DevOps teams.

- Strong analytical, problem-solving, and troubleshooting skills.

- Excellent communication skills and the ability to collaborate with cross-functional teams.

- Ability to work in fast-paced, agile environments and drive initiatives independently.