Posted on: 26/11/2025
Description :
Role Overview :
We are looking for a seasoned Cloud Infrastructure Engineer who will design, build, and maintain scalable, reliable, and secure cloud-native infrastructure. This role will combine automation, Infrastructure as Code (IaC), GitOps, and container orchestration to drive our cloud platform to the next level.
You will work closely with development teams, SREs, and platform engineers to provision, deploy, and manage infrastructure, ensuring stability and performance. Your expertise in Python scripting, Terraform, Argo CD, Helm, Kubernetes, and a public cloud (AWS, Azure, or GCP) will be central to delivering automated, resilient systems.
Key Responsibilities :
Infrastructure Automation & IaC :
- Design and maintain infrastructure using Terraform.
- Build modular, versioned Terraform modules for reusability and scalability.
- Manage Terraform state securely and follow Terraform best practices.
GitOps & Continuous Delivery :
- Implement continuous delivery strategies using Argo CD.
- Manage Kubernetes manifests and Helm charts for application deployments.
- Collaborate with engineering teams to create declarative deployment workflows.
Kubernetes Operations :
- Provision, operate, and scale Kubernetes clusters in production.
- Manage namespaces, RBAC, networking, Ingress, and policies.
- Use Helm to package, deploy, and manage applications on Kubernetes.
Scripting & Tooling :
- Develop automation and tooling in Python for infrastructure management, recovery, self-service, and operational tasks.
- Automate repetitive workflows and foster infrastructure self-service.
Cloud Infrastructure Management :
- Architect, deploy, and maintain infrastructure on AWS, Azure, or GCP.
- Handle networking, identity & access management, load balancing, storage, and security.
- Focus on high availability, fault tolerance, cost optimization, and disaster recovery.
Reliability & Observability :
- Define and measure reliability (SLIs/SLOs), monitor infrastructure health, and create dashboards.
- Build and maintain monitoring, logging, and alerting systems.
- Participate in incident response, root cause analysis, and post-mortems.
Security & Compliance :
- Apply cloud-native security best practices: RBAC, secrets management, and least-privilege access.
- Work with the security team to ensure compliance and enforce infrastructure policies.
Collaboration & Documentation :
- Write design documents, runbooks, and playbooks.
- Collaborate with developers, product teams, and other platform engineers as a subject matter expert.
- Mentor and guide junior engineers on infrastructure practices.
Required Skills & Experience :
- 6-10 years in cloud infrastructure, DevOps, SRE, or platform engineering roles.
- Strong Python scripting skills for automation and infrastructure tasks.
- Proven experience with Terraform (module design, state management).
- Hands-on experience with Argo CD or similar GitOps tools.
- Deep knowledge of Kubernetes and Helm chart management.
- Experience with CI/CD pipelines and source control (Git).
- Experience working on AWS, Azure, or GCP cloud platforms.
- Solid understanding of Linux systems, networking, and security.
- Understanding of monitoring and observability tools (metrics, logs, alerts).
- Familiarity with reliability engineering concepts (SLIs, SLOs, error budgets).
Preferred / Nice-to-Have :
- Experience with service mesh (e.g., Istio, Linkerd).
- Knowledge of progressive delivery (canary, blue/green).
- Experience with multi-cloud or hybrid-cloud environments.
- Certifications: CKA, Terraform Certified, AWS/GCP/Azure Architect.
- Experience with policy-as-code (e.g., OPA).
- Familiarity with container security, vulnerability scanning, and secrets management.
Posted in
DevOps / SRE
Functional Area
DevOps / Cloud
Job Code
1580499