Posted on: 21/11/2025
Description:
About the Role
We are seeking a DevOps Architect who will design, implement, and optimize the cloud-native infrastructure powering our SaaS platform.
This role is highly technical and requires deep expertise in distributed systems, microservices, CI/CD pipelines, system reliability, infra automation, and high-availability architectures.
You will work closely with backend engineers, SREs, and data/ML teams to ensure a stable, scalable, and secure production environment.
Key Responsibilities:
- Architect end-to-end DevOps pipelines, cloud environments, and automation frameworks.
- Build, maintain, and scale microservices infrastructure on AWS with Kubernetes, service mesh, and GitOps.
- Create CI/CD pipelines from scratch with auto-scaling, blue/green, and canary strategy support.
- Implement observability frameworks including metrics, logs, traces, and automated alerting.
- Solve complex infra/code/network performance bottlenecks and drive RCA for critical incidents.
- Orchestrate real-time data pipelines, ML pipelines, vector stores, and API infrastructure.
- Create infrastructure-as-code modules (Terraform/CloudFormation/CDK).
- Ensure a strong DevSecOps posture: identity, access control, encryption, and vulnerability scanning.
Core Technical Skills (Essential):
- Strong hands-on experience with Node.js, Python, or Go for backend + tooling.
- Expertise in Docker, Kubernetes (EKS preferred), and GitOps (ArgoCD/Flux).
- Deep AWS expertise (EC2, S3, RDS, DynamoDB, EKS, Lambda, CloudWatch, IAM).
- CI/CD design with GitHub Actions, GitLab CI, Jenkins, or Argo Workflows.
- Strong debugging across infra (K8s), networks (VPC, NACL, SG), and services (APIs).
- Observability stack: Prometheus, Grafana, Loki/ELK, OpenTelemetry.
Desirable Technical Skills:
- Experience with Kafka/Kinesis and event-driven systems.
- AI/ML infra exposure: model deployments, embeddings, and vector databases such as Redis/Pinecone.
- Knowledge of RLHF, RL4LM, or similar ML lifecycle workflows.
- Experience with performance hardening and resilience testing (chaos engineering).
Posted in
DevOps / SRE
Functional Area
DevOps / Cloud
Job Code
1578696