- Design and manage cloud-native ML platforms supporting training, inference, and model lifecycle automation.

- Build ML/ETL pipelines using Apache Airflow (AWS MWAA) and distributed data workflows using Apache Spark (EMR/Glue).

- Containerize and deploy ML workloads using Docker, EKS, ECS/Fargate, and Lambda.

- Develop CI/CT/CD pipelines integrating model validation, automated training, testing, and deployment.

- Implement ML observability: model drift, data drift, performance monitoring, and alerting using CloudWatch, Grafana, and Prometheus.

- Ensure data governance, versioning, metadata tracking, reproducibility, and secure data pipelines.

- Collaborate with data scientists to productionize notebooks, experiments, and model deployments.

Ideal Candidate:

- 8+ years in MLOps/DevOps with strong ML pipeline experience.

- Strong hands-on experience with AWS:

   1. Compute/Orchestration: EKS, ECS, EC2, Lambda

   2. Data: EMR, Glue, S3, Redshift, RDS, Athena, Kinesis

   3. Workflow: MWAA/Airflow, Step Functions

   4. Monitoring: CloudWatch, OpenSearch, Grafana

- Strong Python skills and familiarity with ML frameworks (TensorFlow/PyTorch/scikit-learn).

- Expertise with Docker, Kubernetes, Git, and CI/CD tools (GitHub Actions/Jenkins).

- Strong Linux, scripting, and troubleshooting skills.

- Experience enabling reproducible ML environments using JupyterHub and containerized development workflows.

Education:

- Master's degree in Computer Science, Machine Learning, Data Engineering, or a related field.