Posted on: 10/09/2025
Key Responsibilities:
- Architect, deploy, and manage scalable ML infrastructure on cloud platforms (AWS, GCP, or Azure).
- Design and maintain end-to-end ML pipelines for training, testing, and deploying models.
- Work with Kubernetes, Docker, and CI/CD to automate ML workflows and deployments.
- Collaborate with data scientists to optimize model training and inference performance.
- Implement monitoring, logging, and alerting systems for ML applications in production.
- Ensure data security, compliance, and cost optimization in cloud environments.
- Integrate distributed computing frameworks (Spark, Ray, Dask, etc.) for large-scale data processing.
- Research and adopt best practices for MLOps and cutting-edge ML infrastructure technologies.
Requirements:
- Bachelor's/Master's degree in Computer Science, Engineering, or a related field.
- 7+ years of experience in cloud engineering, DevOps, or ML infrastructure roles.
- Strong expertise in cloud platforms (AWS, GCP, or Azure), including compute, storage, and networking.
- Hands-on experience with Kubernetes, Docker, and Terraform for infrastructure automation.
- Solid knowledge of ML frameworks (TensorFlow, PyTorch) and MLOps tools (Kubeflow, MLflow, SageMaker, Vertex AI, etc.).
- Strong programming skills in Python, Go, or Java.
- Experience with distributed training, GPU acceleration, and model deployment at scale.
- Knowledge of CI/CD pipelines, monitoring tools (Prometheus, Grafana), and logging systems.
- Strong problem-solving, communication, and leadership skills.