Senior Cloud & ML Infrastructure Engineer

Location : Bangalore / Bengaluru, Hyderabad, Pune, Mumbai, Mohali, Panchkula, Delhi

Experience : 6-10+ Years

Night Shift : 9 pm to 6 am

About the Role :

We're looking for a Senior Cloud & ML Infrastructure Engineer to lead the design, scaling, and optimization of cloud-native machine learning infrastructure. This role is ideal for someone passionate about solving complex platform engineering challenges on AWS, with a focus on model orchestration, deployment automation, and production-grade reliability. You'll architect ML systems at scale, provide guidance on infrastructure best practices, and work cross-functionally to bridge DevOps, ML, and backend teams.

Key Responsibilities :

- Architect and manage end-to-end ML infrastructure using SageMaker, AWS Step Functions, Lambda, and ECR

- Design and implement multi-region, highly available AWS solutions for real-time inference and batch processing

- Create and manage IaC blueprints for reproducible infrastructure using AWS CDK

- Establish CI/CD practices for ML model packaging, validation, and drift monitoring

- Oversee infrastructure security, including IAM policies, encryption at rest and in transit, and compliance standards

- Monitor and optimize compute and storage costs, ensuring efficient resource usage at scale

- Collaborate on data lake and analytics integration

- Serve as a technical mentor and guide AWS adoption patterns across engineering teams

Required Skills :

- 6+ years designing and deploying cloud infrastructure on AWS at scale

- Proven experience building and maintaining ML pipelines with services like SageMaker, ECS/EKS, or custom Docker pipelines

- Strong knowledge of networking, IAM, VPCs, and security best practices in AWS

- Deep experience with automation frameworks, IaC tools, and CI/CD strategies

- Advanced scripting proficiency in Python, Go, or Bash

- Familiarity with observability stacks (CloudWatch, Prometheus, Grafana)

Nice to Have :

- Background in robotics infrastructure, including AWS IoT Core, Greengrass, or OTA deployments

- Experience designing systems for physical robot fleet telemetry, diagnostics, and control

- Familiarity with multi-stage production environments and robotic software rollout processes

- Competence in frontend hosting for dashboard or API visualization

- Involvement with real-time streaming, MQTT, or edge inference workflows

- Hands-on experience with ROS 2 (Robot Operating System) or similar robotics frameworks, including launch file management, sensor data pipelines, and deployment to embedded Linux devices