Posted on: 09/09/2025
Senior Cloud & ML Infrastructure Engineer
Location : Hyderabad, Pune
Experience : 6+ Years
NOTE : Immediate Joiner
About the Role :
We're looking for a Senior Cloud & ML Infrastructure Engineer to lead the design, scaling, and optimization of cloud-native machine learning infrastructure. This role is ideal for someone passionate about solving complex platform engineering challenges on AWS, with a focus on model orchestration, deployment automation, and production-grade reliability. You'll architect ML systems at scale, provide guidance on infrastructure best practices, and work cross-functionally to bridge DevOps, ML, and backend teams.
Key Responsibilities :
- Architect and manage end-to-end ML infrastructure using SageMaker, AWS Step Functions, Lambda, and ECR
- Design and implement multi-region, highly available AWS solutions for real-time inference and batch processing
- Create and manage IaC blueprints for reproducible infrastructure using AWS CDK
- Establish CI/CD practices for ML model packaging, validation, and drift monitoring
- Oversee infrastructure security, including IAM policies, encryption at rest and in transit, and compliance standards
- Monitor and optimize compute and storage costs, ensuring efficient resource usage at scale
- Collaborate on data lake and analytics integration
- Serve as a technical mentor and guide AWS adoption patterns across engineering teams
Required Skills :
- 6+ years designing and deploying cloud infrastructure on AWS at scale
- Proven experience building and maintaining ML pipelines with services like SageMaker, ECS/EKS, or custom Docker pipelines
- Strong knowledge of networking, IAM, VPCs, and security best practices in AWS
- Deep experience with automation frameworks, IaC tools, and CI/CD strategies
- Advanced scripting proficiency in Python, Go, or Bash
- Familiarity with observability stacks (CloudWatch, Prometheus, Grafana)
Nice to Have :
- Background in robotics infrastructure, including AWS IoT Core, Greengrass, or OTA deployments
- Experience designing systems for physical robot fleet telemetry, diagnostics, and control
- Familiarity with multi-stage production environments and robotic software rollout processes
- Competence in frontend hosting for dashboard or API visualization
- Involvement with real-time streaming, MQTT, or edge inference workflows
- Hands-on experience with ROS 2 (Robot Operating System) or similar robotics frameworks, including launch file management, sensor data pipelines, and deployment to embedded Linux devices
Posted in : DevOps / SRE
Functional Area : ML / DL Engineering
Job Code : 1543494