Posted on: 21/01/2026
Description :
We are looking for a seasoned Staff MLOps Engineer to lead the design, implementation, and scaling of enterprise-grade machine learning platforms on AWS.
This role will focus on building reliable, secure, and cost-efficient MLOps systems that enable data scientists and engineers to deploy, monitor, and manage ML models in production.
As a Staff Engineer, you will provide technical leadership, define best practices, and drive cross-team alignment on ML platform architecture.
Duties & Responsibilities :
Key Responsibilities :
MLOps Platform & Architecture :
- Architect and own scalable MLOps platforms on AWS supporting model training, deployment, monitoring, and governance.
- Design and maintain end-to-end ML CI/CD pipelines, including data validation, model training, testing, approval, and deployment.
- Establish standards for model lifecycle management, experiment tracking, versioning, reproducibility, and rollback.
Model Deployment & Monitoring :
- Enable real-time, batch, and asynchronous model inference using AWS-native and container-based solutions.
- Implement monitoring for model performance, data drift, concept drift, and operational metrics.
- Ensure high availability, fault tolerance, and observability for production ML systems.
AWS Cloud & Infrastructure :
- Lead design and implementation using AWS services, including but not limited to:
1. Amazon SageMaker (training, hosting, pipelines, feature store).
2. EKS, ECS, EC2, Lambda for model serving and orchestration.
3. S3, Glue, Athena, Redshift for data storage and analytics.
4. CloudWatch, X-Ray for logging and monitoring.
- Implement Infrastructure as Code (IaC) using Terraform or AWS CloudFormation.
- Optimize ML workloads for cost, performance, and scalability, including GPU/spot instance strategies.
DevOps, Security & Compliance :
- Build and maintain CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, or AWS CodePipeline.
- Enforce security best practices (IAM, VPC, encryption, secrets management).
- Support compliance, auditability, and governance requirements for ML systems.
Technical Leadership & Collaboration :
- Serve as a Staff-level technical leader, influencing MLOps architecture across multiple teams.
- Mentor engineers and data scientists on production ML best practices.
- Partner with Data Science, Data Engineering, Platform, and Product teams to align ML solutions with business goals.
- Contribute to the long-term ML platform roadmap and strategy.
Skills Required :
- 11-13 years of overall experience, with 5+ years in MLOps, ML Platform, or ML Infrastructure roles.
- Strong experience deploying and operating machine learning models in production on AWS.
- Proficiency in Python and experience with ML frameworks such as TensorFlow, PyTorch, and scikit-learn.
- Deep hands-on experience with Docker and Kubernetes (EKS).
- Strong understanding of Amazon SageMaker and its ecosystem.
- Experience with CI/CD systems and Git-based workflows.
- Solid background in distributed systems, system design, and cloud architecture.
Preferred / Nice-to-Have Skills :
- Experience with SageMaker Feature Store, Pipelines, Model Registry, or MLflow.
- Exposure to LLMOps/GenAI on AWS (Bedrock, custom LLM deployment, vector databases like OpenSearch, Pinecone).
- Experience with streaming and real-time pipelines (Kafka, Kinesis, Spark).
- Experience in regulated or high-scale environments (finance, healthcare, retail, etc.).
- AWS certifications (Solutions Architect, Machine Learning Specialty) are a plus.
Soft Skills :
- Strong ownership and decision-making ability at a Staff level.
- Excellent communication skills across engineering, data science, and leadership teams.
- Ability to balance short-term delivery with long-term platform vision.
- Passion for building reliable, scalable, and maintainable ML systems.
Qualifications Required :
- Bachelor's degree from a four-year college or university, or an equivalent combination of education and experience.
- 11-13 years of overall experience, with 5+ years in MLOps, ML Platform, or ML Infrastructure roles.
About Symplr :
- As a leader in healthcare operations solutions, we empower healthcare organizations to navigate the complexities of integrating critical business operations.
- Our customers are at the heart of everything we do, and they rely on our mission-critical systems to drive better operations and better outcomes.
- We are a remote-first company with employees working across the United States, India, and the Netherlands.
- Guided by values, we focus on teamwork, championing our customers, being rooted in action and outcomes, overcoming challenges, and leading through equality and integrity.