hirist

Job Description

As a Sr. Machine Learning Ops Platform Engineer, you will be responsible for building automation and leading-edge architecture around Data and AI/ML engineering on the ResMed AI platform. Specifically, you will code and help architect a production-grade, scalable platform used by dozens of data scientists. You will help define and enforce best practices for coding and CI/CD within a team of excellent, engaged engineers. You will be given creative freedom and will work in a supportive team environment. This is a hands-on role that involves coding and regular interaction with business stakeholders.


Let's talk about responsibilities:


- Ensure AI platforms are reliable, scalable, and resilient by establishing foundational blueprints for upgrade and release strategies, implementing comprehensive logging, monitoring, and metrics, and automating critical system management tasks.


- Contribute to Generative AI development, including embeddings and fine-tuning of generative models.


- Build and maintain systems following DevOps, LLMOps, and AIOps practices, using Kubernetes, Docker, AWS, Python, and Terraform.


- Push the boundaries of what's possible with AI by thinking beyond current technology and stack constraints, and collaboratively deliver innovative yet practical solutions.


- Participate in and set up Proofs of Concept (POCs) to demonstrate proposed solutions.


- Enable team members through training, culture-building, and mentorship.


- Identify, design, and implement internal process improvements: automating manual processes, re-architecting infrastructure for greater scalability, and so on.


- Build infrastructure on the AWS platform, including Lambda, ECS, EC2, SNS/SQS, and Bedrock, along with ML pipelines, data monitoring, alerting, and networking.


- Implement observability stacks using tools such as Prometheus, Loki, VictoriaLogs, Grafana, and Datadog.


- Design, build, and support Data & ML model pipelines using the latest CI/CD and deployment technologies.


- Collaborate with stakeholders including Executive, Product, Data, and Design teams to resolve technical issues and support infrastructure needs.


- Participate in code reviews and process improvements.


Let's talk about qualifications and experience:


- 7+ years of experience in a complex, technical environment. Proven experience developing production-grade code in Python, SQL, and Pandas.


- Deep expertise in Kubernetes as a foundational platform for deploying, scaling, and managing AI/ML workloads in production.


- Experience with three or more of the following AWS tools: Lambda, EC2, EMR, S3, Glue, Athena, RDS, Networking, IAM, Batch Processing, SageMaker, Airflow.


- Proficiency in Terraform and common DevOps/DevSecOps tools and techniques such as Docker, GitHub, GitHub Actions, SonarQube, Checkmarx, and JFrog.


- Experience with Kubeflow and MLflow is a strong advantage.


- Skilled in creating and managing CI/CD pipelines and APIs tailored for AI/ML workloads.


- Experience with both relational (SQL) and NoSQL databases; Snowflake experience is a plus.


- Solid understanding of the OAuth 2.0 protocol for secure authorization.


- Hands-on experience implementing A/B testing for models and AI applications.


- Familiarity with AI governance, observability, and compliance practices across AI/ML workflows.


- Experience working with LLMs, AI agents, multi-agent coordination, and the Model Context Protocol (MCP); strong hands-on exposure to AI platform engineering.


- Exposure to LLM orchestration frameworks such as Flowise, Langflow, and LangGraph is a plus.


- Familiarity with AgentOps tools and methodologies is a strong advantage.


- Demonstrated ability to work with AI/ML teams and cross-functional groups in fast-paced, dynamic environments.


All listed duties, requirements, and responsibilities are considered essential functions of this position; however, business conditions may require reasonable accommodation for added tasks and responsibilities.


Let's talk about what you can expect:


- A supportive environment that focuses on people development and best practices.


- Opportunity to design, influence, and be innovative.


- Work with inclusive global teams that openly share new ideas. We want your ideas!


- Be supported both inside and outside of the work environment.


- The opportunity to build something meaningful and see a direct positive impact on people's lives! Dream big, iterate, and experiment to drive innovation!

