- Job Title : Site Reliability Engineer (AI Ops / ML Ops)
- Work Mode : Onsite.
- Our Department : Trimble's Construction Management Solutions (CMS) division is dedicated to transforming the construction industry.
- We provide technology solutions that streamline and optimize workflows for preconstruction, project management, and field operations.
- By connecting the physical and digital worlds, we help our customers improve productivity, efficiency, and project outcomes.
- Are you passionate about deploying, monitoring, and scaling machine learning systems in production environments, and eager to contribute to robust AI infrastructure within a collaborative team?
What You Will Do :
- This role offers an opportunity to work in AI/ML Development and Operations (DevOps) engineering within a dynamic team that values reliability and continuous improvement.
- The successful candidate will contribute to the deployment and maintenance of AI/ML systems in production, gaining hands-on experience with MLOps best practices and infrastructure automation.
- This position provides a structured environment for developing core competencies in ML system operations, DevOps practices, and production ML monitoring, with direct guidance from experienced professionals.
- Assist in the deployment and maintenance of machine learning models in production environments under direct supervision, while learning containerization and orchestration technologies such as Docker and Kubernetes.
- Support CI/CD pipeline development for ML workflows, including model versioning, automated testing, and deployment processes using tools like Azure DevOps.
- Monitor ML model performance, data drift, and system health in production environments, implementing basic alerting and logging solutions.
- Contribute to infrastructure automation and configuration management for ML systems, learning Infrastructure as Code (IaC) practices with tools like Terraform or CloudFormation.
- Collaborate with ML engineers and data scientists to operationalize models, ensuring scalability, reliability, and adherence to established MLOps procedures and best practices.
What Skills & Experience You Should Bring :
Required :
- 1 to 2 years of professional experience in a DevOps, MLOps, or systems engineering environment.
- Bachelor's degree in Computer Science, Engineering, Information Technology, or a closely related technical field.
- Trimble's Professional ladder typically requires four or more years of formal education.
- Must have: experience with Microsoft Azure and its services, including ML/AI (Azure ML, Azure DevOps, etc.).
- Foundational knowledge of DevOps principles and practices, with understanding of CI/CD concepts and basic system administration.
- Proficiency with Python or other scripting languages (Shell / Bash / PowerShell / Perl) for automation scripting and system integration.
- Understanding of containerization technologies (Docker) and basic orchestration concepts (Kubernetes fundamentals).
- Familiarity with version control systems (Git) and collaborative development workflows.
- Basic understanding of machine learning concepts and the ML model lifecycle from development to production.
Preferred :
- Familiarity with MLOps tools and frameworks (MLflow, Kubeflow, DVC, or similar).
- Basic experience with monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Understanding of Infrastructure as Code (IaC) tools like Terraform or Ansible.
- Experience with Windows/Linux system administration and command-line tools.
- Knowledge of database systems and data pipeline technologies.
- Exposure to model serving frameworks (TensorFlow Serving, TorchServe, ONNX Runtime).
- Basic understanding of security best practices for ML systems and data governance.