hirist

MLOps Engineer

Alchemy
Multiple Locations
5 - 10 Years

Posted on: 13/12/2025

Job Description



We are seeking an experienced MLOps Engineer with strong Databricks expertise to build, scale, and operationalize our machine learning ecosystem.


This role demands deep hands-on experience with MLFlow, CI/CD automation, Databricks administration, and workflow orchestration, ensuring production-grade reliability, governance, and performance of ML solutions.


The ideal candidate combines software engineering rigor with a strong understanding of machine learning lifecycle management, distributed compute, and cloud-native MLOps architectures.


Key Responsibilities :


MLOps & MLFlow Engineering :


- Collaborate closely with data scientists to productionize ML models, ensuring reproducibility and reliability.


- Build, automate, and maintain MLFlow pipelines for experiment tracking, model versioning, model registry, and deployment.


- Implement MLFlow tracking servers, model artifact repositories, and serving endpoints within Databricks.


- Manage model promotion processes across Dev, QA, and Production environments with strong governance and validation controls.


- Establish best practices for feature engineering consistency, model lineage, and reproducibility.
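The model-promotion governance described above can be sketched in plain Python. This is a minimal illustration of the gating logic only, not MLFlow or Databricks API calls; the stage names, metric, and threshold are assumptions for the example.

```python
# Illustrative sketch of a promotion gate across Dev -> QA -> Prod.
# Stage names, metrics, and thresholds are assumptions; in practice this
# logic would sit in front of the MLFlow Model Registry's stage transitions.
STAGES = ["Dev", "QA", "Prod"]

def can_promote(model: dict, target_stage: str, min_accuracy: float = 0.9) -> bool:
    """Validate a candidate model before promoting it to the next environment."""
    current = model["stage"]
    # Only allow one-step promotions (Dev -> QA, QA -> Prod), never skips.
    if STAGES.index(target_stage) != STAGES.index(current) + 1:
        return False
    # Governance checks: recorded lineage and a minimum quality bar.
    if not model.get("lineage_recorded", False):
        return False
    return model["metrics"].get("accuracy", 0.0) >= min_accuracy

candidate = {
    "name": "churn-classifier",        # hypothetical model name
    "stage": "QA",
    "lineage_recorded": True,
    "metrics": {"accuracy": 0.93},
}
print(can_promote(candidate, "Prod"))  # True: passes all gates
```

Real deployments would replace the dict with registry metadata and trigger the actual stage transition only when the gate passes.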


CI/CD & Automation :


- Design and maintain CI/CD pipelines for Databricks notebooks, workflows, MLFlow models, and data pipelines.


- Integrate build/deploy pipelines with tools such as Azure DevOps, GitHub Actions, or Jenkins.


- Enforce automated testing, linting, quality checks, and incremental deployment strategies.


- Ensure seamless code integration, versioning, and deployment across multiple environments.
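The multi-environment deployment idea above can be illustrated as simple branch-to-environment routing. The branch and environment names here are assumptions; the same pattern applies whether the pipeline runs in Azure DevOps, GitHub Actions, or Jenkins.

```python
# Minimal sketch of CI/CD routing: which environment a build deploys to,
# gated on automated checks. Branch/environment names are illustrative.
from typing import Optional

def deploy_target(branch: str, checks_passed: bool) -> Optional[str]:
    """Decide where a build should deploy, or None to skip deployment."""
    if not checks_passed:          # enforce test/lint quality gates first
        return None
    routes = {
        "develop": "Dev",
        "release": "QA",
        "main": "Production",
    }
    return routes.get(branch)      # feature branches deploy nowhere

print(deploy_target("main", checks_passed=True))   # Production
print(deploy_target("main", checks_passed=False))  # None
```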


Databricks Platform Administration :


- Manage Databricks clusters, pools, jobs, permissions, and workspace configurations.


- Optimize compute usage, cluster policies, and job execution cost-efficiency.


- Implement security, role-based access, token management, and workspace isolation practices.


- Collaborate with cloud teams for VNet integration, networking, and infrastructure hardening.


Notebook & Pipeline Development :


- Develop modular and reusable Databricks notebooks for data ingestion, processing, quality checks, and model training.


- Implement scalable pipeline patterns using PySpark, SQL, Delta Lake, and MLFlow.


- Enforce best practices for coding standards, exception handling, and logging.


Databricks Workflows & Orchestration :


- Design and manage Databricks Workflows for end-to-end orchestration of notebooks, DLT (Delta Live Tables), and ML pipelines.


- Implement workflow dependencies, retry logic, alerting, and SLA monitoring.


- Automate ML model deployment workflows, including batch scoring, streaming inference, and scheduled retraining.
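The retry logic mentioned above can be sketched as exponential backoff around a task, similar in spirit to the per-task retry settings Databricks Workflows offers. The delays and attempt counts are illustrative assumptions.

```python
# Sketch of retry-with-backoff for a workflow task; delay and attempt
# values are illustrative, not Databricks defaults.
import time

def run_with_retries(task, max_retries: int = 3, base_delay: float = 1.0):
    """Run a task callable, retrying on failure with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise                              # retries exhausted: surface failure
            time.sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...

calls = {"n": 0}
def flaky():
    """Hypothetical task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky, base_delay=0.01))  # ok (after 2 retries)
```

In a real workflow the alerting and SLA checks would hook into the failure branch rather than simply re-raising.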


Unity Catalog & Data Governance :


- Implement Unity Catalog governance for data, models, and notebooks.


- Configure access controls, catalogs, schemas, and lineage tracking across the platform.


- Ensure compliance with data security, audit, and organizational governance standards.


- Implement scalable permission models aligned with enterprise policies.


Collaboration, Documentation & Best Practices :


- Work closely with data engineers, data scientists, business teams, and cloud engineers.


- Create detailed documentation for workflows, operational runbooks, CI/CD pipelines, and platform configurations.


- Establish standards for version control, repository structures, environment management, and ML lifecycle processes.


Required Skills & Qualifications :


- 5+ years of experience in MLOps, ML platforms, or data engineering roles.


- Strong expertise in Databricks, MLFlow, Delta Lake, Databricks Workflows, and Unity Catalog.


- Hands-on experience deploying ML models in production using MLFlow or equivalent.


- Strong Python, PySpark, SQL, and distributed data processing experience.


- Deep understanding of CI/CD tools (Azure DevOps, GitHub Actions, Jenkins).


- Working knowledge of cloud platforms (Azure/AWS/GCP) and infrastructure fundamentals.


- Experience with monitoring, logging, and observability tools.


- Ability to troubleshoot complex ML pipelines and distributed computing issues.

