hirist

MLOps Engineer

Alchemy
Multiple Locations
5 - 10 Years

Posted on: 13/12/2025

Job Description



We are seeking an experienced MLOps Engineer with strong Databricks expertise to build, scale, and operationalize our machine learning ecosystem.


This role demands deep hands-on experience with MLFlow, CI/CD automation, Databricks administration, and workflow orchestration, ensuring production-grade reliability, governance, and performance of ML solutions.


The ideal candidate combines software engineering rigor with a strong understanding of machine learning lifecycle management, distributed compute, and cloud-native MLOps architectures.


Key Responsibilities :


MLOps & MLFlow Engineering :


- Collaborate closely with data scientists to productionize ML models, ensuring reproducibility and reliability.


- Build, automate, and maintain MLFlow pipelines for experiment tracking, model versioning, model registry, and deployment.


- Implement MLFlow tracking servers, model artifact repositories, and serving endpoints within Databricks.


- Manage model promotion processes across Dev, QA, and Production environments with strong governance and validation controls.


- Establish best practices for feature engineering consistency, model lineage, and reproducibility.
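The model-promotion governance described above can be sketched in plain Python. This is a minimal illustration of the gating logic only, not MLFlow or Databricks API calls; the stage names, metric, and threshold are assumptions for the example.

```python
# Illustrative sketch of a promotion gate across Dev -> QA -> Prod.
# Stage names, metrics, and thresholds are assumptions; in practice this
# logic would sit in front of the MLFlow Model Registry's stage transitions.
STAGES = ["Dev", "QA", "Prod"]

def can_promote(model: dict, target_stage: str, min_accuracy: float = 0.9) -> bool:
    """Validate a candidate model before promoting it to the next environment."""
    current = model["stage"]
    # Only allow one-step promotions (Dev -> QA, QA -> Prod), never skips.
    if STAGES.index(target_stage) != STAGES.index(current) + 1:
        return False
    # Governance checks: recorded lineage and a minimum quality bar.
    if not model.get("lineage_recorded", False):
        return False
    return model["metrics"].get("accuracy", 0.0) >= min_accuracy

candidate = {
    "name": "churn-classifier",        # hypothetical model name
    "stage": "QA",
    "lineage_recorded": True,
    "metrics": {"accuracy": 0.93},
}
print(can_promote(candidate, "Prod"))  # True: passes all gates
```

Real deployments would replace the dict with registry metadata and trigger the actual stage transition only when the gate passes.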


CI/CD & Automation :


- Design and maintain CI/CD pipelines for Databricks notebooks, workflows, MLFlow models, and data pipelines.


- Integrate build/deploy pipelines with tools such as Azure DevOps, GitHub Actions, or Jenkins.


- Enforce automated testing, linting, quality checks, and incremental deployment strategies.


- Ensure seamless code integration, versioning, and deployment across multiple environments.
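The multi-environment deployment idea above can be illustrated as simple branch-to-environment routing. The branch and environment names here are assumptions; the same pattern applies whether the pipeline runs in Azure DevOps, GitHub Actions, or Jenkins.

```python
# Minimal sketch of CI/CD routing: which environment a build deploys to,
# gated on automated checks. Branch/environment names are illustrative.
from typing import Optional

def deploy_target(branch: str, checks_passed: bool) -> Optional[str]:
    """Decide where a build should deploy, or None to skip deployment."""
    if not checks_passed:          # enforce test/lint quality gates first
        return None
    routes = {
        "develop": "Dev",
        "release": "QA",
        "main": "Production",
    }
    return routes.get(branch)      # feature branches deploy nowhere

print(deploy_target("main", checks_passed=True))   # Production
print(deploy_target("main", checks_passed=False))  # None
```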


Databricks Platform Administration :


- Manage Databricks clusters, pools, jobs, permissions, and workspace configurations.


- Optimize compute usage, cluster policies, and job execution cost-efficiency.


- Implement security, role-based access, token management, and workspace isolation practices.


- Collaborate with cloud teams for VNet integration, networking, and infrastructure hardening.


Notebook & Pipeline Development :


- Develop modular and reusable Databricks notebooks for data ingestion, processing, quality checks, and model training.


- Implement scalable pipeline patterns using PySpark, SQL, Delta Lake, and MLFlow.


- Enforce best practices for coding standards, exception handling, and logging.


Databricks Workflows & Orchestration :


- Design and manage Databricks Workflows for end-to-end orchestration of notebooks, DLT (Delta Live Tables), and ML pipelines.


- Implement workflow dependencies, retry logic, alerting, and SLA monitoring.


- Automate ML model deployment workflows, including batch scoring, streaming inference, and scheduled retraining.
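The retry logic mentioned above can be sketched as exponential backoff around a task, similar in spirit to the per-task retry settings Databricks Workflows offers. The delays and attempt counts are illustrative assumptions.

```python
# Sketch of retry-with-backoff for a workflow task; delay and attempt
# values are illustrative, not Databricks defaults.
import time

def run_with_retries(task, max_retries: int = 3, base_delay: float = 1.0):
    """Run a task callable, retrying on failure with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise                              # retries exhausted: surface failure
            time.sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...

calls = {"n": 0}
def flaky():
    """Hypothetical task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky, base_delay=0.01))  # ok (after 2 retries)
```

In a real workflow the alerting and SLA checks would hook into the failure branch rather than simply re-raising.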


Unity Catalog & Data Governance :


- Implement Unity Catalog governance for data, models, and notebooks.


- Configure access controls, catalogs, schemas, and lineage tracking across the platform.


- Ensure compliance with data security, audit, and organizational governance standards.


- Implement scalable permission models aligned with enterprise policies.


Collaboration, Documentation & Best Practices :


- Work closely with data engineers, data scientists, business teams, and cloud engineers.


- Create detailed documentation for workflows, operational runbooks, CI/CD pipelines, and platform configurations.


- Establish standards for version control, repository structures, environment management, and ML lifecycle processes.


Required Skills & Qualifications :


- 5+ years of experience in MLOps, ML platforms, or data engineering roles.


- Strong expertise in Databricks, MLFlow, Delta Lake, Databricks Workflows, and Unity Catalog.


- Hands-on experience deploying ML models in production using MLFlow or equivalent.


- Strong Python, PySpark, SQL, and distributed data processing experience.


- Deep understanding of CI/CD tools (Azure DevOps, GitHub Actions, Jenkins).


- Working knowledge of cloud platforms (Azure/AWS/GCP) and infrastructure fundamentals.


- Experience with monitoring, logging, and observability tools.


- Ability to troubleshoot complex ML pipelines and distributed computing issues.

