Posted on: 10/04/2026
Key Competencies :
- Strong analytical and problem-solving skills
- Ability to work in a fast-paced, agile environment
- Excellent communication and stakeholder management
- High ownership with a focus on scalability and reliability
Job Description :
We are looking for a highly skilled MLOps Engineer to design, build, and manage scalable machine learning pipelines and infrastructure.
The ideal candidate will have strong expertise in PySpark/Python, Databricks, and end-to-end MLOps lifecycle management, with hands-on experience in deploying and maintaining ML models in production environments.
This role will focus on operationalizing machine learning solutions by ensuring reliability, scalability, governance, and seamless integration with enterprise data platforms.
Primary Skills (Must Have) :
- Strong expertise in PySpark and Python
- Hands-on experience with the Databricks platform
- End-to-end MLOps lifecycle management
- Cloud infrastructure management (AWS / Azure / GCP)
- CI/CD pipeline implementation for ML models
Secondary Skills (Good to Have) :
- Exposure to GenAI frameworks (LLMs, LangChain, etc.)
- Experience in model deployment and real-time/batch serving
- Knowledge of security, governance, and compliance frameworks
Key Responsibilities :
- Design, develop, and maintain end-to-end ML pipelines including data ingestion, feature engineering, model training, validation, deployment, and monitoring.
- Build scalable data processing workflows using PySpark on Databricks.
- Implement and automate CI/CD pipelines for ML workflows to ensure fast, reliable deployments.
- Manage model versioning, experiment tracking, and reproducibility using tools like MLflow.
- Deploy ML models into production and enable real-time and batch inference pipelines.
- Continuously monitor model performance, data drift, and system health in production environments.
- Collaborate with data scientists, data engineers, and DevOps teams to productionize machine learning models.
- Implement security, governance, and access controls across ML pipelines and data workflows.
- Optimize cloud infrastructure for performance, scalability, and cost efficiency.
- Support troubleshooting, root cause analysis, and continuous improvement of ML systems.
- Enable automated retraining pipelines and lifecycle management of models.
Required Skills & Qualifications :
- Strong programming skills in Python and PySpark
- Good understanding of SQL and large-scale data processing
- Hands-on experience with Databricks (Must Have)
- Experience with ML platforms/tools such as MLflow, SageMaker, or equivalent
- Strong knowledge of CI/CD tools (Jenkins, GitHub Actions, Azure DevOps, etc.)
- Experience working with cloud platforms (AWS / Azure / GCP)
- Solid understanding of ML lifecycle management, deployment, and monitoring
- Familiarity with Docker and Kubernetes for containerization and orchestration
- Understanding of data pipelines and distributed computing systems
Preferred Qualifications :
- Experience with large-scale distributed data processing (Apache Spark ecosystem)
- Exposure to feature stores and model governance frameworks
- Experience with GenAI/LLM-based applications and deployment patterns
- Knowledge of Infrastructure as Code (Terraform, ARM, CloudFormation)
- Familiarity with monitoring and observability tools
Nice to Have :
- Experience in real-time streaming pipelines (Kafka, Spark Streaming)
- Understanding of data security, privacy, and regulatory compliance
- Prior experience in enterprise-scale ML platform implementation