Posted on: 10/04/2026
Description:
We are seeking a senior Ops Engineer to support GenAI, LLM, and ML workloads, with a strong focus on deployment automation, observability, scalability, and platform reliability across Azure and Kubernetes environments.
Key Responsibilities:
- Build and maintain CI/CD/CT pipelines for ML models, LLMs, and GenAI workloads
- Deploy and operationalize:
a. ML models and custom LLMs
b. AI agents and GenAI services using Databricks, MLflow, AKS / ARO
- Integrate and scale GenAI ecosystems, including:
a. Azure OpenAI / OpenAI
b. HuggingFace models
c. RAG pipelines and vector databases
- Support development and deployment of custom models and out-of-the-box AI agents
- Manage Databricks:
a. Workspaces
b. Clusters
c. Model registry
d. Job orchestration
- Own the AKS / ARO lifecycle, including networking, scaling, Helm-based deployments, and GitOps workflows
- Implement robust observability for AI/ML/LLM systems (latency, drift, reliability, performance)
- Ensure cloud security, governance, access controls, and cost efficiency
Required Skills:
- Strong hands-on experience with Azure, AKS/Kubernetes, Databricks, and MLflow
- Experience with LLMOps, RAG pipelines, and vector stores (FAISS, Pinecone, Chroma, etc.)
- Proficiency in Python and automation scripting
- Strong understanding of AI/ML system operations and platform reliability