We are looking for a highly skilled MLOps & LLM Ops Engineer with strong expertise in deploying, automating, and monitoring AI/ML models, including Large Language Models (LLMs), in production environments. The ideal candidate will have hands-on experience with CI/CD automation, container orchestration, data pipelines, LangChain, and cloud deployment on Azure/AWS. You will collaborate with data scientists, ML engineers, and customer architects to ensure seamless end-to-end delivery of scalable, high-performing AI systems.
Key Responsibilities:
1. Model Deployment & Automation:
- Automate the full lifecycle of AI/ML model deployment, including packaging, orchestration, scaling, and rollout strategies.
- Implement automated workflows for data and model versioning and experiment tracking using MLflow or similar systems.
- Deploy Large Language Models (LLMs) to production using frameworks such as LangChain, Flask, FastAPI, or custom microservices (a minimal serving sketch follows this list).
- Containerize and orchestrate model services using Docker & Kubernetes, enabling highly available and fault-tolerant inference pipelines.
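For illustration, here is a minimal Python sketch of the serving pattern above: a FastAPI microservice loading a model from the MLflow registry. The model name "demo-model", version "1", and request schema are placeholder assumptions, not a prescribed design.

    # Minimal sketch: serve an MLflow-registered model behind FastAPI.
    # "demo-model" and version "1" are illustrative placeholders.
    import mlflow.pyfunc
    import pandas as pd
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = mlflow.pyfunc.load_model("models:/demo-model/1")

    class PredictRequest(BaseModel):
        records: list[dict]  # one feature-name -> value dict per row

    @app.post("/predict")
    def predict(req: PredictRequest) -> dict:
        frame = pd.DataFrame(req.records)
        preds = model.predict(frame)  # assuming an array-like result
        return {"predictions": preds.tolist()}

Run locally with uvicorn (e.g. uvicorn serve:app); in the containerized setup above, this service would typically sit behind a Kubernetes Service for scaling and failover.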
2. CI/CD & Infrastructure Automation:
- Build and maintain robust CI/CD pipelines using Git, Jenkins, GitHub Actions, or GitLab CI for continuous integration, testing, and deployment of ML solutions.
- Implement infrastructure-as-code (IaC) for automated provisioning of cloud resources (Terraform or equivalent; see the sketch after this list).
- Automate deployment workflows for API endpoints, microservices, feature stores, and data processing pipelines.
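Since the list above names "Terraform or equivalent", here is a minimal IaC sketch using Pulumi's Python SDK as one such equivalent; the resource names are illustrative assumptions, not a fixed layout.

    # Minimal IaC sketch (Pulumi Python SDK, a Terraform equivalent).
    # Resource names are illustrative assumptions.
    import pulumi
    import pulumi_aws as aws

    # Bucket for model artifacts produced by CI pipelines.
    artifacts = aws.s3.Bucket("model-artifacts")

    # Container registry for model-serving images built in CI.
    registry = aws.ecr.Repository("model-serving")

    pulumi.export("artifact_bucket", artifacts.id)
    pulumi.export("image_repo_url", registry.repository_url)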
3. Data Pipelines & Real-Time Processing:
- Design, deploy, and manage data ingestion and processing pipelines using Airflow, Kafka, and RabbitMQ (see the Airflow sketch after this list).
- Ensure reliable, scalable, and secure data pipelines that support both training and inference workflows.
- Optimize data freshness, batch scheduling, and streaming performance for high-throughput model operations.
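As an illustration of the orchestration work above, a minimal Airflow DAG sketch: a daily ingest feeding a feature-refresh task. The task bodies and storage path are placeholder assumptions.

    # Minimal Airflow DAG sketch; task bodies are placeholders.
    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def ingest_and_refresh():
        @task
        def ingest() -> str:
            # e.g. land the day's events from Kafka into object storage
            return "s3://raw/events/latest"  # illustrative path

        @task
        def refresh_features(raw_path: str) -> None:
            # recompute features used by training and online inference
            print(f"refreshing features from {raw_path}")

        refresh_features(ingest())

    ingest_and_refresh()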
4. LLM & Foundation Model Operations:
- Integrate and operationalize foundation model APIs such as OpenAI, Anthropic, Gemini, and Cohere.
- Deploy custom or fine-tuned LLMs (GPT, Llama, Mistral, etc.) using LangChain or custom inference frameworks.
- Implement prompt management, evaluation, caching, vector store integrations, and retrieval-augmented generation (RAG) pipelines (a minimal RAG sketch follows this list).
- Ensure high performance, low latency, and reliability of LLM-based production systems.
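For illustration, a minimal RAG sketch using the OpenAI Python client with a naive in-process prompt cache; retrieve() is a hypothetical stand-in for a real vector-store query, and the model name is an assumption.

    # Minimal RAG sketch; retrieve() is a hypothetical stand-in for a
    # vector-store query, and the model choice is illustrative.
    from functools import lru_cache

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def retrieve(query: str, k: int = 3) -> list[str]:
        # Assumption: a production version would query a vector store
        # (FAISS, Pinecone, etc.) for the k most similar chunks.
        return ["<retrieved context chunk>"] * k

    @lru_cache(maxsize=1024)  # naive prompt cache for repeated queries
    def answer(query: str) -> str:
        context = "\n\n".join(retrieve(query))
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system",
                 "content": f"Answer using only this context:\n{context}"},
                {"role": "user", "content": query},
            ],
        )
        return resp.choices[0].message.content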
5. Cloud Deployment & Infrastructure Management:
- Deploy ML workloads in Azure or AWS using services like Kubernetes (AKS/EKS), Lambda, EC2, S3/ADLS, API Gateway, and Azure Functions (see the boto3 sketch after this list).
- Monitor and optimize infrastructure cost, performance, and scalability for ML and LLM systems.
- Collaborate with customer architects to define, plan, and execute end-to-end deployments and solution architectures.
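As a small concrete example of the cloud-deployment work above, a boto3 sketch that publishes a packaged model artifact to S3 for the serving layer to pick up; the bucket name and key layout are assumptions.

    # Minimal boto3 sketch: publish a model artifact to S3. The bucket
    # name and key layout are illustrative assumptions.
    import boto3

    s3 = boto3.client("s3")

    def publish_model(local_path: str, version: str) -> str:
        key = f"models/demo/{version}/model.tar.gz"  # hypothetical layout
        s3.upload_file(local_path, "ml-artifacts-bucket", key)
        return f"s3://ml-artifacts-bucket/{key}"

    # Usage: uri = publish_model("dist/model.tar.gz", "v3")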
6. Monitoring, Observability & Performance Optimization:
- Implement and maintain observability stacks for model performance monitoring, using tools like Prometheus, Grafana, ELK, Datadog, or cloud-native monitoring solutions (a minimal instrumentation sketch follows this list).
- Troubleshoot production issues and perform root cause analysis across models, pipelines, and infrastructure.
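For illustration, a minimal instrumentation sketch with the prometheus_client library, exposing a request counter and a latency histogram for an inference function; metric names and the placeholder model call are assumptions.

    # Minimal observability sketch with prometheus_client; metric names
    # and the placeholder model call are illustrative assumptions.
    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    REQUESTS = Counter("inference_requests_total",
                       "Total inference requests", ["status"])
    LATENCY = Histogram("inference_latency_seconds",
                        "Inference latency in seconds")

    @LATENCY.time()
    def infer(payload: dict) -> float:
        REQUESTS.labels(status="ok").inc()
        return random.random()  # placeholder for a real model call

    if __name__ == "__main__":
        start_http_server(8000)  # Prometheus scrapes /metrics on :8000
        while True:
            infer({"x": 1})
            time.sleep(1)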
Required Skills & Qualifications:
- Strong hands-on experience in MLOps, production ML workflows, and automation.
- Expertise in CI/CD tools (Git, Jenkins, GitHub Actions, GitLab CI).
- Strong experience with Docker and Kubernetes for model containerization and deployment.
- Practical knowledge of MLflow, LangChain, and experiment tracking/versioning systems.
- Experience with Airflow, Kafka, and RabbitMQ for large-scale data workflow orchestration.
- Experience working with foundation model APIs (OpenAI, Anthropic, etc.).
- Hands-on deployment experience on Azure and/or AWS cloud platforms.
- Familiarity with performance monitoring tools (Prometheus, Grafana, Datadog, CloudWatch, etc.).
- Solid understanding of distributed systems, microservices, and cloud-native architectures.
- Strong communication, analytical, and debugging skills.
- Ability to work in fast-paced environments and manage complex deployments.
Preferred (Nice-to-Have):
- Knowledge of vector databases (Pinecone, Weaviate, FAISS, Chroma); see the FAISS sketch after this list.
- Experience with RAG pipelines, semantic search, embeddings, or LLM orchestration frameworks.
- Exposure to model optimization techniques such as quantization, distillation, or low-latency inference optimization.
- Hands-on experience with Terraform, Helm, or ArgoCD.
- Experience with GPU-based deployments and optimization on cloud platforms.
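For the vector-database item above, a minimal FAISS sketch over random embeddings; the dimensionality and data are illustrative assumptions.

    # Minimal FAISS vector-search sketch; dimensions and data are
    # illustrative assumptions.
    import faiss
    import numpy as np

    dim = 384  # e.g. a small sentence-embedding size
    index = faiss.IndexFlatL2(dim)  # exact L2 nearest-neighbour search

    vectors = np.random.rand(1000, dim).astype("float32")
    index.add(vectors)  # index 1,000 document embeddings

    query = np.random.rand(1, dim).astype("float32")
    distances, ids = index.search(query, 5)  # 5 nearest neighbours
    print(ids[0])  # row ids of the closest stored vectors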