Posted on: 19/08/2025
Key Responsibilities:
- Design and implement scalable deployment pipelines for open-source Gen AI models (LLMs, diffusion models, etc.).
- Fine-tune and optimize models using techniques like LoRA, quantization, distillation, etc.
- Manage inference workloads, latency optimization, and GPU utilization.
- Build CI/CD pipelines for model training, validation, and deployment.
- Integrate observability, logging, and alerting for model and infrastructure monitoring.
- Automate resource provisioning using Terraform, Helm, or similar tools on GCP/AWS/Azure.
- Ensure model versioning, reproducibility, and rollback using tools like MLflow, DVC, or Weights & Biases.
- Collaborate with data scientists, backend engineers, and DevOps teams to ensure smooth production rollouts.
Required Skills & Qualifications:
- 5+ years of total experience in software engineering or cloud infrastructure.
- 3+ years in MLOps with direct experience in deploying large Gen AI models.
- Hands-on experience with open-source models (e.g., LLaMA, Mistral, Stable Diffusion, Falcon).
- Strong knowledge of Docker, Kubernetes, and cloud compute orchestration.
- Proficiency in Python and familiarity with model-serving frameworks (e.g., FastAPI, Triton Inference Server, Hugging Face Accelerate, vLLM).
- Experience with cloud platforms (GCP preferred, AWS or Azure acceptable).
- Familiarity with distributed training, checkpointing, and model parallelism.
Good to Have:
- Experience with low-latency inference systems and token streaming architectures.
- Exposure to LLMOps tools (LangChain, BentoML, Ray Serve, etc.).
Why Join Us:
- Collaborative team with deep expertise in AI, cloud, and enterprise software.
- Flexible work environment with a focus on innovation and impact.