Posted on: 20/04/2026
Job Description :
- Optimize model inference performance and cost efficiency
- Fine-tune foundation models for specific use cases and domains
- Implement diverse prompt engineering strategies
- Build robust backend infrastructure for AI-powered applications
- Implement and maintain MLOps pipelines for AI lifecycle management
- Design and implement comprehensive monitoring and evaluation systems for both traditional ML models and LLMs
- Develop automated testing frameworks for model quality and performance tracking
Basic Qualifications :
- 4 - 8 years of relevant experience in LLMs, Backend Engineering, and MLOps.
LLM Expertise :
- Model Fine-tuning : Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, adapter layers)
- Inference Optimization : Knowledge of quantization, pruning, caching strategies, and serving optimizations
- Prompt Engineering : Prompt design, few-shot learning, chain-of-thought prompting, and retrieval-augmented generation (RAG)
- Model Evaluation : Experience with AI evaluation frameworks and metrics for different use cases
- Monitoring & Testing : Design of automated evaluation pipelines, A/B testing for models, and continuous monitoring systems
Backend Engineering :
- Languages : Proficiency in Python, with experience in FastAPI, Flask, or similar frameworks
- APIs : Design and implementation of RESTful APIs and real-time systems
- Databases : Experience with both vector databases and traditional relational databases
- Cloud Platforms : AWS, GCP, or Azure with focus on ML services
MLOps & Infrastructure :
- Deployment : Experience with model serving frameworks (vLLM, SGLang, TensorRT)
- Monitoring : ML model monitoring, performance tracking, and alerting systems
- Evaluation Systems : Building automated evaluation pipelines with custom metrics and benchmarks
- CI/CD : MLOps pipelines for automated testing and deployment
- Orchestration : Experience with workflow tools like Airflow.
Preferred Qualifications :
- LLM Frameworks : Hands-on experience with Transformers, LangChain, LlamaIndex, or similar
- Monitoring Platforms : Knowledge of LLM-specific monitoring tools and general ML monitoring
- Distributed Training and Inference : Experience with multi-GPU and distributed training and inference setups
- Model Compression : Knowledge of techniques like distillation, quantization, and efficient architectures
- Production Scale : Experience deploying models handling high-throughput, low-latency requirements
- Research Background : Familiarity with recent LLM research and ability to implement novel techniques
Tools & Technologies :
We Use :
- Frameworks : PyTorch, Transformers, TensorFlow
- Serving : vLLM, TensorRT-LLM, SGLang, OpenAI API
- Infrastructure : Kubernetes, Docker, AWS/GCP
- Databases : PostgreSQL, Redis, Vector DBs
We are proud to offer a competitive salary alongside a strong insurance package. We pride ourselves on the growth of our employees, offering extensive learning and development resources.
Posted by : Anusha Kapuganti, Lead TA Specialist at SHYFTLABS PRIVATE LIMITED
Posted in : AI/ML
Functional Area : ML / DL Engineering
Job Code : 1629646