Posted on: 03/02/2026
Description:
Key Responsibilities:
Architecture & Infrastructure:
- Design, implement, and optimize end-to-end ML training workflows, including infrastructure setup, orchestration, fine-tuning, deployment, and monitoring (a minimal launch sketch follows this list).
- Evaluate and integrate single- and multi-cloud training options across AWS and other major platforms.
- Lead cluster configuration, orchestration design, environment customization, and scaling strategies.
- Compare and recommend hardware options (GPUs, TPUs, accelerators) based on performance, cost, and availability.
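To make the first responsibility above concrete, here is a minimal sketch of launching a managed distributed training job with the SageMaker Python SDK. The entry point, IAM role, S3 paths, and instance choices are placeholders rather than recommendations, and the same workflow could equally be built on EKS or Slurm.

```python
from sagemaker.pytorch import PyTorch

# Hypothetical values: replace the role, bucket, and entry point with real ones.
estimator = PyTorch(
    entry_point="train.py",              # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    framework_version="2.1",
    py_version="py310",
    instance_count=2,                    # two-node data-parallel job
    instance_type="ml.p4d.24xlarge",     # 8x A100 per node
    hyperparameters={"epochs": 3, "lr": 3e-4},
)

# Kick off training against data staged in S3; SageMaker provisions the
# cluster, runs the job, tears the cluster down, and streams logs back.
estimator.fit({"training": "s3://my-bucket/datasets/corpus"})
```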
Technical Expertise Requirements:
- At least 5 years in AI/ML infrastructure and large-scale training environments.
- Expert-level command of AWS cloud services (EC2, S3, EKS, SageMaker, Batch, FSx, etc.) and familiarity with Azure, GCP, and hybrid/multi-cloud setups.
- Strong knowledge of AI/ML training frameworks (PyTorch, TensorFlow, Hugging Face, DeepSpeed, Megatron, Ray, etc.); see the distributed-training sketch after this list.
- Proven experience with cluster orchestration tools (Kubernetes, Slurm, Ray, SageMaker, Kubeflow).
- Deep understanding of hardware architectures for AI workloads (NVIDIA, AMD, Intel Habana, TPU).
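As one illustration of the framework requirement, here is a minimal PyTorch DistributedDataParallel skeleton. The linear layer and training loop are stand-ins, and the script assumes it is launched with torchrun so that RANK, LOCAL_RANK, and WORLD_SIZE are set in the environment.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every worker.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                       # stand-in training loop
        x = torch.randn(8, 4096, device=local_rank)
        loss = model(x).square().mean()
        loss.backward()                       # gradients all-reduced by DDP
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch: torchrun --nproc_per_node=8 this_script.py
```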
LLM Inference Optimization:
- Expert knowledge of inference optimization techniques including speculative decoding, KV cache optimization (MQA/GQA/PagedAttention), and dynamic batching.
- Deep understanding of prefill vs. decode phases and of memory-bound vs. compute-bound operations (illustrated in the sketch after this list).
- Experience with quantization methods (INT4/INT8, GPTQ, AWQ) and model parallelism strategies.
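A minimal sketch of why the prefill/decode distinction matters, using an explicit KV cache with Hugging Face transformers (gpt2 is only a stand-in checkpoint): the prompt is processed once in a single compute-bound forward pass, after which each decode step feeds in one token and reuses the cached keys/values, making it memory-bandwidth-bound.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

input_ids = tok("The quick brown fox", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: one forward pass over the whole prompt (compute-bound).
    out = model(input_ids, use_cache=True)
    cache = out.past_key_values
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    generated = [next_id]

    # Decode: one token per step, reusing cached K/V (memory-bound).
    for _ in range(32):
        out = model(next_id, past_key_values=cache, use_cache=True)
        cache = out.past_key_values
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```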
Inference Frameworks:
- Hands-on experience with production inference engines: vLLM, TensorRT-LLM, DeepSpeed-Inference, or TGI (a minimal vLLM example follows this list).
- Proficiency with serving frameworks: Triton Inference Server, KServe, or Ray Serve.
- Familiarity with kernel optimization libraries (FlashAttention, xFormers).
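For reference, the offline vLLM API looks like the sketch below; the model id is a placeholder, and PagedAttention plus continuous batching happen inside the engine rather than in user code.

```python
from vllm import LLM, SamplingParams

# Placeholder model id; swap in whatever checkpoint you actually serve.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# The engine schedules and batches these requests dynamically.
outputs = llm.generate(
    ["Explain KV caching in one paragraph.",
     "What is speculative decoding?"],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```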
Performance Engineering:
- Proven ability to optimize inference metrics: TTFT (time to first token), ITL (inter-token latency), and throughput; see the measurement helper after this list.
- Experience profiling and resolving GPU memory bottlenecks and OOM issues.
- Knowledge of hardware-specific optimizations for modern GPU architectures (A100/H100).
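One way to make these metrics concrete: a small helper that derives TTFT, mean ITL, and throughput from any streaming token iterator. The stream argument is hypothetical; in practice it would be, for example, a streaming client for an OpenAI-compatible inference server.

```python
import time
from typing import Iterable, Tuple

def measure_stream(stream: Iterable[str]) -> Tuple[float, float, float]:
    """Return (TTFT, mean ITL, tokens/sec) for a token stream.

    `stream` is a hypothetical iterator that yields one token at a time,
    e.g. a streaming response from an inference server.
    """
    start = time.perf_counter()
    stamps = []
    for _ in stream:
        stamps.append(time.perf_counter())  # arrival time of each token
    if not stamps:
        raise ValueError("stream produced no tokens")

    ttft = stamps[0] - start                          # time to first token
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0  # inter-token latency
    throughput = len(stamps) / (stamps[-1] - start)    # tokens per second
    return ttft, mean_itl, throughput
```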
Fine-Tuning:
- Drive end-to-end fine-tuning of LLMs, including model selection, dataset preparation/cleaning, tokenization, and evaluation with baseline metrics.
- Configure and execute fine-tuning experiments (LoRA, QLoRA, etc.) on large-scale compute setups, ensuring optimal hyperparameter tuning, logging, and checkpointing (a minimal LoRA setup is sketched after this list).
- Document fine-tuning outcomes by capturing performance metrics (loss curves, BERTScore/ROUGE, training time, resource utilization) and benchmark against baseline models.
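As a minimal sketch of the LoRA path, assuming the Hugging Face peft library and a placeholder base checkpoint: only the low-rank adapter matrices are trained, which keeps the trainable-parameter count, memory footprint, and checkpoint size small relative to full fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # placeholder base checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# LoRA: freeze the base weights, train low-rank adapters only.
config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of base params
```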
Posted by: Dushyant Waghmare, Head - Talent Acquisition at FISSION COMPUTER LABS PRIVATE LIMITED
Last Active: 3 Feb 2026
Posted in: AI/ML
Functional Area: Data Science
Job Code: 1609334