hirist

Fissionlabs - Senior AI/ML Developer

FISSION COMPUTER LABS PRIVATE LIMITED
Multiple Locations
5 - 10 Years
4.1 · 67+ Reviews

Posted on: 03/02/2026

Job Description

Key Responsibilities :


Architecture & Infrastructure :


- Design, implement, and optimize end-to-end ML training workflows including infrastructure setup, orchestration, fine-tuning, deployment, and monitoring.


- Evaluate and integrate multi-cloud and single-cloud training options across AWS and other major platforms.


- Lead cluster configuration, orchestration design, environment customization, and scaling strategies.


- Compare and recommend hardware options (GPUs, TPUs, accelerators) based on performance, cost, and availability.
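The hardware comparison above typically comes down to cost per unit of training throughput. A minimal sketch of that ranking logic, where all hardware names, prices, and throughput figures are illustrative placeholders rather than vendor specifications:

```python
# Illustrative sketch: ranking accelerator options by cost per unit of
# training throughput. All names and numbers below are placeholders,
# not real vendor pricing or benchmarks.

def rank_by_cost_efficiency(options):
    """Sort hardware options by dollars per token/s of training throughput."""
    return sorted(options, key=lambda o: o["usd_per_hour"] / o["tokens_per_sec"])

options = [
    {"name": "gpu-a", "usd_per_hour": 4.0, "tokens_per_sec": 2000},
    {"name": "gpu-b", "usd_per_hour": 9.0, "tokens_per_sec": 5500},
    {"name": "tpu-c", "usd_per_hour": 6.0, "tokens_per_sec": 3500},
]

for o in rank_by_cost_efficiency(options):
    cost_per_1k = o["usd_per_hour"] / o["tokens_per_sec"] * 1000
    print(o["name"], round(cost_per_1k, 4))
```

In practice availability and interconnect topology also factor in, but cost-per-throughput is a common first-pass filter.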


Technical Expertise Requirements :


- At least 5 years of experience in AI/ML infrastructure and large-scale training environments.


- Expert in AWS cloud services (EC2, S3, EKS, SageMaker, Batch, FSx, etc.) and familiar with Azure, GCP, and hybrid/multi-cloud setups.


- Strong knowledge of AI/ML training frameworks (PyTorch, TensorFlow, Hugging Face, DeepSpeed, Megatron, Ray, etc.).


- Proven experience with cluster orchestration tools (Kubernetes, Slurm, Ray, SageMaker, Kubeflow).


- Deep understanding of hardware architectures for AI workloads (NVIDIA, AMD, Intel Habana, TPU).


LLM Inference Optimization :


- Expert knowledge of inference optimization techniques including speculative decoding, KV cache optimization (MQA/GQA/PagedAttention), and dynamic batching.


- Deep understanding of prefill vs decode phases, memory-bound vs compute-bound operations.


- Experience with quantization methods (INT4/INT8, GPTQ, AWQ) and model parallelism strategies.
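The core idea behind the INT8 quantization methods listed above can be sketched in a few lines: map floats to 8-bit integers with a per-tensor scale. This is only the naive symmetric baseline; GPTQ and AWQ additionally minimize layer output error and use calibration data.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization. This is the
# naive baseline behind methods like GPTQ/AWQ, which add error-aware
# rounding and activation-aware scaling on top of it.

def quantize_int8(weights):
    """Map floats to int8 with a single scale; return (ints, scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 1.27, -1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The reconstruction error is bounded by half the scale step, which is why outlier weights (which inflate the scale) are the main accuracy hazard that AWQ-style methods address.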


Inference Frameworks :


- Hands-on experience with production inference engines : vLLM, TensorRT-LLM, DeepSpeed-Inference, or TGI.


- Proficiency with serving frameworks : Triton Inference Server, KServe, or Ray Serve.


- Familiarity with kernel optimization libraries (FlashAttention, xFormers).
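The PagedAttention idea that vLLM popularized can be sketched as a block allocator: the KV cache is split into fixed-size physical blocks so sequences grow without reserving large contiguous regions. The class and sizes below are illustrative only, not vLLM's actual API.

```python
# Hedged sketch of PagedAttention-style KV cache management: sequences
# map to fixed-size physical blocks via a per-sequence block table.
# Names and sizes are illustrative, not vLLM internals.

class PagedKVCache:
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # free physical block ids
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve cache space for one new token of a sequence."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:         # crossing a block boundary
            if not self.free:
                raise MemoryError("KV cache exhausted")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(20):                          # 20 tokens span 2 blocks
    cache.append_token("seq-0")
print(len(cache.tables["seq-0"]), len(cache.free))
```

Because blocks are released as soon as a sequence finishes, the scheduler can admit new requests immediately, which is what makes high-concurrency dynamic batching memory-efficient.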


Performance Engineering :


- Proven ability to optimize inference metrics : TTFT (time to first token), ITL (inter-token latency), and throughput.


- Experience profiling and resolving GPU memory bottlenecks and OOM issues.


- Knowledge of hardware-specific optimizations for modern GPU architectures (A100/H100).
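The three latency metrics named above fall out directly from per-token timestamps. A small sketch, with made-up timestamp values for illustration:

```python
# Sketch: computing TTFT, mean ITL, and decode throughput from per-token
# timestamps (in seconds). The timestamps below are illustrative values.

def latency_metrics(request_ts, token_ts):
    """Return (TTFT, mean ITL, decode tokens/s) for one request."""
    ttft = token_ts[0] - request_ts                  # time to first token
    gaps = [b - a for a, b in zip(token_ts, token_ts[1:])]
    itl = sum(gaps) / len(gaps)                      # mean inter-token latency
    throughput = (len(token_ts) - 1) / (token_ts[-1] - token_ts[0])
    return ttft, itl, throughput

ts = [0.30, 0.35, 0.40, 0.45, 0.50]  # first token at 300 ms, 50 ms gaps
ttft, itl, tps = latency_metrics(0.0, ts)
```

TTFT is dominated by the compute-bound prefill phase, while ITL reflects the memory-bound decode phase, which is why the two are optimized with different techniques.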


Fine-Tuning :


- Drive end-to-end fine-tuning of LLMs, including model selection, dataset preparation/cleaning, tokenization, and evaluation with baseline metrics.


- Configure and execute fine-tuning experiments (LoRA, QLoRA, etc.) on large-scale compute setups, ensuring optimal hyperparameter tuning, logging, and checkpointing.


- Document fine-tuning outcomes by capturing performance metrics (loss, BERTScore/ROUGE, training time, resource utilization) and benchmark against baseline models.
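The LoRA technique referenced above trains two small low-rank matrices instead of the full weight, and at deployment folds them into the base weight as W' = W + (alpha / r) * B @ A. A toy sketch with hypothetical shapes and values:

```python
# Minimal sketch of the LoRA merge rule: W' = W + (alpha / r) * B @ A,
# where B is (d x r) and A is (r x k) with small rank r. All matrices
# below are toy illustrations, not real model weights.

def matmul(B, A):
    """Plain nested-list matrix multiply."""
    return [[sum(B[i][t] * A[t][j] for t in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge_lora(W, A, B, alpha, r):
    """Fold a rank-r LoRA adapter into the frozen base weight W."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 frozen base weight
B = [[1.0], [0.0]]             # d x r = 2x1 (initialized to zero in training)
A = [[0.5, 0.5]]               # r x k = 1x2, rank r = 1
W_merged = merge_lora(W, A, B, alpha=2.0, r=1)
```

Because only A and B are trained, the optimizer state shrinks by orders of magnitude versus full fine-tuning, and QLoRA pushes further by keeping W quantized to 4 bits during training.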

