
Auriga - Software Development Engineer II - Generative AI

Auriga IT Consulting Pvt Ltd.
Jaipur
3 - 5 Years
Rating : 4.5 (100+ Reviews)

Posted on: 29/10/2025

Job Description

Job Summary :


We're seeking a hands-on GenAI & Computer Vision Engineer with 3-5 years of experience delivering production-grade AI solutions.


You must be fluent in the core libraries, tools, and cloud services listed below, and able to own end-to-end model development - from research and fine-tuning through deployment, monitoring, and iteration.


In this role, you'll tackle domain-specific challenges like LLM hallucinations, vector search scalability, real-time inference constraints, and concept drift in vision models.


Key Responsibilities :


Generative AI & LLM Engineering :


- Fine-tune and evaluate LLMs (Hugging Face Transformers, Ollama, LLaMA) for specialized tasks


- Deploy high-throughput inference pipelines using vLLM or Triton Inference Server


- Design agent-based workflows with LangChain or LangGraph, integrating vector databases (Pinecone, Weaviate) for retrieval-augmented generation


- Build scalable inference APIs with FastAPI or Flask, managing batching, concurrency, and rate-limiting
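To make the "batching, concurrency, and rate-limiting" part of the responsibilities above concrete, here is a deliberately minimal sketch of one common rate-limiting technique, a token bucket, in pure Python. The class and parameter names are our own illustration, not part of any framework the role names:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter for an inference API (illustrative only)."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_sec=5.0, capacity=2)
results = [bucket.allow() for _ in range(4)]  # a burst of 4 back-to-back requests
```

In a real serving stack this check would sit in API middleware (FastAPI dependencies, for example) and be keyed per client; here a burst of four immediate requests exhausts the two-token capacity, so the last two are rejected.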


Computer Vision Development :


- Develop and optimize CV models (YOLOv8, Mask R-CNN, ResNet, EfficientNet, ByteTrack) for detection, segmentation, classification, and tracking


- Implement real-time pipelines using NVIDIA DeepStream or OpenCV (cv2); optimize with TensorRT or ONNX Runtime for edge and cloud deployments


- Handle data challenges (augmentation, domain adaptation, semi-supervised learning) and mitigate model drift in production
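One crude way to think about the drift-mitigation bullet above: compare summary statistics of live feature batches against a training-time baseline and flag large shifts. A minimal pure-Python sketch, where the statistic, data, and 3-sigma threshold are all hypothetical (production systems use richer tests such as PSI or KS):

```python
from statistics import mean, stdev

def drift_score(baseline: list, live: list) -> float:
    """Absolute shift of the live mean, in units of baseline standard deviations."""
    return abs(mean(live) - mean(baseline)) / stdev(baseline)

def needs_retraining(baseline: list, live: list, threshold: float = 3.0) -> bool:
    """Flag a batch whose mean drifts more than `threshold` sigmas (hypothetical cutoff)."""
    return drift_score(baseline, live) > threshold

# e.g. per-frame detection confidences: at train time vs. in production
baseline = [0.48, 0.50, 0.52, 0.49, 0.51]
stable   = [0.50, 0.49, 0.51, 0.50, 0.52]   # within normal variation
drifted  = [0.20, 0.18, 0.22, 0.19, 0.21]   # confidences collapse in production
```

Here `needs_retraining(baseline, stable)` stays quiet while `needs_retraining(baseline, drifted)` fires; in a pipeline the flag would trigger an automated retraining job rather than a manual check.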


MLOps & Deployment :


- Containerize models and services with Docker; orchestrate with Kubernetes (KServe) or AWS SageMaker Pipelines


- Implement CI/CD for model/version management (MLflow, DVC), automated testing, and performance monitoring (Prometheus + Grafana)


- Manage scalability and cost by leveraging cloud autoscaling on AWS (EC2/EKS), GCP (Vertex AI), or Azure ML (AKS)


Cross-Functional Collaboration :


- Define SLAs for latency, accuracy, and throughput alongside product and DevOps teams


- Evangelize best practices in prompt engineering, model governance, data privacy, and interpretability


- Mentor junior engineers on reproducible research, code reviews, and end-to-end AI delivery


Required Qualifications :


You must be proficient in at least one tool from each category below :


LLM Frameworks & Tooling :


- Hugging Face Transformers, Ollama, vLLM, or LLaMA


Agent & Retrieval Tools :


- LangChain or LangGraph; RAG with Pinecone, Weaviate, or Milvus


Inference Serving :


- Triton Inference Server; FastAPI or Flask


Computer Vision Frameworks & Libraries :


- PyTorch or TensorFlow; OpenCV (cv2) or NVIDIA DeepStream


Model Optimization :


- TensorRT; ONNX Runtime; Torch-TensorRT


MLOps & Versioning :


- Docker and Kubernetes (KServe, SageMaker); MLflow or DVC


Monitoring & Observability :


- Prometheus; Grafana


Cloud Platforms :


- AWS (SageMaker, EC2/EKS) or GCP (Vertex AI, AI Platform) or Azure ML (AKS, ML Studio)


Programming Languages :


- Python (required); C++ or Go (preferred)


Additionally :


- Bachelor's or Master's in Computer Science, Electrical Engineering, AI/ML, or a related field


- 3-5 years of professional experience shipping both generative and vision-based AI models in production


- Strong problem-solving mindset; ability to debug issues like LLM drift, vector index staleness, and model degradation


- Excellent verbal and written communication skills


Typical Domain Challenges You'll Solve :


- LLM Hallucination & Safety : Implement grounding, filtering, and classifier layers to reduce false or unsafe outputs


- Vector DB Scaling : Maintain low-latency, high-throughput similarity search as embeddings grow to millions


- Inference Latency : Balance batch sizing and concurrency to meet real-time SLAs on cloud and edge hardware


- Concept & Data Drift : Automate drift detection and retraining triggers in vision and language pipelines


- Multi-Modal Coordination : Seamlessly orchestrate data flow between vision models and LLM agents in complex workflows
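To make the hallucination-and-grounding challenge in the list above concrete, here is a deliberately naive sketch of one filtering idea: flag answer sentences whose content words barely overlap the retrieved context. Real grounding layers use NLI or claim-verification models; every name, stopword list, and threshold here is a hypothetical toy:

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "was", "in", "of", "to", "and", "on", "for", "it"}

def content_words(text: str) -> set:
    """Lowercased words minus stopwords; digits are ignored by the pattern."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def ungrounded_sentences(answer: str, context: str, min_overlap: float = 0.5) -> list:
    """Return answer sentences whose content-word overlap with the context is below `min_overlap`."""
    ctx = content_words(context)
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sent)
        if words and len(words & ctx) / len(words) < min_overlap:
            flagged.append(sent)
    return flagged

context = "The model was trained on the COCO dataset for object detection."
answer = "The model was trained on COCO. It won the 2031 Olympics."
```

The first answer sentence is fully supported by the context and passes; the second shares no content words with it and is flagged as ungrounded.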

