hirist

Data Scientist/Machine Learning Engineer

Posted on: 07/12/2025

Job Description

Role : Data Scientist / Machine Learning Engineer (5-8 Years).

Location : Chennai.

Time Zone : IST.

Primary Tech Stack : AI/ML, Algorithms, Deep Learning, Python, LLMs, CI/CD, APIs.

Experience Required :

- Must-Have : 5+ years of hands-on experience in ML/Deep Learning, including 3+ years in a senior or lead role.

Practical experience with OCR pipelines :

- Tesseract, EasyOCR, PaddleOCR, AWS Textract, Google Vision.

Experience with Document Layout Models :

- LayoutLMv3, Donut, Pix2Struct, LiLT, LayoutXLM.

Expertise in Vision-Language Models :

- CLIP, BLIP-2, GroundingDINO, LLaVA, Florence-2.

Strong background in LLM-based extraction using :

- OpenAI, Gemini, Claude, Llama, Qwen, Mistral.

Proven ability to design end-to-end ML system architectures, including :

- LLM + OCR + Embedding + Prompt pipelines.
- Preprocessing pipelines for multi-format documents.
- Vector DBs, embedding stores, structured extraction systems.
- Async processing queues, job orchestration, microservices.
- GPU/CPU deployment strategy.

Experience scaling ML systems :

- Batch processing large files.
- Handling concurrency, throughput, latency.
- Model selection, distillation & quantization (GGUF, ONNX).

Preferred Experience :

- CI/CD for ML : GitHub Actions, Jenkins.
- Model monitoring : latency, drift detection, cost optimization.
- Cloud experience (AWS/GCP/Azure), including SageMaker, Vertex AI, and Bedrock (nice to have).
- Ability to choose the right ML approach : fine-tuning, RAG, multimodal, prompting.
- Strong problem-solving skills and end-to-end solution ownership.
- Skilled in building PoCs and converting them into production-ready systems.
- Ability to estimate feasibility, cost, complexity & timelines for ML solutions.

Education :

- Required : Bachelor's or Master's degree in a relevant field.

Key Responsibility Areas (KRAs) :

- Design and implement scalable ML pipelines integrating OCR + Document Layout Models + LLMs + Embeddings.
- Architect high-performance processing pipelines for multi-format documents (PDF, scanned docs, images, compound files).
- Own system design decisions, including model selection, hardware strategy, vector DB usage, orchestration flow, and API layer.
- Build, enhance, and optimize OCR pipelines using Tesseract, EasyOCR, PaddleOCR, AWS Textract, Google Vision.
- Develop and fine-tune LayoutLMv3, Donut, Pix2Struct, LiLT, LayoutXLM for structured data extraction and document understanding.
- Implement and integrate Vision-Language Models (CLIP, BLIP-2, GroundingDINO, LLaVA, Florence-2) for multimodal extraction.
- Build robust LLM extraction pipelines using OpenAI, Gemini, Claude, Llama, Qwen, Mistral.
- Own prompt engineering, schema-driven extraction, evaluation frameworks, and hybrid pipelines (RAG + multimodal + OCR).
- Optimize LLM performance through distillation, quantization (ONNX, GGUF), and fine-tuning where needed.
- Architect pipelines for high-volume document processing with a focus on throughput, latency, resiliency, and concurrency management.
- Build async queues, microservices, and load-balanced processing flows.
- Optimize deployments across CPU/GPU environments (containerized or serverless).
- Set up automated model deployment pipelines using GitHub Actions, Jenkins, etc.
- Implement monitoring dashboards for model drift, latency, accuracy, and cost and resource optimization.
- Establish feedback loops for model re-training and continuous improvement.
- Work closely with product, engineering, QA, and DevOps teams to deliver production-grade ML solutions.
- Translate business/functional requirements into ML problem statements and technical designs.
- Present architecture, feasibility, estimates, and timelines to leadership and stakeholders.
- Quickly build proofs of concept (PoCs) and evolve them into full-scale, production-ready systems.
- Evaluate multiple ML approaches (RAG, multimodal, prompting, fine-tuning) and recommend the most effective solution.
- Ensure code quality, documentation, reproducibility, and scalable design.
