Posted on: 07/12/2025
Description :
Role : Data Scientist / Machine Learning Engineer (5-8 Years).
Location : Chennai.
Time Zone : IST.
Primary Tech Stack : AI/ML, Algorithms, Deep Learning, Python, LLMs, CI/CD, APIs.
Experience Required :
- Must-Have : 5+ years of hands-on experience in ML/Deep Learning, including 3+ years in a senior or lead role.
Practical experience with OCR pipelines :
- Tesseract, EasyOCR, PaddleOCR, AWS Textract, Google Vision.
Experience with Document Layout Models :
- LayoutLMv3, Donut, Pix2Struct, LiLT, LayoutXLM.
Expertise in Vision-Language Models :
- CLIP, BLIP-2, GroundingDINO, LLaVA, Florence-2.
Strong background in LLM-based extraction using :
- OpenAI, Gemini, Claude, Llama, Qwen, Mistral.
Proven ability to design end-to-end ML system architectures, including :
- LLM + OCR + Embedding + Prompt pipelines.
- Preprocessing pipelines for multi-format documents.
- Vector DBs, embedding stores, structured extraction systems.
- Async processing queues, job orchestration, microservices.
- GPU/CPU deployment strategy.
Experience scaling ML systems :
- Batch processing large files.
- Handling concurrency, throughput, latency.
- Model selection, distillation & quantization (GGUF, ONNX).
Preferred Experience :
- CI/CD for ML : GitHub Actions, Jenkins.
- Model monitoring : latency, drift detection, cost optimization.
- Cloud experience (AWS/GCP/Azure), including : SageMaker, Vertex AI, Bedrock (nice to have).
- Ability to choose the right ML approach : fine-tuning, RAG, multimodal, prompting.
- Strong problem-solving and end-to-end solution ownership.
- Skilled in building PoCs and converting them into production-ready systems.
- Ability to estimate feasibility, cost, complexity & timelines for ML solutions.
Education :
- Required : Bachelor's or Master's degree in a relevant field.
Key Responsibility Areas (KRAs) :
- Design and implement scalable ML pipelines integrating OCR + Document Layout Models + LLMs + Embeddings.
- Architect high-performance processing pipelines for multi-format documents (PDF, scanned docs, images, compound files).
- Own system design decisions including model selection, hardware strategy, vector DB usage, orchestration flow, and API layer.
- Build, enhance, and optimize OCR pipelines using Tesseract, EasyOCR, PaddleOCR, AWS Textract, Google Vision.
- Develop and fine-tune LayoutLMv3, Donut, Pix2Struct, LiLT, LayoutXLM for structured data extraction and document understanding.
- Implement and integrate Vision-Language Models (CLIP, BLIP-2, GroundingDINO, LLaVA, Florence-2) for multimodal extraction.
- Build robust LLM extraction pipelines using OpenAI, Gemini, Claude, Llama, Qwen, Mistral.
- Own prompt engineering, schema-driven extraction, evaluation frameworks, and hybrid pipelines (RAG + multimodal + OCR).
- Optimize LLM performance through distillation, quantization (ONNX, GGUF), and fine-tuning where needed.
Architect pipelines for high-volume document processing with a focus on :
- Throughput
- Latency
- Resiliency
- Concurrency management
- Build async queues, microservices, and load-balanced processing flows.
- Optimize deployments across CPU/GPU environments (containerized or serverless).
- Set up automated model deployment pipelines using GitHub Actions, Jenkins, etc.
Implement monitoring dashboards for :
- Model drift
- Latency
- Accuracy
- Cost and resource optimization
- Establish feedback loops for model re-training and continuous improvement.
- Work closely with product, engineering, QA, and DevOps teams to deliver production-grade ML solutions.
- Translate business/functional requirements into ML problem statements and technical designs.
- Present architecture, feasibility, estimates, and timelines to leadership and stakeholders.
- Quickly build proofs of concept and evolve them into full-scale, production-ready systems.
- Evaluate multiple ML approaches (RAG, multimodal, prompting, fine-tuning) and recommend the most effective solution.
- Ensure code quality, documentation, reproducibility, and scalable design.
Posted by : Sneha Gupta, Senior Human Resource Manager at TEKIT SOFTWARE SOLUTIONS PRIVATE LIMITED.
Last Active : 18 Dec 2025.
Posted in : AI/ML.
Functional Area : ML / DL Engineering.
Job Code : 1585944.