Posted on: 09/12/2025
Description :
GenAI & Document Intelligence :
- Design and build end-to-end document intelligence pipelines using OCR + LLMs.
- Parse, segment, and structure complex legal documents into schema-driven JSON outputs.
- Implement layout-aware extraction, table handling, and multi-page document processing.
- Apply prompt engineering, function calling, and structured output constraints.
- Build Retrieval-Augmented Generation (RAG) systems that ground LLM responses in internal legal knowledge bases.
- Fine-tune Large Language Models (LLaMA, GPT, and other open-source or commercial LLMs) for legal domain-specific tasks.
- Handle OCR noise, incomplete documents, inconsistent layouts, and edge cases.
OCR & Document Processing :
- Integrate OCR engines such as AWS Textract, Azure Form Recognizer, or Tesseract.
- Design post-OCR cleanup, normalization, and rule-based extraction pipelines.
- Implement validation logic to ensure accuracy, consistency, and legal usability.
Backend & Production Engineering :
- Build scalable backend services using Python.
- Optimize latency, throughput, and cost (OCR usage, LLM token consumption).
- Implement logging, retries, failure handling, and confidence checks.
Cloud, Deployment & Reliability :
- Deploy AI systems on AWS / Azure / GCP.
- Collaborate with DevOps teams on containerization, monitoring, and secure access.
- Ensure solutions meet data security, privacy, and regulatory compliance requirements (e.g., HIPAA, where applicable).
Collaboration & Technical Ownership :
- Work closely with legal professionals to translate domain requirements into technical solutions.
- Provide technical guidance and mentoring to junior engineers.
- Maintain clear technical documentation and architectural decision records.