Posted on: 20/11/2025
Description :
- Own ongoing model performance and enhancement for one or more Dodge entities/domains.
- Analyze Dodge datasets in depth to recommend the best solutions for data management and enrichment using AI/ML.
- Design, develop, and test machine learning models to automate data enrichment, classification, and validation processes.
- Develop Python-based automation scripts and microservices to reduce manual effort in project matching, contact discovery, and quality checks.
- Implement NLP models for entity recognition (e.g., identifying architects, GCs, and project roles from unstructured text and PDF documents).
- Implement OCR, NLP, and layout recognition techniques to extract project metadata, deadlines, contacts, and technical requirements.
- Build Python-based scripts and microservices to classify documents by type and extract structured fields (e.g., bid dates, scope of work).
- Build pipelines that integrate scraped project data with external APIs (ZoomInfo, LinkedIn, etc.) to enrich company and contact information.
- Collaborate with data engineers to ensure ML pipelines integrate seamlessly with existing data warehouses.
- Partner with data specialists to design feedback loops that validate and improve model outputs.
Required Qualifications :
- 5+ years of experience in Machine Learning and automation engineering.
- Proficiency in Python with hands-on experience using libraries such as scikit-learn, spaCy, TensorFlow, or PyTorch.
- Hands-on experience with OCR frameworks (Tesseract, PaddleOCR, AWS Textract, Google Document AI).
- Familiarity with document layout analysis (LayoutLM, Donut, DocTR, etc.).
- Strong knowledge of regex, rules-based parsing, and entity extraction techniques.
- Strong knowledge of data pipelines and ETL frameworks.
- Experience deploying ML models into production, monitoring performance, and maintaining pipelines.
- Solid understanding of relational databases and SQL; experience with large-scale warehouses (e.g., Redshift, Snowflake).
- Demonstrated experience automating repetitive tasks with Python, APIs, and workflow orchestration.
- Strong problem-solving skills with the ability to translate business use cases (project/contact enrichment, validation) into ML/automation solutions.
Preferred Qualifications :
- Experience with Named Entity Recognition (NER) and text classification models for parsing unstructured construction/project documents.
- Familiarity with AWS analytics/ML services (SageMaker, Comprehend, Lambda, Step Functions).
- Exposure to CI/CD pipelines and MLOps tools (MLflow, Git, Docker, Kubernetes).
- Prior experience working with sales intelligence data (contacts, companies, lead enrichment).
- Experience in Agile delivery environments using Jira or Confluence.
Mode of Work : Hybrid - This role adheres to the organization's hybrid work policy, requiring presence at the office on three designated days per week.