Posted on: 07/10/2025
About the Role:
We are seeking a GenAI Data Engineer to design, build, and optimize data pipelines for unstructured and semi-structured content, integrating advanced AI/ML capabilities. This role combines modern ETL expertise with Vector Database and GenAI integration to support intelligent document processing and semantic search applications.
Key Responsibilities:
- Develop and maintain data ingestion pipelines using Azure Data Factory (ADF) and Databricks for structured and unstructured data.
- Create notebooks to process PDF and Word documents, including extracting text, tables, charts, graphs, and images.
- Apply NLP and embedding models (e.g., OpenAI, Hugging Face, sentence-transformers) to convert extracted content into embeddings.
- Store embeddings and metadata in Vector Databases (e.g., FAISS, Pinecone, Milvus, Weaviate, ChromaDB).
- Design and implement semantic search and retrieval workflows to enable prompt-based query capabilities.
- Optimize ETL pipelines for scalability, reliability, and performance.
- Collaborate with data scientists and solution architects to integrate GenAI capabilities into enterprise applications.
- Follow best practices for code quality, modularity, and documentation.
Required Skills & Experience:
- Proven experience with Azure Data Factory (ADF) and Databricks for building ETL/ELT workflows.
- Strong programming experience in Python (pandas, PySpark, PyPDF, python-docx, OCR libraries, etc.).
- Hands-on experience with Vector Databases and semantic search implementation.
- Understanding of embedding models, LLM-based retrieval, and prompt engineering.
- Familiarity with handling multi-modal data (text, tables, images, charts).
- Strong knowledge of data modeling, indexing, and query optimization.
- Experience with cloud platforms (Azure preferred).
- Strong problem-solving, debugging, and communication skills.
Nice to Have:
- Experience with knowledge graphs or RAG (Retrieval-Augmented Generation) pipelines.
- Exposure to MLOps practices and LLM fine-tuning.
- Familiarity with enterprise-scale document management systems.