Posted on: 12/12/2025
Description :
Role Overview :
The GenAI Data Engineer is a senior role requiring 8-12 years of experience, focused on designing, building, and optimizing advanced data pipelines for unstructured and semi-structured content and on integrating Generative AI/ML capabilities.
The incumbent will combine modern ETL expertise with Vector Database and GenAI integration skills to support intelligent document processing and semantic search applications.
This is a Work From Home position.
Job Summary :
We are seeking a senior GenAI Data Engineer (8-12 years of experience) with mandatory expertise in Azure Data Factory (ADF) and Databricks for building scalable ETL/ELT workflows. The ideal candidate will specialize in optimizing pipelines for unstructured content and have hands-on experience with Vector Databases for semantic search and RAG (Retrieval-Augmented Generation) pipelines. Key responsibilities include implementing advanced data modeling and indexing techniques, ensuring pipeline performance, and applying MLOps practices and Large Language Model (LLM) fine-tuning to drive intelligent data applications.
Key Responsibilities and Technical Deliverables :
GenAI Data Pipeline Development and Optimization :
- Design, build, and maintain robust data ingestion and transformation pipelines using Azure Data Factory (ADF) and Databricks environments for both structured and complex unstructured data.
- Optimize ETL/ELT pipelines for scalability, reliability, and performance, focusing on low-latency processing for GenAI application needs.
- Implement and integrate Vector Database technologies for efficient storage and retrieval of embeddings, supporting advanced semantic search applications.
- Develop and manage RAG (Retrieval-Augmented Generation) pipelines, ensuring seamless integration between knowledge retrieval systems and LLMs (see the illustrative sketch after this list).
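For illustration only, the following is a minimal Python sketch of the retrieval step in a RAG pipeline. It uses an in-memory cosine-similarity index in place of a real Vector Database; the embed() function, the sample documents, and all names are hypothetical stand-ins, and a production pipeline would instead call a real embedding model (for example via Azure OpenAI Service) and a managed vector store.

```python
# Minimal RAG retrieval sketch with an in-memory vector index.
# embed() is a hypothetical placeholder for a real embedding model call.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder: deterministic pseudo-embedding seeded by the text hash.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine

documents = [
    "ADF pipelines orchestrate ingestion from blob storage.",
    "Databricks jobs transform semi-structured JSON into Delta tables.",
    "Vector databases store embeddings for semantic search.",
]
index = np.stack([embed(d) for d in documents])  # one row per document

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)          # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]     # indices of the k best matches
    return [documents[i] for i in top]

# Retrieved passages become the grounding context for the LLM prompt.
context = retrieve("How are embeddings stored for search?")
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The pattern is the same at production scale: embed the query, fetch the nearest stored chunks, and assemble them into the prompt sent to the LLM.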
Architecture, Modelling, and Performance :
- Apply strong knowledge of data modeling, indexing, and query optimization techniques suitable for both traditional relational stores and modern unstructured data repositories.
- Leverage proven experience with cloud platforms (Azure preferred), utilizing services beyond ADF and Databricks for storage, compute, and serverless processing.
- Ensure data quality, governance, and security are maintained throughout the GenAI data lifecycle (a brief data-quality sketch follows this list).
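As a concrete illustration of the data-quality point above, here is a small, hedged PySpark sketch of a validation gate in a Databricks-style transformation; the table, column names, and rules are hypothetical examples, not a prescribed implementation.

```python
# Illustrative PySpark data-quality gate: quarantine bad rows rather than
# failing the whole run. All names and rules are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-gate").getOrCreate()

raw = spark.createDataFrame(
    [("doc-1", "contract", 12), ("doc-2", None, 7), ("doc-3", "invoice", -1)],
    ["doc_id", "doc_type", "page_count"],
)

# Basic expectations: a known document type and a positive page count.
valid = raw.filter(F.col("doc_type").isNotNull() & (F.col("page_count") > 0))
rejected = raw.subtract(valid)  # rows routed to a quarantine table for review

print(f"valid={valid.count()}, rejected={rejected.count()}")
```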
MLOps and GenAI Integration :
- Possess exposure to MLOps practices for the reliable deployment, monitoring, and governance of AI/ML models.
- Demonstrate practical exposure to LLM fine-tuning processes and manage the data preparation necessary for effective model customization and training (see the data-preparation sketch after this list).
- Contribute to the overall architectural strategy for data integration within GenAI applications, including leveraging knowledge graphs for enhanced data contextualization.
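To make the fine-tuning data-preparation responsibility concrete, here is a short Python sketch that writes chat-style JSONL records, a format commonly accepted by LLM fine-tuning jobs; the example pairs and the output file name are hypothetical.

```python
# Sketch of fine-tuning data preparation: serialize supervised examples
# into chat-style JSONL. The examples and output path are hypothetical.
import json

examples = [
    ("Classify this clause.", "This is a termination clause."),
    ("Summarize the invoice.", "Three line items; total due in 30 days."),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for user_msg, assistant_msg in examples:
        record = {"messages": [
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ]}
        f.write(json.dumps(record) + "\n")
```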
Mandatory Skills & Qualifications :
- Experience : 8-12 years in Data Engineering/AI Engineering roles.
- ETL/Cloud : Proven experience in Azure Data Factory (ADF) and Databricks for building ETL/ELT workflows.
- Data Proficiency : Strong knowledge of data modeling, indexing, and query optimization.
- AI/GenAI : Experience with knowledge graphs or RAG (Retrieval-Augmented Generation) pipelines and Vector Databases.
- MLOps : Exposure to MLOps practices and LLM fine-tuning.
- Platform : Experience with cloud platforms (Azure preferred).
Preferred Skills :
- Proficiency in Python and PySpark for data transformation scripting.
- Experience with Azure AI services (e.g., Azure OpenAI Service).
- Knowledge of containerization (Docker, Kubernetes) for model serving.
- Experience with unstructured data processing libraries (e.g., spaCy, NLTK); a brief spaCy sketch follows this list.
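As a small illustration of the last point, a spaCy preprocessing snippet might look like the following; it uses a blank English pipeline so no model download is needed, and the sample sentence is hypothetical.

```python
# Tiny unstructured-text preprocessing sketch with spaCy.
# A blank pipeline provides tokenization without a model download.
import spacy

nlp = spacy.blank("en")
doc = nlp("Invoice #1042: payment due within 30 days of receipt.")
tokens = [t.text for t in doc if not t.is_punct]
print(tokens)
```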
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1588770