Posted on: 11/12/2025
Description:
Role Overview:
The GenAI Data Engineer is a senior role requiring 8-12 years of experience, focused on designing, building, and optimizing advanced data pipelines for unstructured and semi-structured content and on integrating Generative AI/ML capabilities.
The incumbent will combine modern ETL expertise with Vector Database and GenAI integration to support intelligent document processing and semantic search applications.
This is a Work From Home position.
Job Summary:
We are seeking a senior GenAI Data Engineer (8-12 years of experience) with mandatory expertise in Azure Data Factory (ADF) and Databricks for building scalable ETL/ELT workflows. The ideal candidate will specialize in optimizing pipelines for unstructured content and have hands-on experience with Vector Databases for semantic search and RAG (Retrieval-Augmented Generation) pipelines. Key responsibilities include implementing advanced data modeling and indexing techniques and ensuring pipeline performance; strong exposure to MLOps practices and Large Language Model (LLM) fine-tuning is expected to drive intelligent data applications.
Key Responsibilities and Technical Deliverables:
GenAI Data Pipeline Development and Optimization:
- Design, build, and maintain robust data ingestion and transformation pipelines using Azure Data Factory (ADF) and Databricks environments for both structured and complex unstructured data.
- Optimize ETL/ELT pipelines for scalability, reliability, and performance, focusing on low-latency processing for GenAI application needs.
- Implement and integrate Vector Database technologies for efficient storage and retrieval of embeddings, supporting advanced semantic search applications.
- Develop and manage RAG (Retrieval-Augmented Generation) pipelines, ensuring seamless integration between knowledge retrieval systems and LLMs (a minimal retrieval sketch follows this list).
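To make the retrieval step concrete, here is a minimal, self-contained Python sketch of the RAG retrieval pattern this role supports. The embed() function is a hypothetical stand-in for a real embedding model (for example, an Azure OpenAI embeddings deployment); it hashes tokens into a fixed-size unit vector purely so the example runs without external services.

    import hashlib
    import numpy as np

    DIM = 64

    def embed(text: str) -> np.ndarray:
        """Hypothetical embedding: hash tokens into a unit vector."""
        vec = np.zeros(DIM)
        for token in text.lower().split():
            vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    # Toy in-memory "vector database": document chunks and their embeddings.
    chunks = [
        "ADF pipelines orchestrate ingestion into the lakehouse.",
        "Databricks jobs transform raw documents into embeddings.",
        "Vector indexes serve low-latency semantic search queries.",
    ]
    index = np.stack([embed(c) for c in chunks])

    def retrieve(query: str, k: int = 2) -> list[str]:
        scores = index @ embed(query)  # cosine similarity on unit vectors
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    print(retrieve("how are embeddings produced?"))

In production, the in-memory index would be replaced by a managed Vector Database, and the retrieved chunks would be injected into the LLM prompt as grounding context.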
Architecture, Modeling, and Performance:
- Apply strong knowledge of data modeling, indexing, and query optimization techniques suitable for both traditional relational stores and modern unstructured data repositories.
- Leverage proven experience with cloud platforms (Azure preferred), utilizing services beyond ADF and Databricks for storage, compute, and serverless processing.
- Ensure data quality, governance, and security are maintained throughout the GenAI data lifecycle.
MLOps and GenAI Integration:
- Bring exposure to MLOps practices for the reliable deployment, monitoring, and governance of AI/ML models.
- Demonstrate practical exposure to LLM fine-tuning processes and manage the data preparation required for effective model customization and training (a data-preparation sketch follows this list).
- Contribute to the overall architectural strategy for data integration within GenAI applications, including leveraging knowledge graphs for enhanced data contextualization.
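As an illustration of the data-preparation side of fine-tuning, the following Python sketch writes supervised training pairs as JSON Lines in the chat-message format that most LLM fine-tuning services accept. The file name, field names, and example pairs are illustrative assumptions, not requirements from this posting.

    import json

    # Illustrative (question, answer) pairs; real data would come
    # from curated documents produced by the pipeline.
    raw_pairs = [
        ("Summarize the attached contract.", "The contract covers ..."),
        ("Extract the invoice total.", "The total is ..."),
    ]

    with open("train.jsonl", "w", encoding="utf-8") as f:
        for question, answer in raw_pairs:
            record = {
                "messages": [
                    {"role": "system", "content": "You are a document assistant."},
                    {"role": "user", "content": question},
                    {"role": "assistant", "content": answer},
                ]
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")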
Mandatory Skills & Qualifications :
- Experience : 8-12 Years in Data Engineering/AI Engineering roles.
- ETL/Cloud : Proven experience in Azure Data Factory (ADF) and Databricks for building ETL/ELT workflows.
- Data Proficiency : Strong knowledge of data modelling, indexing, and query optimisation.
- AI/GenAI : Experience with knowledge graphs or RAG (Retrieval-Augmented Generation) pipelines and Vector Databases.
- MLOps : Exposure to MLOps practices and LLM fine-tuning.
- Platform : Experience with cloud platforms (Azure preferred).
Preferred Skills:
- Proficiency in Python and PySpark for data transformation scripting (see the sketch after this list).
- Experience with Azure AI services (e.g., Azure OpenAI Service).
- Knowledge of containerization (Docker, Kubernetes) for model serving.
- Experience with unstructured data processing libraries (e.g., spaCy, NLTK).
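For candidates unfamiliar with the stack, here is a small PySpark sketch of the kind of transformation this role involves: splitting raw document text into chunks ready for embedding. The table, column, and app names are hypothetical, chosen only for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("doc-chunking").getOrCreate()

    # Hypothetical input: one row per raw document.
    docs = spark.createDataFrame(
        [("doc-1", "First sentence. Second sentence. Third sentence.")],
        ["doc_id", "body"],
    )

    # Split each document into sentence-level chunks for embedding.
    chunks = (
        docs
        .withColumn("chunk", F.explode(F.split(F.col("body"), r"\.\s*")))
        .filter(F.length("chunk") > 0)
        .withColumn("chunk", F.trim(F.col("chunk")))
    )

    chunks.show(truncate=False)

In a real pipeline, the naive regex split would give way to a proper sentence segmenter (e.g., spaCy), and the chunked output would be written back to the lakehouse for a downstream embedding job.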
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1588770