Posted on: 12/12/2025
Description :
Role Overview :
The GenAI Data Engineer is a senior role requiring 8-12 years of experience, focused on designing, building, and optimizing advanced data pipelines for unstructured and semi-structured content and on integrating Generative AI/ML capabilities.
The incumbent will combine modern ETL expertise with Vector Database and GenAI integration skills to support intelligent document processing and semantic search applications.
This is a Work From Home position.
Job Summary :
We are seeking a senior GenAI Data Engineer (8-12 years of experience) with mandatory expertise in Azure Data Factory (ADF) and Databricks for building scalable ETL/ELT workflows. The ideal candidate will specialize in optimizing pipelines for unstructured content and have hands-on experience with Vector Databases for semantic search and RAG (Retrieval-Augmented Generation) pipelines. Key responsibilities include implementing advanced data modeling and indexing techniques, ensuring pipeline performance, and applying MLOps practices and Large Language Model (LLM) fine-tuning to drive intelligent data applications.
Key Responsibilities and Technical Deliverables :
GenAI Data Pipeline Development and Optimization :
- Design, build, and maintain robust data ingestion and transformation pipelines using Azure Data Factory (ADF) and Databricks environments for both structured and complex unstructured data.
- Optimize ETL/ELT pipelines for scalability, reliability, and performance, focusing on low-latency processing for GenAI application needs.
- Implement and integrate Vector Database technologies for efficient storage and retrieval of embeddings, supporting advanced semantic search applications.
- Develop and manage RAG (Retrieval-Augmented Generation) pipelines, ensuring seamless integration between knowledge retrieval systems and LLMs (see the illustrative sketch after this list).
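For illustration only, the following is a minimal Python sketch of the retrieval step in a RAG pipeline. It uses an in-memory cosine-similarity index in place of a real Vector Database; the embed() function, the sample documents, and all names are hypothetical stand-ins, and a production pipeline would instead call a real embedding model (for example via Azure OpenAI Service) and a managed vector store.

```python
# Minimal RAG retrieval sketch with an in-memory vector index.
# embed() is a hypothetical placeholder for a real embedding model call.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder: deterministic pseudo-embedding seeded by the text hash.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine

documents = [
    "ADF pipelines orchestrate ingestion from blob storage.",
    "Databricks jobs transform semi-structured JSON into Delta tables.",
    "Vector databases store embeddings for semantic search.",
]
index = np.stack([embed(d) for d in documents])  # one row per document

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)          # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]     # indices of the k best matches
    return [documents[i] for i in top]

# Retrieved passages become the grounding context for the LLM prompt.
context = retrieve("How are embeddings stored for search?")
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The pattern is the same at production scale: embed the query, fetch the nearest stored chunks, and assemble them into the prompt sent to the LLM.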
Architecture, Modelling, and Performance :
- Apply strong knowledge of data modeling, indexing, and query optimization techniques suitable for both traditional relational stores and modern unstructured data repositories.
- Leverage proven experience with cloud platforms (Azure preferred), utilizing services beyond ADF and Databricks for storage, compute, and serverless processing.
- Ensure data quality, governance, and security are maintained throughout the GenAI data lifecycle (a brief data-quality sketch follows this list).
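As a concrete illustration of the data-quality point above, here is a small, hedged PySpark sketch of a validation gate in a Databricks-style transformation; the table, column names, and rules are hypothetical examples, not a prescribed implementation.

```python
# Illustrative PySpark data-quality gate: quarantine bad rows rather than
# failing the whole run. All names and rules are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-gate").getOrCreate()

raw = spark.createDataFrame(
    [("doc-1", "contract", 12), ("doc-2", None, 7), ("doc-3", "invoice", -1)],
    ["doc_id", "doc_type", "page_count"],
)

# Basic expectations: a known document type and a positive page count.
valid = raw.filter(F.col("doc_type").isNotNull() & (F.col("page_count") > 0))
rejected = raw.subtract(valid)  # rows routed to a quarantine table for review

print(f"valid={valid.count()}, rejected={rejected.count()}")
```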
MLOps and GenAI Integration :
- Possess exposure to MLOps practices for the reliable deployment, monitoring, and governance of AI/ML models.
- Demonstrate practical exposure to LLM fine-tuning processes and manage the data preparation necessary for effective model customization and training (see the data-preparation sketch after this list).
- Contribute to the overall architectural strategy for data integration within GenAI applications, including leveraging knowledge graphs for enhanced data contextualization.
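To make the fine-tuning data-preparation responsibility concrete, here is a short Python sketch that writes chat-style JSONL records, a format commonly accepted by LLM fine-tuning jobs; the example pairs and the output file name are hypothetical.

```python
# Sketch of fine-tuning data preparation: serialize supervised examples
# into chat-style JSONL. The examples and output path are hypothetical.
import json

examples = [
    ("Classify this clause.", "This is a termination clause."),
    ("Summarize the invoice.", "Three line items; total due in 30 days."),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for user_msg, assistant_msg in examples:
        record = {"messages": [
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ]}
        f.write(json.dumps(record) + "\n")
```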
Mandatory Skills & Qualifications :
- Experience : 8-12 years in Data Engineering/AI Engineering roles.
- ETL/Cloud : Proven experience in Azure Data Factory (ADF) and Databricks for building ETL/ELT workflows.
- Data Proficiency : Strong knowledge of data modeling, indexing, and query optimization.
- AI/GenAI : Experience with knowledge graphs or RAG (Retrieval-Augmented Generation) pipelines and Vector Databases.
- MLOps : Exposure to MLOps practices and LLM fine-tuning.
- Platform : Experience with cloud platforms (Azure preferred).
Preferred Skills :
- Proficiency in Python and PySpark for data transformation scripting.
- Experience with Azure AI services (e.g., Azure OpenAI Service).
- Knowledge of containerization (Docker, Kubernetes) for model serving.
- Experience with unstructured data processing libraries (e.g., spaCy, NLTK); a brief spaCy sketch follows this list.
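As a small illustration of the last point, a spaCy preprocessing snippet might look like the following; it uses a blank English pipeline so no model download is needed, and the sample sentence is hypothetical.

```python
# Tiny unstructured-text preprocessing sketch with spaCy.
# A blank pipeline provides tokenization without a model download.
import spacy

nlp = spacy.blank("en")
doc = nlp("Invoice #1042: payment due within 30 days of receipt.")
tokens = [t.text for t in doc if not t.is_punct]
print(tokens)
```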
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1588770