Posted on: 24/07/2025
About the Role:
We are seeking a skilled and passionate Data Engineer to join our team and drive the development of scalable data pipelines for Generative AI (GenAI) and Large Language Model (LLM)-powered applications. This role demands hands-on expertise in Spark, GCP, and data integration with modern AI APIs.
What You'll Do:
- Design and develop high-throughput, scalable data pipelines for GenAI and LLM-based solutions.
- Build robust ETL/ELT processes using Spark (PySpark/Scala) on Google Cloud Platform (GCP).
- Integrate enterprise and unstructured data with LLM APIs such as OpenAI, Gemini, and Hugging Face.
- Process and enrich large volumes of unstructured data, including text and document embeddings.
- Manage real-time and batch workflows using Airflow, Dataflow, and BigQuery.
- Implement and maintain best practices for data quality, observability, lineage, and API-first designs.
What Sets You Apart:
- 3+ years of experience building scalable Spark-based pipelines (PySpark or Scala).
- Strong hands-on experience with GCP services: BigQuery, Dataproc, Pub/Sub, Cloud Functions.
- Familiarity with LLM APIs, vector databases (e.g., Pinecone, FAISS), and GenAI use cases.
- Expertise in text processing, unstructured data handling, and performance optimization.
- Agile mindset and the ability to thrive in a fast-paced startup or dynamic environment.
Nice to Have:
- Experience working with embeddings and semantic search.
- Exposure to MLOps or data observability tools.
- Background in deploying production-grade AI/ML workflows.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1518980