hirist

AI Data Engineer

Bluglint solutions
5 - 8 Years
Hyderabad

Posted on: 27/02/2026

Job Description

Key Responsibilities :


- Vector & Graph ETL : Design and maintain pipelines that transform unstructured data (PDFs, emails, logs, chats) into optimized embeddings for Vector Databases (Pinecone, Weaviate, Milvus).


- Semantic Data Modeling : Engineer data structures that optimize for Retrieval-Augmented Generation (RAG), ensuring agents find the "needle in the haystack" in milliseconds.


- Knowledge Graph Construction : Build and scale Knowledge Graphs (Neo4j) to represent complex relationships in our trading and support data that standard vector search misses.


- Automated Data Labeling & Synthetic Data : Implement pipelines using LLMs to auto-label datasets or generate synthetic edge cases for agent training and evaluation.


- Stream Processing for Agents : Build real-time data "listeners" (Kafka/Flink) that feed live context to agents, allowing them to react to market or support events as they happen.


- Data Reliability & "Drift" Detection : Build monitoring for "Embedding Drift", identifying when the statistical distribution of your data changes and the agent's "knowledge" becomes stale.
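A minimal sketch of the drift-detection responsibility above: compare the centroid of a live window of embeddings against a frozen reference centroid and flag when the cosine distance grows. All names (`drift_score`, the centroid heuristic, the synthetic data) are illustrative assumptions, not part of the role's actual stack.

```python
# Hypothetical embedding-drift check: cosine distance between the centroid of
# a reference embedding batch and a live batch (0.0 means no detectable drift).
import numpy as np

def centroid(embeddings: np.ndarray) -> np.ndarray:
    """Mean vector of a batch of embeddings, shape (n, d) -> (d,)."""
    return embeddings.mean(axis=0)

def drift_score(reference: np.ndarray, live: np.ndarray) -> float:
    """Cosine distance between reference and live centroids."""
    a, b = centroid(reference), centroid(live)
    cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - cos_sim

# Synthetic data: the "drifted" batch points in a genuinely different direction.
rng = np.random.default_rng(0)
base = np.ones(64)
drifted_base = np.concatenate([2 * np.ones(32), np.zeros(32)])
ref = base + rng.normal(scale=0.1, size=(500, 64))
same = base + rng.normal(scale=0.1, size=(500, 64))
shifted = drifted_base + rng.normal(scale=0.1, size=(500, 64))
```

In production this check would run on a schedule against the live ingestion stream, alerting (and triggering re-embedding) once the score crosses a tuned threshold.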


Qualifications :


- Vector Database Mastery : Expert-level configuration of HNSW indexes, scalar quantization, and metadata filtering strategies within Pinecone, Milvus, or Qdrant.


- Advanced Python & Rust : Proficiency in Python for AI logic and Rust (or C++) for high-performance data processing and custom embedding functions.


- Big Data Ecosystem : Hands-on experience with Apache Spark, Flink, and Kafka in a high-throughput environment (Trading/FinTech preferred).


- LLM Data Tooling : Deep experience with Unstructured.io, LlamaIndex, or LangChain for document parsing and chunking strategy optimization.


- MLOps & DataOps : Mastery of DVC (Data Version Control) and Airflow/Prefect for managing complex, non-linear AI data workflows.


- Embedding Models : Understanding of how to fine-tune embedding models (e.g., BGE, Cohere, or OpenAI) to better represent domain-specific (Trading) terminology.
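To make the "scalar quantization" qualification above concrete, here is a toy sketch of int8 quantization of float32 embeddings: a 4x memory reduction at a small reconstruction cost. This is a from-scratch illustration, not the Pinecone/Milvus/Qdrant API, and the min/max scaling scheme is just one simple choice.

```python
# Illustrative scalar quantization: map float32 embeddings onto uint8 codes
# with a single global min/max scale, then reconstruct to measure error.
import numpy as np

def quantize(vecs: np.ndarray):
    """Compress float vectors to uint8 codes plus (offset, scale) metadata."""
    lo, hi = vecs.min(), vecs.max()
    scale = (hi - lo) / 255.0
    codes = np.round((vecs - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float vectors."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(1000, 128)).astype(np.float32)

codes, lo, scale = quantize(embeddings)
restored = dequantize(codes, lo, scale)

# Mean per-vector reconstruction error, small relative to vector norms (~11.3).
err = np.linalg.norm(embeddings - restored, axis=1).mean()
```

Real vector databases refine this with per-dimension or per-segment scales and rescore the top candidates against the original floats to recover recall.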


Additional Qualifications :


- Chunking Strategy Architect : You don't just "split text." You implement Semantic Chunking and Parent-Child retrieval strategies to maximize LLM context relevance.


- Cold/Warm/Hot Storage Strategy : Managing cost and latency by tiering data between Vector DBs (Hot), SQL/NoSQL (Warm), and S3/Data Lakes (Cold).


- Privacy & Redaction Pipelines : Building automated PII (Personally Identifiable Information) redaction into the ingestion layer to ensure agents never "see" or "leak" sensitive user data.
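A minimal sketch of the ingestion-layer redaction idea above. The two regex patterns are illustrative assumptions only; production pipelines typically combine patterns with NER-based detectors (e.g. spaCy or Microsoft Presidio).

```python
# Hypothetical PII redaction step run before chunking/embedding, so sensitive
# strings never reach the vector index or the agent's context window.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}

def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder before indexing."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

msg = "Reach me at jane.doe@example.com or +91 98765 43210 about the trade."
print(redact(msg))  # Reach me at [EMAIL] or [PHONE] about the trade.
```

Typed placeholders (rather than blank deletion) preserve sentence structure, which keeps downstream chunking and retrieval quality intact.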

