Role : Data Engineer (AI-Aware)

Years of Experience : 3 - 5 Years

Location : Hybrid, Pune

Shift Timings : 02:30 PM to 11:30 PM (Mon-Fri)

Department : Information Technology

Your Role :

As a Data Engineer at ChiStats, you'll focus on building robust, scalable data pipelines that power analytics, operational systems, and emerging AI-driven features.

Your core expertise will be in data engineering using Python and Java, while you will also gain exposure to modern AI/ML and GenAI technologies.

You will collaborate with Senior Data Engineers and AI specialists, contributing to both traditional data workloads and next-generation AI-enabled systems. This role is ideal for someone strong in data engineering fundamentals and eager to upskill in AI/ML and GenAI through guided learning.

What You'll Do :

Core Data Engineering Responsibilities (Primary Focus) :

- Build and optimize ETL/ELT pipelines using Python and Java.

- Connect, extract, transform, and load data from structured and unstructured sources.

- Design efficient data processing workflows supporting analytics and operational systems.

- Develop and maintain APIs and microservices that expose data to applications and services.

- Work with relational and document databases such as PostgreSQL and MongoDB.

- Improve pipeline performance through batching, parallelization, and caching.

- Participate in monitoring, data quality checks, and pipeline reliability improvements.

AI/ML/GenAI Collaboration (Secondary, Awareness-Level) :

- Work with AI/ML teams to understand data needs for model-building and intelligence features.

- Gain exposure to LLM-powered applications, prompt templates, and basic RAG concepts.

- Learn to support simple semantic search or vector-based lookups (with guidance).

- Assist in integrating lightweight AI workflows or components into data services.

- Upskill through training and mentorship on frameworks such as LangChain, LlamaIndex, and CrewAI, and on vector databases (Pinecone, FAISS, Weaviate, Chroma).

Platform & CI/CD (Awareness-Level) :

- Collaborate with DevOps teams to deploy and monitor data services in AWS or containerized environments.

- Gain exposure to CI/CD practices and observability tools.

- Assist in maintaining cloud-based data workloads (AWS-native services, CNCF tools).

Team Collaboration :

- Work closely with Senior Data Engineers, Data Analysts, and AI Engineers.

- Troubleshoot issues and contribute to production support within defined SLAs.

- Document data flows, schemas, and pipeline components clearly.

Skills & Qualifications :

Must-Have (Core Strengths) :

- Bachelor's degree in Computer Science, Engineering, Data Science, or a related field.

- 2+ years of hands-on experience in data engineering or backend engineering.

- Strong proficiency in Python, plus working experience in Java or a similar language.

- Good understanding of data modeling, SQL, and database concepts (PostgreSQL, MongoDB).

- Experience building APIs or microservices.

Nice-to-Have (Awareness-Level, Willing to Learn) :

- Exposure to LLMs, embeddings, or GenAI concepts such as prompt templates or RAG basics.

- Basic familiarity with vector databases or AI agent frameworks.

- Working knowledge of CI/CD and cloud environments (AWS preferred).

- Exposure to orchestration tools like Airflow or Prefect.

Preferred :

- Experience working with insurance or financial datasets.

- Understanding of regulated data environments.