Description :
Years of Experience : 5 to 7 Years
Location : Hybrid, Pune
Shift Timings : 02:30 PM to 11:30 PM (Mon-Fri)
Department : Information Technology
At ChiStats, we're building the data foundation that powers the next generation of commercial insurance experiences. As a Senior Data Engineer, you will design, build, and maintain scalable data pipelines and services that support analytics, machine learning, and emerging AI applications.
You will work closely with ML, product, and platform teams to ensure that high-quality, reliable data flows into the systems that drive intelligent decision-making. While your core expertise will be in data engineering, you will also work with modern AI/ML and GenAI technologies; a basic understanding of these domains and a willingness to learn and grow in them are expected.
What You'll Do :
Core Data Engineering Responsibilities (Primary Focus) :
- Architect and implement large-scale data pipelines using Python and Java.
- Design and maintain ETL/ELT workflows, supporting analytics, operational systems, and ML pipelines.
- Build optimized data models and work with relational (PostgreSQL) and document databases (MongoDB).
- Develop and scale data microservices and APIs using Python or Java.
- Implement data quality checks, monitoring, and data reliability best practices.
- Optimize the performance of large data processing pipelines through parallelization, batching, and caching strategies.
AI/ML/GenAI Collaboration (Secondary, Learning-Oriented) :
- Work with ML engineers to understand data needs for model training, feature engineering, and inference pipelines.
- Contribute to building foundational components of RAG pipelines, embeddings, and vector storage, with guidance from AI systems engineers.
- Understand basics of LLMs, embeddings, and vector databases (Pinecone, Weaviate, FAISS, etc.).
- Assist in integrating AI-based search, summarization, or automation features into data workflows.
- Participate in upskilling programs and cross-team learning to grow your GenAI and AI engineering capabilities.
Platform, DevOps & CI/CD (Awareness-Level) :
- Work with DevOps teams to implement CI/CD for data pipelines (AWS, Kubernetes, Terraform, etc.).
- Understand provisioning, monitoring, and observability for data systems.
- Contribute to infrastructure-as-code practices in collaboration with the platform team.
Leadership & Ownership :
- Provide mentorship to junior data engineers.
- Lead problem-solving sessions and troubleshoot data pipeline issues within defined SLAs.
- Collaborate actively with product, ML, and engineering teams to ensure data readiness.
Skills & Qualifications :
Must-Have (Core Competencies) :
- Bachelor's/Master's degree in Computer Science, Data Engineering, or a related field.
- 5+ years of experience in data engineering, with strong production-level delivery experience.
- Expert-level proficiency in Python and strong experience with Java.
- Deep experience with data modeling, ETL pipeline design, and workflow orchestration (Airflow, Prefect).
- Hands-on experience with PostgreSQL and MongoDB.
- Strong grasp of distributed systems, microservices, and high-performance data processing.
Nice-to-Have (Awareness-Level, Willingness to Upskill) :
- Exposure to LLMs, embeddings, and AI/ML concepts.
- Basic understanding of vector databases (Pinecone, FAISS, Weaviate, Chroma).
- Familiarity with prompt engineering, GenAI concepts, and RAG pipeline basics.
- Understanding of AI agent frameworks (LangChain, LlamaIndex, CrewAI, etc.).
- Working knowledge of AWS services, Kubernetes, or data pipeline CI/CD.
Preferred (Not Mandatory) :
- Experience in insurance, fintech, or regulated data environments.
- Experience supporting ML or GenAI-driven products.