Posted on: 19/03/2026
Overview:
We are looking for a skilled Data Engineer II to join our Data Platform team. You will play a key role in building and optimizing our next-generation data infrastructure. Operating at the scale of Flipkart (petabytes of data), you will design, develop, and maintain high-throughput distributed systems, bridging traditional big data engineering with modern cloud-native and AI-driven workflows.
Key Responsibilities:
Data Pipeline Development & Optimization:
- Build Scalable Pipelines: Design, develop, and maintain robust ETL/ELT pipelines using Scala and Apache Spark/Flink (Core, SQL, Streaming) to process massive datasets with low latency (a minimal sketch follows this list).
- Performance Tuning: Optimize Spark jobs and SQL queries for efficiency, resource utilization, and speed.
- Lakehouse Implementation: Implement and manage data tables using modern Lakehouse formats like Apache Iceberg, Hudi, or Delta Lake, ensuring efficient storage and retrieval.
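To make the day-to-day concrete, here is a minimal sketch of the kind of batch pipeline this role owns. Everything specific in it is an assumption for illustration: the lake catalog, GCS paths, table name, and columns are hypothetical, and it presumes an Iceberg catalog is already configured on the cluster.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Illustrative batch ETL job: extract raw order events, cleanse and enrich
// them, and append to an Iceberg table. All names and paths are placeholders.
object OrdersDailyLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-daily-load")
      // Assumption: an Iceberg catalog named "lake" is configured on the
      // cluster via spark.sql.catalog.lake.* properties.
      .getOrCreate()
    import spark.implicits._

    // Extract: one day of raw order events from a hypothetical GCS bucket.
    val raw = spark.read.parquet("gs://example-bucket/raw/orders/dt=2026-03-19/")

    // A small dimension table; broadcasting it avoids shuffling the large
    // fact dataset during the join, one of the tuning levers noted above.
    val sellers = spark.read.parquet("gs://example-bucket/dim/sellers/")

    // Transform: basic cleansing plus an enrichment join.
    val cleaned = raw
      .filter($"order_id".isNotNull && $"amount" > 0)
      .withColumn("order_date", to_date($"created_at"))
      .join(broadcast(sellers), Seq("seller_id"), "left")

    // Load: append into an existing Iceberg table (partitioning is defined
    // once at DDL time, not per write).
    cleaned.writeTo("lake.sales.orders_daily").append()
    spark.stop()
  }
}
```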
Data Management & Quality:
- Data Modeling: Apply Medallion Architecture principles (Bronze/Silver/Gold) to structure data effectively for downstream analytics and ML use cases.
- Data Quality: Implement data validation checks and automated testing using frameworks (e.g., Deequ, Great Expectations) to ensure data accuracy and reliability (see the sketch after this list).
- Observability: Integrate pipelines with observability tools to monitor data health, freshness, and lineage.
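The sketch below, referenced in the Data Quality bullet, shows one way a Bronze-to-Silver promotion can be gated on a Deequ verification. The table names (lake.bronze.orders, lake.silver.orders) and columns are hypothetical assumptions; VerificationSuite, Check, and CheckLevel are Deequ's standard entry points.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

// Illustrative Bronze -> Silver promotion with a data-quality gate.
// Table and column names are placeholder assumptions.
object PromoteOrdersToSilver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("promote-orders-silver").getOrCreate()
    import spark.implicits._

    // Bronze: raw, append-only ingested data.
    val bronze = spark.read.table("lake.bronze.orders")

    // Silver: deduplicated, typed, and conformed for downstream use.
    val silver = bronze
      .dropDuplicates("order_id")
      .withColumn("amount", $"amount".cast("decimal(18,2)"))
      .filter($"order_status".isin("CREATED", "SHIPPED", "DELIVERED"))

    // Quality gate: fail the run rather than publish bad data downstream.
    val verification = VerificationSuite()
      .onData(silver)
      .addCheck(
        Check(CheckLevel.Error, "silver orders contract")
          .isComplete("order_id")
          .isUnique("order_id")
          .isNonNegative("amount"))
      .run()

    if (verification.status == CheckStatus.Success)
      silver.writeTo("lake.silver.orders").createOrReplace()
    else
      throw new IllegalStateException("Deequ checks failed; silver not published")
  }
}
```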
Cloud-Native Engineering:
- Cloud Infrastructure: Deploy and manage workloads on GCP Dataproc and Kubernetes (K8s), leveraging containerization for scalable processing (a configuration sketch follows this list).
- Infrastructure as Code: Contribute to infrastructure automation and deployment scripts.
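As a rough illustration of the Kubernetes side, the sketch below sets Spark's standard spark.kubernetes.* properties from code. In practice these values are usually passed to spark-submit or templated by IaC tooling rather than hard-coded; the API-server URL, namespace, and container image here are placeholders, not a real cluster's values.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative Spark-on-Kubernetes session using Spark's standard
// spark.kubernetes.* configuration properties.
object K8sSessionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("k8s-etl")
      .master("k8s://https://kubernetes.default.svc")        // API server URL
      .config("spark.kubernetes.namespace", "data-platform") // target namespace
      .config("spark.kubernetes.container.image",
        "gcr.io/example-project/spark-etl:latest")           // containerized app
      .config("spark.executor.instances", "8")
      .config("spark.executor.memory", "8g")
      .getOrCreate()

    spark.range(1000).count() // trivial smoke-test action
    spark.stop()
  }
}
```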
Collaboration & Innovation:
- GenAI Integration: Explore and implement GenAI and agentic workflows to automate data discovery and optimize engineering processes.
- Agile Delivery: Work closely with architects and product teams in an Agile/Scrum environment to deliver features iteratively.
- Code Reviews: Participate in code reviews to maintain code quality, standards, and best practices.
Required Qualifications:
- Experience: 3-5 years of hands-on experience in Data Engineering.
- Primary Tech Stack:
  - Strong proficiency in Scala and Apache Spark (Batch & Streaming).
  - Solid understanding of SQL and distributed computing concepts.
  - Experience with GCP (Dataproc, GCS, BigQuery) or equivalent cloud platforms (AWS/Azure).
  - Hands-on experience with Kubernetes and Docker.
- Architecture & Storage:
  - Experience with Lakehouse table formats (Iceberg, Hudi, or Delta).
  - Understanding of data warehousing and modeling concepts (star and snowflake schemas).
Soft Skills:
- Strong problem-solving skills and ability to work independently.
- Good communication skills to collaborate with cross-functional teams.
Education Qualification:
- Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related quantitative field.
Preferred Qualifications:
- Machine Learning Background: Familiarity with ML concepts and feature engineering, or experience building data pipelines for ML models, is highly preferred.
- Experience with workflow orchestration tools (Airflow, Azkaban, etc.).
- Familiarity with real-time analytics databases (Druid, ClickHouse, HBase).
- Experience with CI/CD pipelines for data applications.
Why Join Us?
- Work on petabyte-scale challenges that define the industry standard.
- Collaborate with top-tier engineers in a high-growth environment.
- Opportunity to work with cutting-edge technologies like Iceberg, K8s, and GenAI.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1621921