Posted on: 19/03/2026
Overview:
We are looking for a skilled Data Engineer II to join our Data Platform team. You will play a key role in building and optimizing our next-generation data infrastructure. Operating at the scale of Flipkart (petabytes of data), you will design, develop, and maintain high-throughput distributed systems, bridging traditional big data engineering with modern cloud-native and AI-driven workflows.
Key Responsibilities:
Data Pipeline Development & Optimization:
- Build Scalable Pipelines: Design, develop, and maintain robust ETL/ELT pipelines using Scala and Apache Spark/Flink (Core, SQL, Streaming) to process massive datasets with low latency (a minimal sketch follows this list).
- Performance Tuning: Optimize Spark jobs and SQL queries for efficiency, resource utilization, and speed.
- Lakehouse Implementation: Implement and manage data tables using modern Lakehouse formats like Apache Iceberg, Hudi, or Delta Lake, ensuring efficient storage and retrieval.
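To make the day-to-day concrete, here is a minimal sketch of the kind of batch pipeline this role owns. Everything specific in it is an assumption for illustration: the lake catalog, GCS paths, table name, and columns are hypothetical, and it presumes an Iceberg catalog is already configured on the cluster.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Illustrative batch ETL job: extract raw order events, cleanse and enrich
// them, and append to an Iceberg table. All names and paths are placeholders.
object OrdersDailyLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-daily-load")
      // Assumption: an Iceberg catalog named "lake" is configured on the
      // cluster via spark.sql.catalog.lake.* properties.
      .getOrCreate()
    import spark.implicits._

    // Extract: one day of raw order events from a hypothetical GCS bucket.
    val raw = spark.read.parquet("gs://example-bucket/raw/orders/dt=2026-03-19/")

    // A small dimension table; broadcasting it avoids shuffling the large
    // fact dataset during the join, one of the tuning levers noted above.
    val sellers = spark.read.parquet("gs://example-bucket/dim/sellers/")

    // Transform: basic cleansing plus an enrichment join.
    val cleaned = raw
      .filter($"order_id".isNotNull && $"amount" > 0)
      .withColumn("order_date", to_date($"created_at"))
      .join(broadcast(sellers), Seq("seller_id"), "left")

    // Load: append into an existing Iceberg table (partitioning is defined
    // once at DDL time, not per write).
    cleaned.writeTo("lake.sales.orders_daily").append()
    spark.stop()
  }
}
```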
Data Management & Quality:
- Data Modeling: Apply Medallion Architecture principles (Bronze/Silver/Gold) to structure data effectively for downstream analytics and ML use cases.
- Data Quality: Implement data validation checks and automated testing using frameworks (e.g., Deequ, Great Expectations) to ensure data accuracy and reliability (see the sketch after this list).
- Observability: Integrate pipelines with observability tools to monitor data health, freshness, and lineage.
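The sketch below, referenced in the Data Quality bullet, shows one way a Bronze-to-Silver promotion can be gated on a Deequ verification. The table names (lake.bronze.orders, lake.silver.orders) and columns are hypothetical assumptions; VerificationSuite, Check, and CheckLevel are Deequ's standard entry points.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

// Illustrative Bronze -> Silver promotion with a data-quality gate.
// Table and column names are placeholder assumptions.
object PromoteOrdersToSilver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("promote-orders-silver").getOrCreate()
    import spark.implicits._

    // Bronze: raw, append-only ingested data.
    val bronze = spark.read.table("lake.bronze.orders")

    // Silver: deduplicated, typed, and conformed for downstream use.
    val silver = bronze
      .dropDuplicates("order_id")
      .withColumn("amount", $"amount".cast("decimal(18,2)"))
      .filter($"order_status".isin("CREATED", "SHIPPED", "DELIVERED"))

    // Quality gate: fail the run rather than publish bad data downstream.
    val verification = VerificationSuite()
      .onData(silver)
      .addCheck(
        Check(CheckLevel.Error, "silver orders contract")
          .isComplete("order_id")
          .isUnique("order_id")
          .isNonNegative("amount"))
      .run()

    if (verification.status == CheckStatus.Success)
      silver.writeTo("lake.silver.orders").createOrReplace()
    else
      throw new IllegalStateException("Deequ checks failed; silver not published")
  }
}
```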
Cloud-Native Engineering:
- Cloud Infrastructure: Deploy and manage workloads on GCP Dataproc and Kubernetes (K8s), leveraging containerization for scalable processing (a configuration sketch follows this list).
- Infrastructure as Code: Contribute to infrastructure automation and deployment scripts.
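As a rough illustration of the Kubernetes side, the sketch below sets Spark's standard spark.kubernetes.* properties from code. In practice these values are usually passed to spark-submit or templated by IaC tooling rather than hard-coded; the API-server URL, namespace, and container image here are placeholders, not a real cluster's values.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative Spark-on-Kubernetes session using Spark's standard
// spark.kubernetes.* configuration properties.
object K8sSessionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("k8s-etl")
      .master("k8s://https://kubernetes.default.svc")        // API server URL
      .config("spark.kubernetes.namespace", "data-platform") // target namespace
      .config("spark.kubernetes.container.image",
        "gcr.io/example-project/spark-etl:latest")           // containerized app
      .config("spark.executor.instances", "8")
      .config("spark.executor.memory", "8g")
      .getOrCreate()

    spark.range(1000).count() // trivial smoke-test action
    spark.stop()
  }
}
```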
Collaboration & Innovation:
- GenAI Integration: Explore and implement GenAI and agentic workflows to automate data discovery and optimize engineering processes.
- Agile Delivery: Work closely with architects and product teams in an Agile/Scrum environment to deliver features iteratively.
- Code Reviews: Participate in code reviews to maintain code quality, standards, and best practices.
Required Qualifications:
- Experience: 3-5 years of hands-on experience in Data Engineering.
- Primary Tech Stack:
  - Strong proficiency in Scala and Apache Spark (Batch & Streaming).
  - Solid understanding of SQL and distributed computing concepts.
  - Experience with GCP (Dataproc, GCS, BigQuery) or equivalent cloud platforms (AWS/Azure).
  - Hands-on experience with Kubernetes and Docker.
- Architecture & Storage:
  - Experience with Lakehouse table formats (Iceberg, Hudi, or Delta).
  - Understanding of data warehousing and modeling concepts (star and snowflake schemas).
Soft Skills:
- Strong problem-solving skills and ability to work independently.
- Good communication skills to collaborate with cross-functional teams.
Education Qualification:
- Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related quantitative field.
Preferred Qualifications:
- Machine Learning Background: Familiarity with ML concepts and feature engineering, or experience building data pipelines for ML models, is highly preferred.
- Experience with workflow orchestration tools (Airflow, Azkaban, etc.).
- Familiarity with real-time analytics databases (Druid, ClickHouse, HBase).
- Experience with CI/CD pipelines for data applications.
Why Join Us?
- Work on petabyte-scale challenges that define the industry standard.
- Collaborate with top-tier engineers in a high-growth environment.
- Opportunity to work with cutting-edge technologies like Iceberg, K8s, and GenAI.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1621921