hirist

Job Description

Key Responsibilities:
- Design, build, and optimize data pipelines for batch and streaming workloads, ensuring scalability and reliability.

- Implement query optimization and caching mechanisms to improve performance and reduce processing time.

- Develop query normalization and parameterization techniques for plan reuse and efficiency.

- Architect data integration workflows across lakes, warehouses, and graph/NoSQL systems.

- Define and manage schema mapping, metadata, and transformation logic across diverse data sources.

- Ensure data governance, observability, and quality through robust testing, monitoring, and validation.

- Collaborate with data scientists, architects, and analysts to deliver unified and high-performance data solutions.

Required Skills & Qualifications:
- 5+ years of hands-on experience in data engineering or large-scale data system design.

- Experience with compiler or parser development using tools such as ANTLR (parsing, semantic checks, AST transformations, query planning, code generation).

- Strong expertise in SQL, query tuning, and execution plan optimization.

- Proficiency in Python, Java, or Scala for data pipeline development.

- Practical experience with data processing frameworks (Spark, Flink, Beam) and streaming platforms (Kafka, Kinesis).

- Experience with relational, NoSQL, or graph databases and schema design.

- Familiarity with containerization (Docker), orchestration (Kubernetes), and CI/CD workflows.

- Solid understanding of data modeling, metadata management, and data governance principles.

- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.

Good to Have:

- Experience with query caching or optimization frameworks.

- Knowledge of cloud data platforms (AWS, GCP, Azure).

- Exposure to real-time analytics and event-driven architecture.

- Relevant certifications in data engineering or cloud technologies.
