Job Description

About the Role :

We are seeking an experienced Data Engineer with a strong background in building scalable data platforms, big data engineering, and end-to-end data lifecycle management. The ideal candidate will have deep expertise in data governance, modelling, and architecture, with hands-on experience in modern data platforms, pipelines, and analytics tools. This role involves designing and maintaining robust data systems that enable business stakeholders to make data-driven decisions at scale.

Key Responsibilities :

Data Architecture & Governance :

- Architect and define end-to-end data flows for Big Data/Data Lake use cases.

- Implement best practices in data governance, data quality, master data management, and data security.

- Collaborate with enterprise/domain architects to align data solutions with enterprise roadmaps.

- Participate in Technical Design Authority forums to influence and validate architectural decisions.

Pipeline Development & Data Engineering :

- Design, develop, and optimize scalable ETL/ELT pipelines across diverse data sources (cloud, on-premises, SQL/NoSQL, APIs).

- Automate data ingestion and transformation processes, ensuring performance, scalability, and reliability.

- Implement real-time, batch, and scheduled data ingestion using tools such as Apache Sqoop, Apache Flume, Amazon Kinesis, Logstash, and Fluentd.

- Work with Databricks, Spark, Hive, Hadoop, Azure Data Factory, Scala, Python, and R to deliver robust data processing workflows.

- Optimize pipeline performance by analyzing physical/logical execution plans (illustrated in the sketch after this list).
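
As an illustration of this kind of pipeline work, here is a minimal PySpark sketch of a batch ETL step that also surfaces the execution plan for tuning. It assumes a Spark 3.x/Databricks-style environment; the paths and column names (raw_orders, order_ts, amount) are hypothetical.

    # Minimal batch ETL sketch: extract, transform, inspect the plan, load.
    # All paths and column names below are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read a raw source.
    raw = spark.read.parquet("/data/raw/raw_orders")

    # Transform: cleanse invalid rows and aggregate by day.
    daily = (
        raw.filter(F.col("amount") > 0)
           .withColumn("order_date", F.to_date("order_ts"))
           .groupBy("order_date")
           .agg(F.sum("amount").alias("total_amount"))
    )

    # Inspect the logical/physical plans before tuning partitions or joins.
    daily.explain(mode="extended")

    # Load: write to a curated zone.
    daily.write.mode("overwrite").parquet("/data/curated/daily_orders")

In practice, the explain() output guides choices such as repartitioning or broadcast joins before a job is promoted to production.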

Data Management & Analytics Enablement :

- Collaborate with analytics teams to improve data models feeding BI tools (e.g., Power BI, Tableau).

- Build and maintain OLAP cubes to work around BI tool limitations and enable complex business analysis.

- Deliver data cleansing, validation, and enrichment solutions to ensure data accuracy.

- Lead initiatives in data mining, statistical analysis, and advanced data modelling (Star/Snowflake schemas, SCD2); a simplified SCD2 sketch follows this list.
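
As an illustration of the SCD2 pattern mentioned above, below is a simplified PySpark sketch that expires changed dimension rows and stages new versions. All table and column names (customer_id, address, is_current, start_date, end_date) are hypothetical; on Databricks this would more typically be a Delta Lake MERGE.

    # Simplified SCD2 (Type 2 slowly changing dimension) sketch.
    # Table paths and column names are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

    cur = (spark.read.parquet("/data/dim/customer")        # existing dimension
               .filter(F.col("is_current"))
               .alias("cur"))
    upd = spark.read.parquet("/data/staging/customer").alias("upd")

    # Detect rows whose tracked attribute changed.
    changed = (upd.join(cur, F.col("upd.customer_id") == F.col("cur.customer_id"))
                  .filter(F.col("upd.address") != F.col("cur.address")))

    # Expire the current version of each changed row.
    expired = (changed.select("cur.*")
                      .withColumn("is_current", F.lit(False))
                      .withColumn("end_date", F.current_date()))

    # Stage an open-ended new version carrying the changed attributes.
    fresh = (changed.select("upd.*")
                    .withColumn("is_current", F.lit(True))
                    .withColumn("start_date", F.current_date())
                    .withColumn("end_date", F.lit(None).cast("date")))

A full job would union these with the untouched history and rewrite (or MERGE) the dimension atomically.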

Operations & Performance Optimization :

- Estimate and optimize cluster and core sizing for Databricks and Analysis Services.

- Deploy and maintain CI/CD DevOps pipelines across development, staging, and production environments.

- Monitor, troubleshoot, and enhance system performance, ensuring optimal data ingestion and storage.

- Conduct continuous audits of data systems to identify gaps, performance bottlenecks, or security loopholes (a basic audit sketch follows this list).
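
As a sketch of what such an audit can look like in practice, the following checks row counts and data freshness for curated tables; the table names, metastore registration, and thresholds are all assumptions.

    # Minimal data-audit sketch: row counts and freshness per table.
    # Table names and rules are hypothetical; assumes tables are
    # registered in the metastore.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("audit-sketch").getOrCreate()

    checks = {
        "curated.daily_orders": {"min_rows": 1, "freshness_col": "order_date"},
    }

    for table, rule in checks.items():
        df = spark.table(table)
        n = df.count()
        latest = df.agg(F.max(rule["freshness_col"])).first()[0]
        print(f"{table}: rows={n}, latest={latest}")
        if n < rule["min_rows"]:
            raise ValueError(f"Audit failed: {table} has only {n} rows")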

Leadership & Collaboration :

- Act as a coach/mentor to junior data engineers, providing technical guidance and enforcing best practices.

- Collaborate cross-functionally with business stakeholders, analytics teams, and engineering squads to deliver business outcomes.

- Allocate and track tasks across the team, reporting progress and deliverables to management.

Essential Qualifications & Skills :

Education : Bachelor's degree in Computer Science, Engineering, or a related field (Master's preferred).

Experience :

- 10+ years of experience with data analytics platforms, ETL/ELT transformations, and SQL programming.

- 5+ years of hands-on experience in Big Data engineering, data lakes, and distributed systems.

Technical Expertise :

- Strong proficiency in the Hadoop ecosystem (HDFS, Hive, Sqoop, Oozie, Spark Core/Streaming).

- Programming in Scala, Java, and Python, plus shell scripting.

- Deep experience with the Azure data platform (Azure SQL DB, Data Factory, Cosmos DB).

- Database expertise: Oracle, MySQL, MongoDB, Presto.

- Data ingestion/extraction using REST APIs, OData, JSON, XML, and web services (a short ingestion sketch follows this list).
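
As a small example of the ingestion work described above, the sketch below pulls JSON records from a REST endpoint into a Spark DataFrame; the URL, credentials, and response shape are hypothetical.

    # Minimal REST-to-DataFrame ingestion sketch; the endpoint and token are
    # placeholders, and the API is assumed to return a JSON array of objects.
    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("api-ingest-sketch").getOrCreate()

    resp = requests.get(
        "https://api.example.com/v1/orders",          # placeholder endpoint
        headers={"Authorization": "Bearer <token>"},  # placeholder credentials
        timeout=30,
    )
    resp.raise_for_status()
    records = resp.json()

    # Schema inference from a list of dicts is convenient for a sketch;
    # production code would declare an explicit schema.
    df = spark.createDataFrame(records)
    df.show(truncate=False)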

Core Skills :

- Strong foundation in data modelling, warehousing, and architecture principles.

- Hands-on experience with ETL tools and best practices.

- Solid understanding of data security (encryption, tunneling, access control).

- Proven ability in troubleshooting and performance optimization.

