Job Description

Key Responsibilities:

- Lead and mentor a team of data engineers in designing, developing, and maintaining scalable data pipelines.

- Architect, build, and optimize ETL workflows using Python, PySpark, and SQL (see the illustrative sketch after this list).

- Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver reliable solutions.

- Implement and manage data integration from multiple structured and unstructured sources.

- Design and maintain data lake/data warehouse solutions on AWS (S3, Glue, Redshift, EMR, Lambda) or Azure (Data Lake, Synapse, Databricks, Data Factory).

- Ensure data quality, security, and compliance with best practices.

- Optimize performance of large-scale data processing systems and pipelines.

- Drive automation, CI/CD practices, and infrastructure-as-code for data platforms.

- Provide technical leadership in solution design, code reviews, and architecture decisions.
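
Purely as an illustrative sketch of the batch ETL work described above (the bucket names, paths, and column names are hypothetical placeholders, not details of this role):

```python
# Minimal PySpark batch ETL sketch: read raw CSV from a data-lake "raw" zone,
# apply basic cleansing, and write partitioned Parquet to a curated zone.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-example").getOrCreate()

# Extract: raw files landed in an S3 (or ADLS) raw zone
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: deduplicate, normalize timestamps, derive a partition column, drop bad rows
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount").cast("double") > 0)
)

# Load: partitioned Parquet for downstream analysts and data scientists
(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/orders/"))

spark.stop()
```

An equivalent pipeline on Azure would typically read from ADLS Gen2 paths and run on Databricks or Synapse Spark pools; the PySpark transformation code itself stays largely the same.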

Required Skills & Qualifications:

- Strong proficiency in Python, PySpark, and SQL.

- Proven experience in ETL design and development.

- Hands-on expertise in big data frameworks (Spark, Hadoop ecosystem).

- Deep understanding of cloud platforms: AWS (Glue, EMR, Redshift, S3, Lambda) or Azure (Data Factory, Synapse, Databricks, Data Lake).

- Experience with data modeling, data warehousing, and performance optimization.

- Strong knowledge of version control (Git), CI/CD pipelines, and DevOps practices.

- Excellent problem-solving and analytical skills.

- Strong communication and leadership skills with experience leading teams/projects.

Good to Have:

- Experience with streaming platforms (Kafka, Kinesis, Event Hub); see the illustrative sketch after this list.

- Knowledge of containerization & orchestration (Docker, Kubernetes).

- Exposure to machine learning pipelines or MLOps.

- Familiarity with data governance and security frameworks.
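
As a hedged illustration of the streaming experience listed above, a minimal PySpark Structured Streaming read from Kafka might look like the sketch below (broker address, topic name, and sink paths are hypothetical; the spark-sql-kafka connector is assumed to be on the classpath):

```python
# Minimal Structured Streaming sketch: consume a Kafka topic and append
# micro-batches to a data-lake path. All connection details are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream-example").getOrCreate()

# Source: a Kafka topic
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
)

# Kafka delivers key/value as binary; cast the payload to string for downstream parsing
parsed = events.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# Sink: append Parquet files, with a checkpoint location for recovery bookkeeping
query = (
    parsed.writeStream
          .format("parquet")
          .option("path", "s3://example-bucket/streaming/events/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
          .outputMode("append")
          .start()
)

query.awaitTermination()
```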
