Description :

Primary Job Title : Data Engineering Lead.

About The Opportunity :

We are seeking a highly skilled Lead Data Engineer with strong expertise in Python, Pandas, PySpark, AWS, and SQL to design, build, and manage scalable data solutions.

The ideal candidate will lead a team of data engineers, develop robust ETL pipelines, and collaborate with analytics, data science, and business teams to ensure high data quality and performance across cloud-based environments.

Role & Responsibilities :

- Lead the design and development of data pipelines for ingestion, transformation, and integration from multiple sources into the enterprise data platform.

- Implement data quality frameworks, validation checks, and monitoring solutions using Python and SQL.

- Optimize PySpark jobs for performance and scalability in a distributed computing environment (AWS EMR/Glue).

- Develop reusable ETL frameworks using PySpark and Pandas for data transformation and analytics.

- Manage and maintain cloud-based infrastructure and data storage (AWS S3, Redshift, Lambda, Glue, Athena).

- Collaborate with data scientists, analysts, and stakeholders to provide clean, structured, and accessible datasets.

- Oversee code reviews and performance tuning, and mentor junior data engineers.

- Establish best practices for data governance, version control, and CI/CD integration.

- Troubleshoot production data issues and ensure system reliability, availability, and efficiency.

Must-Have Skills :

- Programming : Python (advanced scripting, Pandas, PySpark).

- Cloud : AWS (S3, Glue, Lambda, Redshift, Athena, EMR).

- Database / Querying : Advanced SQL (Joins, Window Functions, Query Optimization).

- Big Data Tools : PySpark, Spark SQL.

- ETL Development : Data ingestion, transformation, and validation pipelines.

- Version Control : Git / GitHub / Bitbucket.

- Workflow Orchestration : Apache Airflow or equivalent (preferred).

Good-to-Have Skills :

- Experience with Docker or Kubernetes for containerization.

- Familiarity with CI/CD pipelines and DevOps practices.

- Exposure to data modeling, schema design, and partitioning strategies.

- Understanding of data lake and data warehouse architecture.

- Knowledge of monitoring tools (CloudWatch, Datadog, or Prometheus).

Qualifications :

- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.

- 11-16 years of total experience, with at least 3-4 years leading data engineering projects.

- Proven track record in handling large-scale data pipelines and cloud migrations.