Posted on: 16/07/2025
Job Description:
We are seeking a highly skilled Manager Data Engineer with deep expertise in AWS data services, data wrangling using Python & PySpark, and a solid understanding of data governance, lineage, and quality frameworks. The ideal candidate will have a proven track record of delivering end-to-end data pipelines for logistics, supply chain, enterprise finance, or B2B analytics use cases.
Key Responsibilities:
- Design and implement scalable ETL/ELT pipelines using AWS Glue (3.0+), PySpark, and Athena.
- Build and manage data lakes on S3 using the bronze/silver/gold zone structure.
- Ensure pipelines are audit-ready, with validation logs, schema metadata, and classification tagging using Glue Data Catalog.
- Own all end-to-end pipeline stages: ingestion, transformation, validation, metadata enrichment, and BI readiness.
- Implement data quality frameworks using tools like Great Expectations to catch nulls, outliers, and rule violations early.
- Maintain data lineage and governance using OpenMetadata or Amundsen.
- Collaborate with Data Scientists for ML pipelines, feature engineering, and I/O (JSON/Parquet) optimization.
- Prepare filterable, flattened datasets for BI tools like Sigma, Power BI, or Tableau.
- Interpret complex business metrics (e.g., forecasted revenue, margin, utilization) and translate them into technical logic.
- Build orchestration workflows using AWS Step Functions, EventBridge, and CloudWatch.
- Ensure delivery aligns with evolving business KPIs and compliance standards.
Required Skills:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 6 to 9 years of hands-on experience in data engineering.
- Minimum 3 years working with AWS-native data platforms.
- Strong expertise in AWS ecosystem: Glue, Athena, S3, Step Functions, CloudWatch, EventBridge.
- Programming proficiency in Python 3.x, PySpark, and SQL (Athena/Presto).
- Experience with Pandas, NumPy, and time series manipulation.
- Proven track record of implementing data quality, governance, and lineage (Great Expectations, OpenMetadata, PII tagging).
- Experience in building audit logs, metadata tagging, and schema management.
- Ability to translate business metrics into reliable technical pipelines.
- Strong communication and collaboration with data, QA, and business teams.
- Familiarity with feature engineering, KPI logic, and BI-ready data structures.
Preferred Skills:
- Experience in domains such as logistics, supply chain, enterprise finance, or B2B analytics.
- Exposure to ML pipelines, data modeling, and KPI interpretation.
- Knowledge of Parquet/JSON, Agile development, and BI dashboards.
Success in This Role Means:
- You ship production-ready pipelines with embedded validation and lineage.
- You minimize QA rework by proactively handling edge cases and keeping pipeline logic clear.
- You become a go-to expert for data accuracy and business logic interpretation.
- You deliver scalable solutions that are easy to understand for BI, QA, and architecture teams.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1513323