This role supports Empower's data and AI strategy, with a focus on building Responsible AI capabilities. The Data Engineer will design and implement scalable, ethical, and secure data pipelines and infrastructure that underpin AI/ML systems, ensuring high-quality data flows into model development, testing, monitoring, and governance workflows. The candidate will work across cloud (AWS) and on-premises environments, contributing to the lifecycle of data used for Responsible AI tooling, including bias detection, model transparency, and compliance tracking.

ESSENTIAL FUNCTIONS:

- Design, build, and maintain data pipelines that support model development, testing, and monitoring, with a focus on AI governance and traceability.

- Collaborate with cross-functional teams (including Data Scientists, ML Engineers, and Risk) to understand data needs for AI use cases.

- Integrate data quality, lineage, and metadata tracking into ETL pipelines to support Responsible AI workflows.

- Support ingestion and transformation of structured and unstructured data (including NLP datasets) for AI model training and evaluation.

- Design with compliance in mind: integrate secure handling of PII and support auditability in data flows.

- Participate in technical design discussions focused on enabling transparency, fairness, and explainability in data workflows.

- Troubleshoot and resolve performance and data quality issues in distributed AI pipelines.

- Contribute to reusable libraries or templates to support standardized data practices across AI projects.

QUALIFICATIONS:

- Bachelor's degree in Computer Science, Information Systems, or a related field.

- 2 to 6 years of experience in data engineering, preferably in AI/ML environments.

- Strong Python and SQL skills with experience in data pipeline orchestration (e.g., Airflow, Step Functions).

- Experience with Big Data frameworks (e.g., Spark, Hadoop) and streaming data platforms (e.g., Kafka).

- Experience working in AWS environments with services like S3, Glue, Redshift, SageMaker, and Lake Formation.

- Familiarity with machine learning workflows and data requirements (e.g., training/test splits, data versioning, feature stores).

- Experience integrating data validation, data lineage, or metadata tools (e.g., Great Expectations, Apache Atlas, Amundsen).

- Understanding of Responsible AI principles and experience supporting data aspects of fairness, bias, explainability, or model monitoring is a strong plus.

- Experience with Jira and Agile methodologies.

- Experience in financial services or other highly regulated environments preferred.