Data Engineer - Python/Scala/Java

Incanus Technologies Private Limited
Anywhere in India/Multiple Locations
3 - 7 Years

Posted on: 27/10/2025

Job Description

About the Role:

We are looking for a skilled Data Engineer to design, build, and maintain scalable data pipelines and architectures that support advanced analytics, machine learning, and reporting solutions.

The ideal candidate will have a strong background in data modeling, ETL processes, and cloud-based data platforms such as Azure, AWS, or GCP.

You will collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to ensure data quality, reliability, and availability for business-critical applications.

Key Responsibilities:

- Design, develop, and maintain ETL/ELT pipelines to extract, transform, and load data from diverse sources.

- Build and optimize data architectures, including data lakes, data warehouses, and streaming pipelines.

- Work with structured and unstructured data using tools like Spark, Databricks, or Azure Data Factory.

- Implement data ingestion and integration using APIs, batch, and real-time streaming frameworks (Kafka, Kinesis, Event Hubs).

- Ensure data quality, validation, and governance across all data systems.

- Collaborate with data scientists and analysts to prepare datasets for analytics and AI workloads.

- Monitor and troubleshoot pipeline performance, ensuring reliability and scalability.

- Develop and maintain data models, metadata repositories, and documentation.

- Work closely with cloud architects to design secure and cost-effective cloud data solutions.

- Apply DevOps principles for data (DataOps): CI/CD pipelines, version control, and automated testing for data workflows.

Required Skills & Expertise:

- Strong programming skills in Python, Scala, or Java.

- Proficiency in SQL for data extraction, transformation, and optimization.

- Hands-on experience with ETL/ELT tools such as Azure Data Factory, AWS Glue, Informatica, Talend, or Apache Airflow.

- Experience with big data frameworks: Apache Spark, Hadoop, or Databricks.

- Solid understanding of data modeling (OLTP/OLAP), data warehousing, and dimensional modeling.

- Expertise in cloud platforms (Azure, AWS, or GCP) and services like Redshift, BigQuery, Synapse, or Snowflake.

- Familiarity with containerization and orchestration (Docker, Kubernetes) is a plus.

- Experience with data versioning and CI/CD tools (Git, Jenkins, Azure DevOps).

- Understanding of data security, compliance, and governance frameworks.

