
Job Description



Position: Data Engineer with ETL & AI

Experience: 3-6+ years

Location: Bangalore

Job Mode: Work From Office

Responsibilities:


- Collaborate closely with the reporting team, data scientists, product owners, and domain experts to design, develop, and maintain scalable data pipelines using an agile development methodology.

- Build efficient and reusable pipelines to ingest, process, and manage large-scale structured and unstructured datasets using cloud infrastructure, while ensuring compliance with data governance, lineage, and monitoring standards.

- Work effectively with diverse data sources, including structured datasets (e.g., RDBMS, Hive, Impala) and semi-structured datasets (e.g., JSON, XML, logs).

- Leverage cloud platforms and tools such as AWS, Dataiku, and Hadoop to architect and implement scalable, fault-tolerant data solutions.

- Automate and optimize ETL workflows to streamline data integration and transformation processes for enterprise-scale reporting and AI solutions.

- Develop and maintain a strong foundation in software engineering practices, including CI/CD pipelines, version control tools (e.g., Git), and advanced scripting, preferably in Python or PySpark (see the sketch after this list).

- Use deployment and automation tools such as Jenkins, Dataiku, and Git to deploy, manage, and monitor scalable data solutions.

- Monitor, troubleshoot, and support operational data pipelines to ensure they meet user requirements, implementing bug fixes and necessary changes promptly.

- Collaborate with cross-functional teams to address requirements for data preparation and processing for advanced analytics, reporting, and machine learning models.

- Stay updated on emerging technologies, tools, and best practices in data engineering to improve data pipeline performance and scalability.
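
A minimal sketch of the kind of PySpark ETL step referred to above, assuming a hypothetical JSON event feed; the paths, column names, and aggregation are illustrative only:

# Minimal PySpark ETL sketch: ingest semi-structured JSON, transform, write columnar output.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Ingest raw semi-structured event logs (hypothetical input location).
events = spark.read.json("s3://example-bucket/raw/events/")

# Basic cleansing and transformation.
daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_type").isNotNull())
    .groupBy("event_date", "event_type")
    .count()
)

# Write a partitioned, columnar dataset for downstream reporting and AI use.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_event_counts/"
)

spark.stop()

In practice, a job like this would be scheduled, versioned, and monitored through the CI/CD and orchestration tooling named in the responsibilities above.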

Your Profile:

You are an ideal candidate for this role if you have:

- A Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or a related field.

- 3+ years of hands-on experience in data engineering, with a proven track record of designing and implementing robust data pipelines.

- Proficiency in working with both structured and unstructured data, leveraging databases (e.g., RDBMS, Hive, Impala), APIs, and/or NoSQL systems (e.g., MongoDB).

- Strong experience with cloud and data platforms, preferably Dataiku, along with tools such as AWS and Hadoop, focused on building scalable, enterprise-grade pipelines.

- Solid expertise in automating and optimizing ETL workflows for both real-time and batch data processing.

- A strong foundation in software/data engineering practices, including CI/CD pipelines, version control and automation tools (e.g., Git, Jenkins), and advanced scripting, preferably in Python or PySpark.

- Familiarity with deployment tools and platforms such as Jenkins to support production-ready pipelines.

- Hands-on experience with big data platforms (e.g., Spark, HDFS, Impala) and a strong understanding of data governance, lineage, and monitoring principles.

- Experience in supporting reporting, visualization, and data analytics tools such as Tableau, and collaborating with data scientists for feature engineering.

- Exceptional collaboration and communication skills, with experience working in cross-functional, agile teams.

- Excellent problem-solving abilities, analytical thinking, and attention to detail.

- A proactive, self-motivated, and results-oriented approach, with the ability to adapt and perform in a dynamic environment.

Good to have:


- Working knowledge of containerization and orchestration systems, including Docker and Kubernetes.

- Knowledge of concepts such as Data as a Product and Data Mesh.

- Basics of ML and AI

