Posted on: 07/03/2026
Job Summary:
The main focus is on Python, data/web crawling, building data pipelines, creating APIs, and MongoDB.
Key Responsibilities:
- Bring industry best practices to creating and maintaining robust data pipelines for complex data projects, with or without an AI component:
- Programmatically retrieve (mostly unstructured) data from several static and real-time sources (incl. web scraping and API use).
- Parse this data into a structured format.
- Harmonize the data into a common format and store it in a dedicated database.
- Schedule the individual jobs in a dedicated pipeline (a minimal Python sketch of this flow follows this list).
- Render results through dynamic interfaces (web, mobile, dashboards) with the ability to log usage and granular user feedback.
- Tune performance and optimally implement complex Python scripts and SQL ETL.
- Industrialize ML/DL solutions, deploy and manage production services, and proactively handle data issues arising in live apps.
- Perform ETL on large, complex datasets for AI applications; work closely with data scientists on performance optimization of large-scale ML/DL model fine-tuning.
- Build data tools to facilitate fast data cleaning and statistical analysis.
- Build a data architecture that is secure and compliant.
- Resolve issues escalated from business and functional areas on data quality, accuracy, and availability.
- Work closely with APAC IT Transformation and coordinate with a fully decentralized team across different locations in APAC and the global HQ (Paris).
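For illustration only, here is a minimal Python sketch of the retrieve / harmonize / store flow described above. The source URL, field names, and database/collection names are hypothetical assumptions for this example, not part of the actual stack:

```python
# Minimal sketch: retrieve -> harmonize -> store, per the pipeline above.
# SOURCE_URL, the field names, and the pipeline_db.reports collection are
# all hypothetical; swap in real sources and schemas.
import requests
from pymongo import MongoClient

SOURCE_URL = "https://example.com/api/reports"  # hypothetical data source

def retrieve() -> list[dict]:
    """Programmatically pull (mostly unstructured) records from a source."""
    resp = requests.get(SOURCE_URL, timeout=30)
    resp.raise_for_status()
    return resp.json()

def harmonize(record: dict) -> dict:
    """Map source-specific fields onto one common format."""
    return {
        "title": record.get("title", "").strip(),
        "published": record.get("date") or record.get("published_at"),
        "body": record.get("text", ""),
    }

def store(records: list[dict]) -> None:
    """Persist harmonized records in a dedicated MongoDB collection."""
    client = MongoClient("mongodb://localhost:27017")  # assumed local instance
    if records:  # insert_many rejects an empty list
        client.pipeline_db.reports.insert_many(records)

if __name__ == "__main__":
    # In production this step would run under a scheduler (e.g. cron or Airflow).
    store([harmonize(r) for r in retrieve()])
```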
Required Skills and Experience:
You should have:
- Expertise in structured and unstructured data in traditional and big data environments: Oracle/SQL Server, MongoDB, Hive/Pig, BigQuery, and Spark.
- Excellent knowledge of Python programming in both traditional and distributed models (PySpark; see the sketch after this list).
- Expertise in shell scripting and writing schedulers.
- Hands-on cloud experience deploying complex data solutions in hybrid cloud/on-premise environments, both for data extraction/storage and for computation.
- Familiarity with DevOps best practices such as containerization and CI/CD pipelines (Jenkins and Maven).
- Hands-on experience deploying production apps that use large volumes of data with state-of-the-art technologies such as Docker, Kubernetes, and Kafka.
- Experience working with industry-standard services such as message queues, Redis, Elasticsearch, Kafka, or Spark Streaming.
- Strong knowledge of data security best practices.
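Purely as an illustration of the distributed-Python point above, here is a minimal PySpark sketch performing the same kind of harmonization at scale. The input path and column names are invented for this example:

```python
# Minimal PySpark sketch: harmonizing raw records in a distributed model.
# The input path and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("harmonize").getOrCreate()

# Read raw records (assumed JSON Lines) and map them onto a common schema.
raw = spark.read.json("raw_reports.jsonl")  # hypothetical input
common = raw.select(
    F.trim(F.col("title")).alias("title"),
    F.coalesce(F.col("date"), F.col("published_at")).alias("published"),
)

# Write the harmonized output to an assumed Parquet sink.
common.write.mode("overwrite").parquet("harmonized_reports")
spark.stop()
```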
Preferred Qualifications:
- Graduate from a Tier-1 university.
- Knowledge of finance and experience in handling company annual reports would be greatly appreciated.
- And most importantly, you must be a passionate coder who really cares about building apps that help us do things better, smarter, and faster.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1618634