
Data Engineer - Python/Spark

VARITE Inc.
Multiple Locations
5 - 15 Years

Posted on: 30/10/2025

Job Description

About The Job :

- Develops technical tools and programs to cleanse, organize, and transform data, and to maintain, protect, and update data structures and their integrity on an automated basis.

- Applies data extraction, transformation, and loading techniques in order to tie together large data sets from a variety of sources.

- Partners with both internal & external sources to design, build and oversee the deployment and operation of technology architecture, solutions and software.

- Designs, develops and programs methods, processes and systems to capture, manage, store and utilize structured and unstructured data to generate actionable insights and solutions.

- Responsible for the maintenance, improvement, cleaning, and manipulation of data in the business client's operational and analytics databases.

- Proactively analyzes and evaluates the business client's databases in order to identify and recommend improvements and optimization.

Essential Job Functions :

- Uses knowledge of existing and emerging data science engineering principles, theories, and techniques to inform business decisions and produce accurate business insights.

- Completes projects and assignments of moderate scope and complexity under normal supervision to ensure customer and business needs are met.

- Applies discretion and independent judgement to interpret data trends and summarize data insights.

- Assists in preliminary data exploration and data preparation for accurate model development.

- Establishes working relationships with others outside the Data Science Engineering area of expertise.

- Prepares presentations of project outputs for external customers with assistance.

- Design, develop, and maintain scalable data pipelines and systems for data processing.

- Utilize Data Lakehouse, Spark on Kubernetes and related technologies to manage large-scale data processing.

- Perform data ingestion from various sources such as APIs, RDBMS, NoSQL databases, Kafka, middleware, and files using Spark, and process the data into the Lakehouse platform (see the first sketch after this list).

- Develop and maintain PySpark scripts to automate data processing tasks.

- Implement full and incremental data loading strategies to ensure data consistency and availability.

- Orchestrate and monitor workflows using Apache Airflow (a DAG sketch also follows this list).

- Ensure code quality and version control using Git.

- Troubleshoot and resolve data-related issues in a timely manner.

- Stay up-to-date with the latest industry trends and technologies to continuously improve our data infrastructure.
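
The ingestion and incremental-loading duties above can be pictured with a short PySpark sketch. It is illustrative only: the JDBC source, S3 paths, table and column names (such as updated_at), and the watermark convention are assumptions for the example, not details taken from this posting. A full load is simply the first run, before any watermark exists; later runs append only the changed slice.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical connection details and paths, for illustration only.
JDBC_URL = "jdbc:postgresql://source-db:5432/sales"   # requires the PostgreSQL JDBC driver on the classpath
LAKEHOUSE_PATH = "s3a://lakehouse/bronze/orders"
WATERMARK_PATH = "s3a://lakehouse/_watermarks/orders"

spark = SparkSession.builder.appName("orders-incremental-load").getOrCreate()

# Read the high-water mark left by the previous run; fall back to a full load if none exists.
try:
    last_watermark = spark.read.parquet(WATERMARK_PATH).agg(F.max("updated_at")).first()[0]
except Exception:
    last_watermark = None  # first run: no watermark yet, so load everything

source = (
    spark.read.format("jdbc")
    .option("url", JDBC_URL)
    .option("dbtable", "public.orders")
    .option("user", "etl_user")      # hypothetical credentials
    .option("password", "***")
    .load()
)

# Incremental strategy: keep only rows changed since the last successful run.
if last_watermark is not None:
    source = source.where(F.col("updated_at") > F.lit(last_watermark))

# Append the new slice to the bronze layer of the Lakehouse, partitioned by load date.
(
    source.withColumn("load_date", F.current_date())
    .write.mode("append")
    .partitionBy("load_date")
    .parquet(LAKEHOUSE_PATH)
)

# Persist the new watermark for the next incremental run (simplified: assumes the slice is non-empty).
source.agg(F.max("updated_at").alias("updated_at")).write.mode("overwrite").parquet(WATERMARK_PATH)
```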
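
Orchestration of such jobs with Apache Airflow usually reduces to a small DAG. The following is a minimal sketch assuming Airflow 2.4 or newer; the DAG id, schedule, and spark-submit script paths are placeholders, not the employer's actual pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="lakehouse_daily_ingestion",   # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Run the ingestion job for the logical date, then a simple validation step.
    ingest = BashOperator(
        task_id="ingest_orders",
        bash_command="spark-submit /opt/jobs/ingest_orders.py --run-date {{ ds }}",
    )
    validate = BashOperator(
        task_id="validate_counts",
        bash_command="spark-submit /opt/jobs/validate_counts.py --run-date {{ ds }}",
    )
    ingest >> validate
```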

Qualifications :

- Proven experience as a Data Engineer (ETL, data warehousing, data Lakehouse).

- Strong knowledge of Spark on Kubernetes, S3, and Docker images (see the configuration sketch after this list).

- Proficiency in data engineering techniques with PySpark.

- Strong experience with data warehousing techniques such as data mining, data analysis, and data profiling.

- Experience with Python scripting for automation.

- Expertise in full and incremental data loading techniques.

- Excellent problem-solving skills and attention to detail.

- Ability to work collaboratively in a team environment and communicate effectively with stakeholders.
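
As a rough illustration of the Spark on Kubernetes, S3, and Docker image knowledge listed above, a SparkSession might be configured as in the sketch below. The API server URL, container image, namespace, and S3 endpoint are placeholders; in practice such jobs are more often launched with spark-submit in cluster mode rather than built programmatically in client mode.

```python
from pyspark.sql import SparkSession

# All values below are hypothetical placeholders, not actual infrastructure details.
spark = (
    SparkSession.builder
    .appName("lakehouse-batch")
    .master("k8s://https://kubernetes.default.svc:443")
    .config("spark.kubernetes.container.image", "registry.example.com/spark-py:3.5.1")  # custom Docker image
    .config("spark.kubernetes.namespace", "data-engineering")
    .config("spark.executor.instances", "4")
    .config("spark.hadoop.fs.s3a.endpoint", "https://s3.example.com")   # S3-compatible object store
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)
```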

Good to have :

- Understanding of streaming data applications.

- Hands-on experience with Apache Airflow for workflow orchestration.

- Proficiency with Git for version control.

- Understanding of data engineering integration with LLMs or Gen-AI applications and vector databases.

- Knowledge of shell scripting, PostgreSQL, SQL Server, or MSBI.
