- Define, design, and build an optimal data pipeline architecture to collect data from a variety of sources, then cleanse and organize it in SQL and NoSQL destinations (ELT and ETL processes).

- Define and build data models specific to business use cases that data scientists and data analysts can use to conduct discovery, identify patterns, and drive business insights.

- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS 'big data' technologies.

- Build and deploy analytical models and tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.

- Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.

- Define, design, and build executive dashboards and report catalogs to serve decision-making and insight-generation needs.

- Provide input to help keep data separated and secure across data centers, including on-premises, private cloud, and public cloud environments.

- Create data tools that help analytics and data science team members build and optimize our product into an innovative industry leader.

- Work with data and analytics experts to strive for greater functionality in our data systems.

- Implement scheduled data load processes, and maintain and manage the data pipelines.

- Troubleshoot, investigate, and fix failed data pipelines, and prepare root cause analysis (RCA) reports.

Skills Required:

Experience with a mix of the following data engineering technologies:

- Python, Spark, Snowflake, Databricks, Hadoop (CDH), Hive, Sqoop, Oozie.

- SQL - Postgres, MySQL, MS SQL Server.

- Azure - ADF, Synapse Analytics, SQL Server, ADLS Gen2.

- AWS - Redshift, EMR cluster, S3.

Experience with a mix of the following data analytics and visualization toolsets:

- Python libraries: Pandas, Scikit-learn, Seaborn, Matplotlib, TensorFlow, Statsmodels, PySpark, Spark SQL.

- Other languages and tools: R, SAS, Julia, SPSS.

- Azure: Synapse Analytics, Azure ML Studio, Azure AutoML.