Description:

- Evaluate business use cases and identify the internal and external data sources to be used in analytics that support decision making for those use cases.

- Define, design, and build an optimal data pipeline architecture to collect data from a variety of sources, then cleanse and organize it in SQL and NoSQL destinations (ELT and ETL processes).

- Define and build use-case-specific data models that Data Scientists and Data Analysts can consume to conduct discovery and surface business insights and patterns.

- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS big data technologies.

- Build and deploy analytical models and tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.

- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.

- Define, design, and build executive dashboards and a report catalog to serve decision-making and insight-generation needs.

- Provide input to help keep data separated and secure across on-premises data centers and private and public cloud environments.

- Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.

- Work with data and analytics experts to strive for greater functionality in our data systems.

- Implement scheduled data load processes, and maintain and manage the data pipelines.

- Troubleshoot, investigate, and fix failed data pipelines, and prepare root cause analysis (RCA) reports.

Experience with Data Engineering technologies:

- Python, Spark, Snowflake, Databricks, Hadoop (CDH), Hive, Sqoop, Oozie

- SQL: Postgres, MySQL, MS SQL Server

- Azure: ADF, Synapse Analytics, SQL Server, ADLS Gen2

- AWS: Redshift, EMR cluster, S3

Experience with Data Analytics and Visualization toolsets:

- SQL, Power BI, Tableau, Looker, Python, R

- Python libraries: pandas, scikit-learn, Seaborn, Matplotlib, TensorFlow (TF), statsmodels, PySpark, Spark SQL; also R, SAS, Julia, SPSS

- Azure Synapse Analytics, Azure ML Studio, Azure AutoML