Posted on: 15/01/2026
Job Summary:
This role will be responsible for developing and maintaining data models to support data warehouse and reporting requirements. It requires a strong background in data engineering, excellent leadership capabilities, and the ability to drive projects to successful completion.
Job Responsibilities:
- Engage with the client in requirement gathering, work status updates, and UAT, and act as a key partner in the overall engagement
- Participate in ETL design of new or changing mappings and workflows using a Python framework, working with the team, and prepare technical specifications
- Craft ETL mappings, mapplets, workflows, and worklets using Informatica PowerCenter
- Write complex SQL queries with performance tuning and optimization (see the sketch after this list)
- Handle tasks independently and lead the team
- Responsible for unit testing, integration testing, and UAT as and when required
- Good communication skills
- Coordinate with cross-functional teams to ensure project objectives are met.
- Collaborate with data architects and engineers to design and implement data models.
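For context only, here is a minimal sketch of the kind of SQL performance tuning described above, written against Spark SQL; the table and column names (such as sales and dim_customer) are hypothetical and not part of this posting:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-tuning-sketch").getOrCreate()

# Hypothetical tables: a large fact table `sales` partitioned by sale_date,
# and a small dimension table `dim_customer`.
tuned_query = """
    SELECT /*+ BROADCAST(c) */
           c.region,
           SUM(s.amount)       AS total_amount,
           COUNT(s.order_id)   AS order_count
    FROM   sales s
    JOIN   dim_customer c
           ON s.customer_id = c.customer_id
    WHERE  s.sale_date >= '2025-01-01'   -- prunes partitions on the fact table
    GROUP  BY c.region
"""

result = spark.sql(tuned_query)
result.explain()  # check the physical plan for the broadcast join and partition pruning
```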
Job Requirements:
- Advanced knowledge of PySpark, Python, pandas, and NumPy.
- Minimum of 4 years of experience in the design, build, and deployment of Spark/PySpark solutions for data integration.
- Deep experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations (see the sketch after this list)
- Create Spark jobs for data transformation and aggregation
- Spark query tuning and performance optimization
- Good understanding of different file formats (ORC, Parquet, Avro) and compression techniques to optimize queries and processing.
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
- Experience in modular and robust programming methodologies
- ETL knowledge and hands-on ETL development using a Python framework
- Advanced SQL knowledge
- Ability to perform multiple tasks in a continually changing environment
- Prior experience with Redshift, Synapse, or Snowflake is preferable.
- Good understanding of and experience in SDLC phases such as requirements specification, analysis, design, implementation, testing, deployment, and maintenance
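As a rough illustration of the read, merge, enrich, and load pattern referenced above, the following sketch assumes hypothetical source paths, schemas, and column names; it is not the team's actual pipeline:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read data from external sources (paths and schemas are illustrative assumptions).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
customers = spark.read.option("header", True).csv("s3://example-bucket/raw/customers.csv")

# Merge: join the order facts to customer attributes.
merged = orders.join(customers, on="customer_id", how="left")

# Enrich: derive the columns downstream reporting needs.
enriched = (
    merged
    .withColumn("order_year", F.year("order_date"))
    .withColumn("net_amount", F.col("amount") - F.coalesce(F.col("discount"), F.lit(0.0)))
)

# Load: write to the target destination, partitioned for efficient querying.
(
    enriched
    .write
    .mode("overwrite")
    .partitionBy("order_year")
    .parquet("s3://example-bucket/curated/orders_enriched/")
)
```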
Qualification:
- BE/B.Tech/M.Tech/MBA
Must-have Skills:
- Expertise in pharma commercial domain
- Proficiency in PySpark, Hadoop, Hive, and other big data technologies
Skills that give you an edge:
- Excellent interpersonal and communication skills (both oral and written), with the ability to communicate at various levels with clarity and precision
Posted in: Data Engineering
Functional Area: Project Management
Job Code: 1602219