- Experience with big data processing and distributed computing systems like Spark.

- Implement ETL pipelines and data transformation processes.

- Ensure data quality and integrity in all data processing workflows.

- Troubleshoot and resolve issues related to PySpark applications and workflows.

- Understand the sources, dependencies, and data flow of converted PySpark code.

- Strong programming skills in Python and SQL.

- Experience with big data technologies like Hadoop, Hive, and Kafka.

- Understanding of data warehousing concepts and SQL-based relational databases.

- Demonstrate and document code lineage.

- Integrate PySpark code with frameworks such as Ingestion Framework and DataLens.

- Ensure compliance with data security, privacy regulations, and organizational standards.

- Knowledge of CI/CD pipelines and DevOps practices.

- Strong problem-solving and analytical skills.

- Excellent communication and leadership abilities.

Qualifications:

- 4+ years of experience in big data development with the Hadoop, Hive, and Spark frameworks.

- Experience with SAS is a plus.

- Strong Python, PySpark, and SQL development skills.

- Certification in big data or cloud technologies is preferred.

- Excellent communication, collaboration, and problem-solving skills, with the ability to lead a five-member PySpark team.