Posted on: 09/03/2026
Job Overview:
We are looking for an experienced Python Developer with PySpark expertise to design and develop scalable data processing solutions. The ideal candidate has strong Python programming skills and hands-on experience with big data processing and distributed data frameworks, particularly PySpark.
Key Responsibilities:
- Develop and maintain scalable data pipelines using Python and PySpark.
- Process and analyze large datasets in distributed environments.
- Design and implement ETL workflows for data ingestion and transformation.
- Work closely with data engineers, data scientists, and analytics teams to support data-driven solutions.
- Optimize existing data processing jobs for performance and scalability.
- Ensure data quality, integrity, and reliability across pipelines.
- Collaborate with cross-functional teams in an Agile development environment.
Required Skills:
- Strong experience in Python programming.
- Hands-on experience with PySpark and Apache Spark.
- Experience working with large-scale data processing and distributed systems.
- Knowledge of ETL pipelines and data processing frameworks.
- Experience with SQL and relational databases.
- Familiarity with data lakes, cloud platforms (AWS/Azure/GCP), or big data tools is a plus.
- Understanding of version control tools like Git.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Preferred Skills:
- Experience with Hadoop ecosystem.
- Knowledge of data warehousing concepts.
- Exposure to Airflow, Kafka, or other data orchestration tools is an advantage.
Posted in: Backend Development
Functional Area: Backend Development
Job Code: 1618939