hirist

Big Data Engineer - Python/PySpark

Posted on: 19/09/2025

Job Description

Key Responsibilities :

- Design, develop, and maintain scalable ETL/ELT pipelines using Python and PySpark.

- Work with large-scale datasets across distributed systems like HDFS, Hive, and Spark.

- Optimize performance of data ingestion and transformation processes using Delta Lake and Apache Iceberg.

- Implement and manage file formats such as Avro, ORC, and Parquet.

- Design external tables and optimize partitioning strategies in Hive/Databricks.

- Perform DML operations across distributed data platforms, ensuring consistency and performance.

- Collaborate with data architects, analysts, and stakeholders to ensure high-quality data solutions.


Required Skills :

- Strong programming skills in Python and PySpark.

- Deep knowledge of HDFS, Hive, and Spark internals and architecture.

- Proficiency in working with Delta Lake and Apache Iceberg.

- Hands-on experience with file formats: Avro, ORC, Parquet.

- Expertise in ETL pipeline design, data ingestion, and performance tuning.

- Knowledge of Hive Metastore, external tables, and partition strategies.

- Experience in performing DML operations over distributed data systems.


Nice to Have :

- Hands-on experience with SAS SPDE tables and migration strategies.

- Familiarity with Starburst, Databricks, or similar query engines/platforms.

- Understanding of modern data lakehouse architectures and query optimizations.

- Exposure to cloud data platforms (AWS, Azure, or GCP) is a plus.


Why Join Us :

- Opportunity to work on cutting-edge data platforms and tools.

- Flexible hybrid work model based in Hyderabad.

- High-impact role in a data-driven enterprise environment.

- Competitive contract-to-hire opportunity with long-term growth prospects.

