Key Responsibilities:
- Design, develop, and maintain scalable ETL/ELT pipelines using Python and PySpark.
- Work with large-scale datasets across distributed systems like HDFS, Hive, and Spark.
- Optimize performance of data ingestion and transformation processes using Delta Lake and Apache Iceberg.
- Implement and manage file formats such as Avro, ORC, and Parquet.
- Design external tables and optimize partitioning strategies in Hive/Databricks.
- Perform DML operations across distributed data platforms, ensuring consistency and performance.
- Collaborate with data architects, analysts, and stakeholders to ensure high-quality data solutions.
Required Skills:
- Deep knowledge of HDFS, Hive, and Spark internals and architecture.
- Proficiency in working with Delta Lake and Apache Iceberg.
- Hands-on experience with file formats: Avro, ORC, Parquet.
- Expertise in ETL pipeline design, data ingestion, and performance tuning.
- Knowledge of Hive Metastore, external tables, and partition strategies.
- Experience in performing DML operations over distributed data systems.
Nice to Have:
- Familiarity with Starburst, Databricks, or similar query engines/platforms.
- Understanding of modern data lakehouse architectures and query optimizations.
- Exposure to cloud data platforms (AWS, Azure, or GCP) is a plus.
Why Join Us:
- Flexible hybrid work model based in Hyderabad.
- High-impact role in a data-driven enterprise environment.
- Competitive contract-to-hire opportunity with long-term growth prospects.
Posted By: Samuel Prabu, Talent Acquisition Recruiter at People Prime Worldwide Pvt. Ltd.
Posted in: Data Engineering
Functional Area: Big Data / Data Warehousing / ETL
Job Code: 1549360