Posted on: 08/12/2025
Job Summary:
We are seeking a Senior Big Data Developer (6-8 years of experience) with mandatory expertise in the Hadoop Ecosystem, PySpark, and Scala. The ideal candidate will be responsible for designing, building, and optimizing scalable, robust Big Data pipelines using Spark and Hive.
Key responsibilities include implementing complex ETL/ELT logic, performing data ingestion and transformation, ensuring data quality, and contributing to overall architecture discussions; hands-on experience with Dremio and Kafka is a significant advantage.
Key Responsibilities and Technical Deliverables:
Big Data Pipeline Development and Optimization:
- Design, develop, and optimize scalable Big Data pipelines (ETL/ELT) using PySpark and Scala to process massive datasets within the Hadoop Ecosystem.
- Use Hive (mandatory) for efficient data storage, querying, and schema management across the data warehouse layer.
- Implement complex data transformations, aggregation logic, and data quality checks within Spark jobs to ensure data integrity (see the sketch after this list).
- Apply hands-on experience with Dremio (a strong plus) for high-performance data lake querying and virtualization.
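For illustration only, a minimal PySpark sketch of the kind of Hive-backed transformation, aggregation, and data-quality logic described above; the table and column names (sales_raw, sales_daily, region, order_id, amount, order_date) are hypothetical, and a configured Hive metastore is assumed.

    # Minimal sketch: read from Hive, filter bad rows, aggregate, write back.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("daily-sales-aggregation")
        .enableHiveSupport()  # read/write Hive tables in the warehouse layer
        .getOrCreate()
    )

    raw = spark.table("sales_raw")  # hypothetical source table

    # Basic data-quality check: drop rows with missing keys or negative amounts.
    clean = raw.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))

    # Aggregation logic: daily totals per region.
    daily = (
        clean.groupBy("region", F.to_date("order_date").alias("day"))
             .agg(F.sum("amount").alias("total_amount"),
                  F.count("*").alias("order_count"))
    )

    # Persist back to Hive, partitioned by day for efficient querying.
    daily.write.mode("overwrite").partitionBy("day").saveAsTable("sales_daily")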
Core Technology Expertise:
- Demonstrate strong proficiency in Core Java principles for building robust backend services and complementing Big Data solutions where necessary.
- Work with messaging and streaming platforms; hands-on experience with Kafka (a strong plus) for real-time data ingestion and stream processing (see the streaming sketch after this list).
- Troubleshoot and performance-tune Spark applications, optimizing cluster usage, data shuffling, and memory allocation.
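As a hedged sketch of the Kafka ingestion and Spark tuning points above, the following Structured Streaming job reads a topic and lands it as Parquet; the broker address, topic name, paths, and tuning values are illustrative assumptions, and the spark-sql-kafka connector package is assumed to be available to the cluster.

    # Sketch: real-time ingestion from Kafka with Spark Structured Streaming.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("kafka-ingestion")
        # Illustrative tuning knobs of the kind mentioned above (values are placeholders).
        .config("spark.sql.shuffle.partitions", "200")
        .config("spark.executor.memory", "4g")
        .getOrCreate()
    )

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "events")                      # hypothetical topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
    parsed = events.select(
        F.col("key").cast("string"),
        F.col("value").cast("string").alias("payload"),
        "timestamp",
    )

    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "/data/landing/events")             # hypothetical landing path
        .option("checkpointLocation", "/checkpoints/events")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()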
Architecture and Collaboration:
- Collaborate with data scientists, analysts, and other engineering teams to understand data requirements and translate them into technical pipeline specifications.
- Contribute to the architecture and governance of the data lake and data warehouse, ensuring security and compliance standards are met.
Mandatory Skills & Qualifications:
- Experience: 6-8 years in Big Data engineering.
- Big Data Ecosystem: Hands-on experience with Big Data and the Hadoop Ecosystem.
- Processing: Mandatory hands-on experience with Apache Spark (PySpark and Scala).
- Storage/Query: Strong expertise in Hive and Core Java.
Preferred Skills:
- Hands-on experience with Dremio (Data Virtualization Engine).
- Hands-on experience with Kafka (Streaming Platform).
- Experience with cloud data services (e.g., Azure HDInsight, AWS EMR, GCP Dataproc).
- Knowledge of advanced data warehousing concepts and data modeling techniques.
Posted in: Data Engineering
Functional Area: Big Data / Data Warehousing / ETL
Job Code: 1587095