About the Role :

We are looking for a highly skilled and experienced Big Data Developer to join our data engineering team. The ideal candidate will have a strong background in big data technologies, with hands-on experience in building scalable data pipelines and infrastructure. If you're passionate about working on large-scale distributed systems and cutting-edge open-source technologies, we'd love to connect with you.

Key Responsibilities :

- Design, develop, and maintain scalable data ingestion, transformation, and enrichment pipelines.

- Work with Apache Spark (batch and streaming) for processing large volumes of data efficiently.

- Utilize Scala and Python to implement robust data engineering solutions.

- Manage and optimize HDFS and lead migration strategies across storage layers.

- Implement and manage table formats like Apache Iceberg, Delta Lake, or Apache Hudi for schema evolution, ACID compliance, and time travel features.

- Run Spark/Flink workloads on Kubernetes using tools like the Spark-on-K8s operator or Flink-on-K8s.

- Leverage distributed object storage systems such as Ceph or AWS S3.

- Use Infrastructure-as-Code (Terraform, Helm) to provision and manage data infrastructure.

Required Skills & Experience :

- 7 to 10 years of experience in Big Data Engineering.

- Proficiency in Scala and Python.

- Expertise in Apache Spark (batch + streaming).

- Strong understanding of HDFS internals and hands-on experience with migration strategies.

- Hands-on experience with Apache Iceberg (or a similar table format such as Delta Lake or Apache Hudi).

- Experience running Spark/Flink on Kubernetes.

- Familiarity with distributed blob storage solutions such as Ceph or AWS S3.

- Experience building high-throughput data pipelines for large-scale datasets.

- Strong knowledge of Terraform and Helm for infrastructure provisioning.

Preferred Qualifications :

- Contributions to open-source big data projects.

- Exposure to performance tuning in Spark/Flink.

- Experience working in cloud-native environments (AWS/GCP/Azure).