
Big Data Engineer - Spark/Hadoop/Hive

SKS Enterprises
Anywhere in India/Multiple Locations
3 - 8 Years

Posted on: 29/11/2025

Job Description

Job Summary :

We are looking for a skilled Big Data Engineer to design, build, and maintain large-scale data processing systems and analytics platforms.

The ideal candidate should have strong expertise in big data technologies, distributed systems, and cloud-based data architectures.

You will work closely with data scientists, analysts, and engineering teams to develop reliable, high-performance data pipelines that support analytics, machine learning, and business intelligence.

Key Responsibilities

- Design and build scalable, high-performance data pipelines for batch and real-time processing.

- Develop ETL/ELT workflows using big data frameworks such as Spark, Hadoop, Hive, or Flink.

- Build and maintain data ingestion processes from various sources (APIs, databases, streaming platforms).

- Design and implement data lake and data warehouse architectures on cloud and on-prem platforms.

- Work with structured, semi-structured, and unstructured data formats (JSON, Avro, Parquet, ORC).

- Develop optimized data models, partitioning, and cataloging strategies for analytics and BI use cases.

- Build real-time pipelines using Kafka, Pulsar, Spark Streaming, or Flink.

- Ensure low-latency, fault-tolerant streaming architecture.

- Optimize Spark jobs, Hive queries, HDFS storage, and distributed computing workloads.

- Troubleshoot performance bottlenecks and improve scalability and reliability.

- Implement processes for data validation, metadata management, lineage tracking, and observability.

- Ensure compliance with data security policies and governance standards (IAM, encryption, access control).

- Work with Data Science, Analytics, Product, and DevOps teams to enable business insights.

- Build APIs, data services, or integration layers for downstream consumers.

- Collaborate in Agile environments, participating in sprint planning and daily scrums.

Required Technical Skills :

Programming :

- Strong experience in Python / Scala / Java (at least one is mandatory).

- Experience with shell scripting and automation.

Big Data Technologies :

- Hands-on expertise with :

a. Apache Spark (batch + streaming)

b. Hadoop ecosystem : HDFS, YARN, Hive, HBase

c. Kafka / Pulsar for streaming

d. Airflow / Oozie / Luigi for workflow orchestration

Cloud Platforms :

- Experience with at least one cloud provider :

a. AWS (preferred) : EMR, Glue, S3, Redshift, Athena

b. Azure : Data Lake, Synapse, Databricks

c. GCP : BigQuery, Dataproc, Dataflow

Data Warehousing & Storage :

- Columnar formats : Parquet, ORC

- Data lakes & warehouses : Snowflake, Redshift, BigQuery, Delta Lake

- NoSQL databases : Cassandra, MongoDB, DynamoDB

- Familiarity with Docker, Kubernetes, Terraform, and cloud deployment pipelines.

- Experience using Git, Jenkins, GitHub Actions, or similar CI/CD tools.

- Strong SQL skills with the ability to write optimized, complex queries.

