
Big Data Engineer - Hadoop/Hive

SKS Enterprises
Anywhere in India/Multiple Locations
5 - 7 Years

Posted on: 21/12/2025

Job Description

Key Responsibilities :

- Design, develop, and maintain scalable big data pipelines for batch and real-time data processing (a minimal batch sketch follows this list).
- Process large volumes of structured and unstructured data efficiently.
- Ensure data accuracy, reliability, and performance across pipelines.
- Work with distributed computing frameworks such as Apache Spark, Hadoop, and Hive.
- Optimize jobs for performance, scalability, and cost.
- Handle data partitioning, shuffling, and parallel processing.
- Build and manage data ingestion pipelines from multiple sources (databases, APIs, logs, IoT, etc.).
- Implement real-time data streaming using Kafka, Kinesis, or Pub/Sub.
- Support event-driven and near-real-time analytics use cases.
- Design and manage data storage solutions, including data lakes and data warehouses.
- Implement efficient data models for analytical and reporting use cases.
- Work with formats such as Parquet, ORC, and Avro.
- Develop and manage big data solutions on AWS, Azure, or GCP.
- Use cloud-native services such as EMR, Databricks, BigQuery, Redshift, and Synapse.
- Monitor and optimize cloud resources for performance and cost.
- Implement data validation, monitoring, and error-handling mechanisms.
- Ensure data security, access controls, and compliance with governance standards.
- Maintain documentation and data lineage.
- Collaborate with data analysts, data scientists, and product teams to understand data requirements.
- Participate in code reviews and follow engineering best practices.
- Support production deployments and troubleshooting.
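
As a purely illustrative flavor of the batch side of this role, here is a minimal PySpark sketch that reads raw CSV events, applies a basic validation filter, and writes date-partitioned Parquet. The paths, column names, and app name are hypothetical placeholders, not details from this posting.

# Minimal PySpark batch sketch: raw CSV -> validated, date-partitioned Parquet.
# All paths and column names below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-batch")  # hypothetical app name
    .getOrCreate()
)

# Read raw CSV events from a hypothetical landing zone.
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-bucket/raw/events/")
)

# Basic validation and type normalization.
clean = (
    raw
    .filter(F.col("event_id").isNotNull())               # drop malformed rows
    .withColumn("event_ts", F.to_timestamp("event_ts"))  # string -> timestamp
    .withColumn("event_date", F.to_date("event_ts"))     # derive partition key
)

# Date-partitioned Parquet enables partition pruning in downstream queries.
(
    clean.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/events/")
)

Partitioning by a date column is a common default because most analytical queries filter on time; the same write pattern applies to ORC or Avro sinks.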


Required Skills & Experience :

- Strong programming skills in Python, Scala, or Java.
- Proficiency in SQL and data querying.
- Hands-on experience with Apache Spark (batch and/or streaming).
- Experience with Hadoop ecosystem tools (HDFS, Hive, HBase).
- Experience with Kafka, Spark Streaming, Flink, or similar technologies (see the streaming sketch after this list).
- Understanding of batch vs. streaming processing paradigms.
- Experience with NoSQL databases (Cassandra, HBase, DynamoDB).
- Familiarity with relational databases (PostgreSQL, MySQL).
- Experience with workflow orchestration tools (Airflow, Oozie).
- Familiarity with Git, CI/CD pipelines, and version control.
- Experience with Docker and Kubernetes is a plus.
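
To complement the batch sketch above, here is a minimal Spark Structured Streaming sketch of the Kafka-based work the role describes: it consumes JSON events from a Kafka topic and writes them to a Parquet sink with checkpointing. The broker address, topic name, schema, and paths are assumptions for illustration only.

# Minimal Spark Structured Streaming sketch: Kafka topic -> Parquet sink.
# Broker, topic, schema, and paths are illustrative assumptions.
# Requires the spark-sql-kafka connector package at submit time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-streaming").getOrCreate()

# Hypothetical event schema for the JSON payload on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
    # Kafka delivers bytes; decode and parse the JSON value.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
    .filter(F.col("event_id").isNotNull())  # drop malformed records
)

# Checkpointing lets the query recover its Kafka offsets after a restart.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streaming/events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()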




Functional Area

Big Data / Data Warehousing / ETL

Job Code

1593434