Emergys - Big Data Specialist - Hadoop/Spark

Emergys Software Private Limited
Pune
8 - 10 Years

Posted on: 24/12/2025

Job Description

Key Responsibilities :

Data Architecture & Design :
- Design and implement scalable, fault-tolerant, and high-performance data architectures supporting batch and streaming workloads.
- Define end-to-end data pipelines and platform architecture for large-scale data ingestion, processing, and consumption.
- Ensure architectural alignment with enterprise standards, cloud best practices, and long-term scalability goals.


Data Modeling & Integration :
- Design and maintain efficient data models (conceptual, logical, and physical) for analytics and operational use cases.
- Integrate data from multiple structured and unstructured sources, including databases, APIs, files, and event streams (see the sketch after this list).
- Optimize data transformations and storage for performance, cost, and usability.
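For illustration only, a minimal PySpark sketch of the kind of multi-source integration described above: a relational table read over JDBC joined with JSON files and written out as a curated Parquet model. PySpark is assumed because the posting lists Spark and Python; the connection details, table names, and paths are placeholders, not details from the posting.

```python
# Hypothetical integration job: relational customers table + JSON order events -> curated Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("customer_orders_integration").getOrCreate()

# Structured source: customers table read over JDBC (placeholder PostgreSQL instance).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Semi-structured source: daily order events landed as JSON files (placeholder path).
orders = spark.read.json("s3://landing-zone/orders/2025-12-24/")

# Conform to a simple analytical model: one row per customer with summed order amounts.
order_totals = orders.groupBy("customer_id").sum("order_amount")
customer_360 = customers.join(order_totals, on="customer_id", how="left")

# Persist as Parquet for downstream analytics.
customer_360.write.mode("overwrite").parquet("s3://curated-zone/customer_360/")
```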


Big Data Technologies :
- Build and manage data processing solutions using big data frameworks and platforms (e.g., Spark, Hadoop, Kafka, Flink).
- Develop batch and real-time data pipelines using distributed computing technologies (an illustrative streaming sketch follows this list).
- Work with cloud-based big data services (AWS, Azure, or GCP) for ingestion, processing, and analytics.
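As a hedged illustration of the real-time side of this work, the sketch below shows a Spark Structured Streaming job that reads JSON events from Kafka and appends them to Parquet with checkpointing. The broker addresses, topic, event schema, and output paths are assumptions rather than details from the posting, and running it requires the spark-sql-kafka connector package on the classpath.

```python
# Hypothetical streaming ingest: Kafka topic -> parsed events -> partition-friendly Parquet sink.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("clickstream_ingest").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Ingest: subscribe to a Kafka topic (placeholder brokers/topic) and parse the JSON payload.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092")
    .option("subscribe", "clickstream-events")
    .load()
)
events = raw.select(from_json(col("value").cast("string"), event_schema).alias("e")).select("e.*")

# Sink: append to Parquet with a checkpoint location for fault tolerance.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://bronze-zone/clickstream/")
    .option("checkpointLocation", "s3://bronze-zone/_checkpoints/clickstream/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```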


Data Security & Governance :
- Implement data security, privacy, and access control mechanisms across the data ecosystem.
- Ensure compliance with regulatory requirements and internal governance policies.
- Define and enforce data quality, lineage, metadata management, and retention standards.


Collaboration & Stakeholder Management :
- Collaborate closely with data scientists, data engineers, analysts, and business stakeholders to deliver fit-for-purpose data solutions.
- Translate business requirements into technical designs and data architecture blueprints.
- Provide technical guidance, best practices, and recommendations to project teams and leadership.


Operational Excellence & Optimization :
- Monitor, troubleshoot, and optimize data pipelines for reliability and performance.
- Identify opportunities to improve system efficiency, reduce costs, and enhance data availability.
- Document architectures, data flows, and operational processes.


Must-Have Skills :
- Data Architecture & Design : Proven expertise in designing scalable and resilient architectures for batch and streaming systems.
- Data Modeling & Integration : Strong experience building efficient data models and integrating data from diverse sources.
- Big Data Technologies : Hands-on experience with big data platforms, tools, and distributed processing frameworks.
- Data Security & Governance : Solid understanding of data protection, compliance, governance, and best practices.
- Collaboration & Stakeholder Management : Ability to work effectively with technical and business teams to deliver data solutions.


Technical Skills (Preferred) :
- Big Data : Spark, Hadoop, Kafka, Flink, Hive, HBase
- Databases : SQL & NoSQL (PostgreSQL, MySQL, MongoDB, Cassandra, etc.)
- Cloud Platforms : AWS (EMR, Glue, Redshift, Kinesis), Azure (Synapse, Data Factory), or GCP (BigQuery, Dataflow)
- Data Formats : Parquet, Avro, ORC, JSON
- ETL/ELT : ETL/ELT tools and workflow orchestration (an illustrative orchestration sketch follows this list)
- Programming : Python, Scala, Java, or SQL
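The posting does not name a specific orchestrator, so purely as an illustration of the workflow-orchestration skill listed above, here is a minimal Apache Airflow DAG that chains two hypothetical spark-submit jobs. Airflow is an assumed choice, and the DAG id, schedule, and script paths are placeholders.

```python
# Hypothetical daily ELT orchestration: ingest raw files, then build curated tables with Spark.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_sales_elt",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw",
        bash_command="spark-submit jobs/ingest_raw.py --run-date {{ ds }}",
    )
    transform = BashOperator(
        task_id="build_curated",
        bash_command="spark-submit jobs/build_curated.py --run-date {{ ds }}",
    )
    ingest >> transform  # build_curated runs only after ingestion succeeds
```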
