
DemandMatrix - Data Engineer - Big Data Platform

DemandMatrix
Anywhere in India/Multiple Locations
7 - 10 Years

Posted on: 18/07/2025

Job Description :


Key Responsibilities :


- Design, develop, and maintain robust big data pipelines using Spark (PySpark), Hadoop, and related technologies (a minimal PySpark sketch follows this list).

- Architect and implement large-scale data platforms on AWS or GCP with high availability, scalability, and performance.

- Handle structured and unstructured data transformation, cleansing, and integration for analytics, AI, and knowledge graph applications.

- Optimize data processing workflows with an in-depth understanding of data locality, disk I/O, network I/O, and shuffling strategies.

- Develop and deploy data applications in Unix/Linux-based environments, ensuring optimal system performance and reliability.

- Apply strong software engineering practices, including code reviews, testing, version control, and continuous integration.

- Configure and integrate data tools such as Sqoop, Flume, Pig, and Hive with RDBMS sources for ETL and big data querying use cases.

- Monitor and ensure performance, data quality, and security across data infrastructure and pipelines.

- Lead and contribute to architectural decisions and reference implementations adhering to industry best practices.

- Uphold and promote high standards in engineering, following Unix philosophy and functional programming principles.

- Collaborate with cross-functional teams to understand business needs and translate them into scalable data solutions.
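
To illustrate the kind of pipeline and shuffle-aware optimization work described above, here is a minimal PySpark batch sketch. It is purely illustrative and not part of the role's codebase: the bucket paths, column names, and job name are hypothetical.

```python
# A minimal, illustrative PySpark batch pipeline: read, cleanse, enrich, aggregate.
# All paths, schemas, and names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("events-daily-aggregation")            # hypothetical job name
    .config("spark.sql.shuffle.partitions", "200")  # tune to cluster and data size
    .getOrCreate()
)

# Read raw event data from cloud storage (s3a:// on AWS; gs:// works the same on GCP)
events = spark.read.parquet("s3a://example-bucket/raw/events/")  # hypothetical path

# Cleanse: drop malformed rows and derive a partition-friendly date column
clean = (
    events
    .dropna(subset=["event_id", "user_id"])
    .withColumn("event_date", F.to_date("event_ts"))
)

# Small dimension table: broadcasting it keeps the join map-side and avoids
# shuffling the large event table - one example of a shuffle strategy
dims = spark.read.parquet("s3a://example-bucket/dims/users/")    # hypothetical path
enriched = clean.join(broadcast(dims), on="user_id", how="left")

# Aggregate and write partitioned output for downstream analytics
(
    enriched
    .groupBy("event_date", "country")
    .agg(
        F.count("event_id").alias("events"),
        F.countDistinct("user_id").alias("users"),
    )
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-bucket/curated/daily_events/")       # hypothetical path
)
```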


Required Skillsets :


- Strong foundation in Computer Engineering, Unix/Linux, Data Structures, and Algorithms.

- 7+ years of experience in big data platform architecture and development on AWS or GCP.

- Deep expertise in Apache Spark (especially PySpark) and Hadoop/MapReduce.

- Solid understanding of data processing models (streaming, batch, event-based); a streaming sketch follows this list.

- Experience with NoSQL stores like MongoDB, HBase (on HDFS), and Elasticsearch.

- Proficiency in Python and working with unstructured text data.

- Experience with ETL tools such as Sqoop and Flume, and big data querying tools like Hive and Pig.

- Familiarity with RDBMS systems and SQL performance tuning.

- Strong grasp of big data design patterns, functional computation models, and orthogonal code design.

- Passion for clean, maintainable code with a commitment to engineering excellence and standards.
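
As a companion to the batch sketch above, here is a minimal Spark Structured Streaming sketch illustrating the streaming model named in this list. It is an assumption-laden illustration, not part of the posting: the Kafka brokers, topic, and checkpoint path are hypothetical.

```python
# A minimal Structured Streaming counterpart to the batch pipeline above.
# Requires the spark-sql-kafka connector package on the classpath.
# Brokers, topic, and checkpoint path are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-streaming").getOrCreate()

# Read an event stream from Kafka
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "events")                     # hypothetical topic
    .load()
)

# Project the payload and the Kafka message timestamp; real jobs would parse
# the payload into a structured schema here
events = stream.select(
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_ts"),
)

# Count events per 5-minute window, tolerating 10 minutes of late data
counts = (
    events
    .withWatermark("event_ts", "10 minutes")
    .groupBy(F.window("event_ts", "5 minutes"))
    .count()
)

# Console sink for illustration; production jobs target durable sinks
query = (
    counts.writeStream
    .outputMode("update")
    .option("checkpointLocation", "/tmp/checkpoints/events")  # hypothetical path
    .format("console")
    .start()
)
query.awaitTermination()
```

The watermark bounds how long state for each window is retained, which is the standard trade-off between late-data tolerance and memory in event-based processing.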

