
Big Data Engineer - Spark/Hadoop/Hive

SKS Enterprises
Anywhere in India/Multiple Locations
3 - 8 Years

Posted on: 29/11/2025

Job Description

Job Summary :

We are looking for a skilled Big Data Engineer to design, build, and maintain large-scale data processing systems and analytics platforms.

The ideal candidate should have strong expertise in big data technologies, distributed systems, and cloud-based data architectures.

You will work closely with data scientists, analysts, and engineering teams to develop reliable, high-performance data pipelines that support analytics, machine learning, and business intelligence.

Key Responsibilities

- Design and build scalable, high-performance data pipelines for batch and real-time processing.

- Develop ETL/ELT workflows using big data frameworks such as Spark, Hadoop, Hive, or Flink.

- Build and maintain data ingestion processes from various sources (APIs, databases, streaming platforms).

- Design and implement data lake and data warehouse architectures on cloud and on-prem platforms.

- Work with structured, semi-structured, and unstructured data formats (JSON, Avro, Parquet, ORC).

- Develop optimized data models, partitioning, and cataloging strategies for analytics and BI use cases.

- Build real-time pipelines using Kafka, Pulsar, Spark Streaming, or Flink.

- Ensure low-latency, fault-tolerant streaming architecture.

- Optimize Spark jobs, Hive queries, HDFS storage, and distributed computing workloads.

- Troubleshoot performance bottlenecks and improve scalability and reliability.

- Implement processes for data validation, metadata management, lineage tracking, and observability.

- Ensure compliance with data security policies and governance standards (IAM, encryption, access control).

- Work with Data Science, Analytics, Product, and DevOps teams to enable business insights.

- Build APIs, data services, or integration layers for downstream consumers.

- Collaborate in Agile environments, participating in sprint planning and daily scrums.

Required Technical Skills :

Programming :

- Strong experience in Python / Scala / Java (at least one is mandatory).

- Experience with shell scripting and automation.

Big Data Technologies :

- Hands-on expertise with :

a. Apache Spark (batch + streaming)

b. Hadoop ecosystem : HDFS, YARN, Hive, HBase

c. Kafka / Pulsar for streaming

d. Airflow / Oozie / Luigi for workflow orchestration

Cloud Platforms :

- Experience with at least one cloud provider :

a. AWS (preferred) : EMR, Glue, S3, Redshift, Athena

b. Azure : Data Lake, Synapse, Databricks

c. GCP : BigQuery, Dataproc, Dataflow

Data Warehousing & Storage :

- Columnar formats : Parquet, ORC

- Data lakes & warehouses : Snowflake, Redshift, BigQuery, Delta Lake

- NoSQL databases : Cassandra, MongoDB, DynamoDB

- Familiarity with Docker, Kubernetes, Terraform, and cloud deployment pipelines.

- Experience using Git, Jenkins, GitHub Actions, or similar CI/CD tools.

- Strong SQL skills with the ability to write optimized, complex queries.

