Posted on: 29/11/2025
Description :
Job Summary :
We are looking for a skilled Big Data Engineer to design, build, and maintain large-scale data processing systems and analytics platforms.
The ideal candidate should have strong expertise in big data technologies, distributed systems, and cloud-based data architectures.
You will work closely with data scientists, analysts, and engineering teams to develop reliable, high-performance data pipelines that support analytics, machine learning, and business intelligence.
Key Responsibilities :
- Design and build scalable, high-performance data pipelines for batch and real-time processing.
- Develop ETL/ELT workflows using big data frameworks such as Spark, Hadoop, Hive, or Flink (a minimal batch ETL sketch appears after this list).
- Build and maintain data ingestion processes from various sources (APIs, databases, streaming platforms).
- Design and implement data lake and data warehouse architectures on cloud and on-prem platforms.
- Work with structured, semi-structured, and unstructured data formats (JSON, Avro, Parquet, ORC).
- Develop optimized data models, partitioning schemes, and cataloging strategies for analytics and BI use cases.
- Build real-time pipelines using Kafka, Pulsar, Spark Streaming, or Flink.
- Ensure low-latency, fault-tolerant streaming architectures.
- Optimize Spark jobs, Hive queries, HDFS storage, and distributed computing workloads.
- Troubleshoot performance bottlenecks and improve scalability and reliability.
- Implement processes for data validation, metadata management, lineage tracking, and observability.
- Ensure compliance with data security policies and governance standards (IAM, encryption, access control).
- Work with Data Science, Analytics, Product, and DevOps teams to enable business insights.
- Build APIs, data services, or integration layers for downstream consumers.
- Collaborate in Agile environments, participating in sprint planning and daily scrums.
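As an illustration of the batch side of these responsibilities, here is a minimal PySpark ETL sketch that ingests raw JSON, validates it, and writes partitioned Parquet for analytics. The bucket paths, column names, and partition key are hypothetical placeholders, not details of this role.

# Minimal PySpark batch ETL sketch (hypothetical paths and columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-batch-etl").getOrCreate()

# Ingest semi-structured JSON landed by an upstream extract.
orders = spark.read.json("s3a://example-data-lake/raw/orders/")

# Basic validation and enrichment: drop malformed rows, derive a partition column.
clean = (
    orders.filter(F.col("order_id").isNotNull())
          .withColumn("order_date", F.to_date("order_ts"))
)

# Write analytics-ready Parquet, partitioned for downstream BI queries.
(
    clean.write.mode("overwrite")
         .partitionBy("order_date")
         .parquet("s3a://example-data-lake/curated/orders/")
)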
Required Technical Skills :
Programming :
- Strong experience in Python / Scala / Java (at least one is mandatory).
- Experience with shell scripting and automation.
Big Data Technologies :
- Hands-on expertise with :
a. Apache Spark (batch + streaming)
b. Hadoop ecosystem : HDFS, YARN, Hive, HBase
c. Kafka / Pulsar for streaming (a Kafka-to-Spark streaming sketch follows this list)
d. Airflow / Oozie / Luigi for workflow orchestration
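A minimal sketch of the Kafka-to-Spark streaming piece, assuming Spark Structured Streaming with the Kafka connector package on the classpath; the broker address, topic name, schema, and sink paths below are hypothetical placeholders.

# Minimal Spark Structured Streaming sketch reading JSON events from Kafka
# (hypothetical broker, topic, schema, and sink paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("events-streaming").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

# Consume the raw Kafka topic as a streaming DataFrame.
raw = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
)

# Parse the JSON payload carried in the Kafka value column.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Write to a durable sink; the checkpoint location makes restarts fault-tolerant.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3a://example-data-lake/stream/events/")
          .option("checkpointLocation", "s3a://example-data-lake/checkpoints/events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()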
Cloud Platforms :
- Experience with at least one cloud provider :
a. AWS (preferred) : EMR, Glue, S3, Redshift, Athena
b. Azure : Data Lake, Synapse, Databricks
c. GCP : BigQuery, Dataproc, Dataflow
Data Warehousing & Storage :
- Data lakes & warehouses : Snowflake, Redshift, BigQuery, Delta Lake
- NoSQL databases : Cassandra, MongoDB, DynamoDB
DevOps & Other Skills :
- Familiarity with Docker, Kubernetes, Terraform, and cloud deployment pipelines.
- Experience using Git, Jenkins, GitHub Actions, or similar CI/CD tools.
- Strong SQL skills with the ability to write optimized, complex queries (a short Spark SQL sketch follows).
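By way of example, the Spark SQL sketch below shows the kind of query optimization habits implied above, namely partition pruning on a date column and an explicit broadcast join hint; the table and column names are hypothetical.

# Spark SQL sketch: partition pruning plus a broadcast join hint
# (hypothetical table and column names).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-optimization-sketch").getOrCreate()

top_customers = spark.sql("""
    SELECT /*+ BROADCAST(c) */
           c.customer_id,
           c.segment,
           SUM(o.amount) AS total_spend
    FROM curated.orders o
    JOIN curated.customers c
      ON o.customer_id = c.customer_id
    WHERE o.order_date >= DATE '2025-01-01'   -- prunes date partitions
    GROUP BY c.customer_id, c.segment
    ORDER BY total_spend DESC
    LIMIT 100
""")

top_customers.show()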
Posted in : Data Engineering
Functional Area : Big Data / Data Warehousing / ETL
Job Code : 1582220