Posted on: 21/12/2025
Description:
Key Responsibilities:
- Design, develop, and maintain scalable big data pipelines for batch and real-time data processing.
- Process large volumes of structured and unstructured data efficiently.
- Ensure data accuracy, reliability, and performance across pipelines.
- Work with distributed computing frameworks such as Apache Spark, Hadoop, and Hive.
- Optimize jobs for performance, scalability, and cost.
- Handle data partitioning, shuffling, and parallel processing.
- Build and manage data ingestion pipelines from multiple sources (databases, APIs, logs, IoT, etc.).
- Implement real-time data streaming using Kafka, Kinesis, or Pub/Sub (a minimal sketch follows this list).
- Support event-driven and near-real-time analytics use cases.
- Design and manage data storage solutions including data lakes and data warehouses.
- Implement efficient data models for analytical and reporting use cases.
- Work with file formats such as Parquet, ORC, and Avro.
- Develop and manage big data solutions on AWS, Azure, or GCP.
- Use cloud-native services such as EMR, Databricks, BigQuery, Redshift, and Synapse.
- Monitor and optimize cloud resources for performance and cost.
- Implement data validation, monitoring, and error-handling mechanisms.
- Ensure data security, access controls, and compliance with governance standards.
- Maintain documentation and data lineage.
- Collaborate with data analysts, data scientists, and product teams to understand data requirements.
- Participate in code reviews and follow engineering best practices.
- Support production deployments and troubleshooting.
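To make the streaming-ingestion responsibilities above concrete, here is a minimal PySpark Structured Streaming sketch (not part of the original posting). The Kafka broker address, topic name, event schema, and S3 paths are placeholders, and the job requires the spark-sql-kafka connector package; a production pipeline would add the validation, monitoring, and error handling described above.

# Minimal sketch: ingest JSON events from a (hypothetical) Kafka topic and
# land them as date-partitioned Parquet files in a data lake.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("event-ingestion-sketch")
    .getOrCreate()
)

# Hypothetical event schema; a real pipeline derives this from the source contract.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "events")                       # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
    .withColumn("event_date", F.to_date("event_ts"))     # partition column
)

# Checkpointing gives the Parquet file sink end-to-end exactly-once guarantees.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/events/")       # placeholder path
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .partitionBy("event_date")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()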
Required Skills & Experience:
- Strong programming skills in Python, Scala, or Java.
- Proficiency in SQL and data querying.
- Hands-on experience with Apache Spark (batch and/or streaming).
- Experience with Hadoop ecosystem tools (HDFS, Hive, HBase).
- Experience with Kafka, Spark Streaming, Flink, or similar technologies.
- Understanding of batch vs streaming processing paradigms.
- Experience with NoSQL databases (Cassandra, HBase, DynamoDB).
- Familiarity with relational databases (PostgreSQL, MySQL).
- Experience with workflow orchestration tools such as Airflow or Oozie (a minimal Airflow sketch follows this list).
- Familiarity with version control (Git) and CI/CD pipelines.
- Experience with Docker and Kubernetes is a plus.
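As a small illustration of the orchestration experience listed above, below is a minimal Airflow DAG sketch (not part of the original posting). The DAG id, schedule, task names, and bash commands are placeholders; a real pipeline would typically run spark-submit or cloud-provider operators instead of echo commands.

# Minimal sketch: a daily extract -> transform -> load workflow in Airflow 2.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_batch_pipeline_sketch",   # placeholder name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                      # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
) as dag:
    # Placeholder commands; a real DAG would call spark-submit or a managed-service operator.
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    # Linear dependency chain: extract runs first, then transform, then load.
    extract >> transform >> load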
Posted in: Data Engineering
Functional Area: Big Data / Data Warehousing / ETL
Job Code: 1593434