Posted on: 19/09/2025
Big Data Engineer :
- Design, build, and optimize robust, scalable data pipelines for both batch and real-time data ingestion using Apache Spark (Streaming + Batch), Apache NiFi, and Apache Kafka.
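As an illustration of the real-time side of this responsibility, the sketch below shows a tumbling-window count in plain Python — the same aggregation pattern a Spark Structured Streaming job expresses with `window()` + `groupBy()`. The event data and window size are hypothetical; this is not the Spark API itself.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed-size tumbling windows
    and count occurrences per key. Illustrative stand-in for the
    window()/groupBy() aggregation a Spark Structured Streaming job
    would run over a Kafka source."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event falls into exactly one non-overlapping window.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {w: dict(k) for w, k in counts.items()}

# Hypothetical click-stream events as (timestamp_seconds, event_type):
events = [(0, "click"), (3, "click"), (7, "view"), (12, "click")]
result = tumbling_window_counts(events, 5)
# -> {0: {'click': 2}, 5: {'view': 1}, 10: {'click': 1}}
```

In a production Spark job the same logic would run incrementally over micro-batches read from Kafka, with watermarks handling late data.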
Data Storage and Management :
- Manage and maintain data storage solutions on Hadoop/HDFS. Implement data models and schemas in Hive for a data warehouse and reporting layer.
- Work with HBase for specific use cases requiring fast, random access to large datasets, leveraging the Phoenix SQL layer for SQL-based access.
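Fast random access in HBase depends heavily on row-key design. The sketch below shows one common technique, salting a monotonically increasing key so writes spread across regions instead of hotspotting one region server. The bucket count and key layout are assumptions for illustration, not a requirement from this posting.

```python
import hashlib

def salted_row_key(user_id: str, num_buckets: int = 16) -> str:
    """Prefix an id with a hash-derived salt bucket so sequential ids
    distribute across HBase regions (avoids region hotspotting).
    Illustrative sketch: 16 buckets and the 'NN|id' layout are assumed."""
    salt = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % num_buckets
    return f"{salt:02d}|{user_id}"
```

Reads for a single id stay a point lookup (the salt is recomputable from the id), while full scans fan out over the bucket prefixes.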
Workflow Orchestration :
- Develop and manage complex data workflows and dependencies using Apache Oozie.
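An Oozie workflow is ultimately a DAG of actions with dependencies. The sketch below models a hypothetical pipeline as such a DAG and derives a valid execution order with Python's standard-library `graphlib` — the same dependency resolution an orchestrator performs (the action names are invented for illustration).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions that
# must finish before it -- the kind of DAG an Oozie workflow.xml
# encodes as action transitions and join nodes.
workflow = {
    "ingest_kafka": set(),
    "sqoop_import": set(),
    "spark_transform": {"ingest_kafka", "sqoop_import"},
    "load_hive": {"spark_transform"},
}

# static_order() yields every action only after all its predecessors.
order = list(TopologicalSorter(workflow).static_order())
```

Oozie adds time/data-availability triggers (coordinators) on top of this, but the core scheduling problem is exactly this topological ordering.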
ETL and Data Integration :
- Utilize Informatica for traditional ETL workflows and Apache Sqoop to efficiently transfer data between RDBMS and the Hadoop ecosystem.
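Sqoop parallelizes an import by partitioning the range of a numeric `--split-by` column across mappers. The sketch below reproduces that range-splitting idea in plain Python; it is a simplification under the assumption of a dense numeric key, not Sqoop's exact planner.

```python
def split_ranges(min_val: int, max_val: int, num_mappers: int):
    """Divide the inclusive range [min_val, max_val] of a numeric
    split column into contiguous chunks, one per mapper -- roughly
    how Sqoop plans a parallel RDBMS import (illustrative sketch)."""
    span = max_val - min_val + 1
    base, extra = divmod(span, num_mappers)
    ranges, lo = [], min_val
    for i in range(num_mappers):
        # Spread any remainder over the first `extra` mappers.
        hi = lo + base + (1 if i < extra else 0) - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# e.g. ids 1..100 over 4 mappers -> [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each range then becomes one mapper's `WHERE split_col BETWEEN lo AND hi` query, which is why a skewed split column degrades import parallelism.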
Resource Management :
- Work with YARN to manage cluster resources, monitor job execution, and ensure high availability and fault tolerance.
Monitoring and Maintenance :
- Monitor the health and performance of the data platform using tools like Hue and New Relic. Proactively identify and resolve issues.
Collaboration :
- Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.
Cloud and Advanced Skills (Good to Have) :
- Experience with containerization and cloud-native solutions, particularly Anthos, for deploying and managing applications.
- Familiarity with data observability and logging platforms like Cribl for advanced data collection and routing.
Qualifications :
Experience : Proven experience as a Big Data Engineer or in a similar role.
Technical Skills :
- Strong expertise in Hadoop and HDFS.
- Proficiency in Apache Spark for both batch and stream processing.
- Hands-on experience with Apache Hive and HBase.
- Knowledge of data ingestion tools like Apache Kafka and Apache NiFi.
- Experience with Apache Oozie or other workflow schedulers.
- Familiarity with Apache Sqoop for RDBMS integration.
- Understanding of YARN for cluster resource management.
- Proficiency in at least one scripting language (e.g., Python, Scala).
- Familiarity with monitoring tools like Hue and New Relic.
- Experience with Informatica is a plus.
Soft Skills :
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
- Ability to work in a fast-paced, agile environment.
Education :
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
Posted in: Data Engineering
Functional Area: Big Data / Data Warehousing / ETL
Job Code: 1549094