hirist

Softility - Big Data Engineer - Apache Spark

Posted on: 20/09/2025

Job Description

Data Pipeline Development :


- Design, build, and optimize robust and scalable data pipelines for both batch and real-time data ingestion using Apache Spark (Streaming + Batch), Apache Nifi, and Apache Kafka.
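The streaming half of such a pipeline comes down to windowed aggregation over keyed events. A minimal, framework-free Python sketch of the tumbling-window pattern that Spark Structured Streaming applies at scale (the event data here is invented for illustration):

```python
from collections import defaultdict

def window_counts(events, window_secs=60):
    """Group (timestamp, key) events into fixed tumbling windows and count
    occurrences per key -- the same aggregation shape a Spark Streaming
    job computes over a Kafka topic, minus the distributed machinery."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_secs) * window_secs
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical clickstream events: (epoch seconds, event type)
events = [(3, "click"), (61, "click"), (62, "view"), (70, "click")]
print(window_counts(events))
# {(0, 'click'): 1, (60, 'click'): 2, (60, 'view'): 1}
```

In a real pipeline the event source would be a Kafka topic and the windowing handled by Spark; the sketch only shows the aggregation logic itself.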

Data Storage and Management :


- Manage and maintain data storage solutions on Hadoop/HDFS. Implement data models and schemas in Hive for a data warehouse and reporting layer.


- Work with HBase for specific use cases requiring fast, random access to large datasets, leveraging Apache Phoenix/SQLLine for SQL-based access.
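To make the storage layer concrete, two illustrative DDL sketches (table and column names are assumptions, not from the posting): a partitioned Hive table for the reporting layer, and a Phoenix table that exposes an HBase row via SQL:

```sql
-- Hypothetical Hive reporting table; date partitioning is a common warehouse layout.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
  event_id    STRING,
  customer_id STRING,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
LOCATION '/warehouse/sales_events';

-- Hypothetical Phoenix table over HBase: the primary key becomes the HBase
-- row key, and 'info' is an HBase column family.
CREATE TABLE IF NOT EXISTS user_profile (
  user_id          VARCHAR PRIMARY KEY,
  info.last_login  TIMESTAMP,
  info.visit_count BIGINT
);
```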

Workflow Orchestration :


- Develop and manage complex data workflows and dependencies using Apache Oozie.
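Oozie workflows of this kind are defined in XML. A skeleton with a single Spark action (the application name, class, and jar path are placeholders):

```xml
<!-- Skeleton Oozie workflow: one Spark action; all names and paths are illustrative. -->
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-ingest"/>
  <action name="spark-ingest">
    <spark xmlns="uri:oozie:spark-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>daily-ingest</name>
      <class>com.example.IngestJob</class>
      <jar>${nameNode}/apps/ingest.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Ingest failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```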

ETL and Data Integration :


- Utilize Informatica for traditional ETL workflows and Apache Sqoop to efficiently transfer data between RDBMS and the Hadoop ecosystem.
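An RDBMS-to-HDFS transfer with Apache Sqoop, as described above, might look like the following (the JDBC connection string, table, and target directory are placeholders):

```shell
# Illustrative Sqoop import; all connection details are hypothetical.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --as-parquetfile
```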

Resource Management :


- Work with YARN to manage cluster resources, monitor job execution, and ensure high availability and fault tolerance.
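Day-to-day YARN work typically involves a few standard CLI checks (application IDs are cluster-specific and shown as placeholders):

```shell
yarn application -list -appStates RUNNING    # jobs currently running
yarn application -status <application_id>   # details for one job
yarn logs -applicationId <application_id>   # aggregated container logs
yarn node -list                             # NodeManager health
```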

Monitoring and Maintenance :


- Monitor the health and performance of the data platform using tools like Hue and New Relic. Proactively identify and resolve issues.

Collaboration :


- Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.

Cloud and Advanced Skills (Good to Have) :

- Experience with containerization and cloud-native solutions, particularly Anthos, for deploying and managing applications.

- Familiarity with data observability and logging platforms like Cribl for advanced data collection and routing.

Qualifications :

Experience : Proven experience as a Big Data Engineer or in a similar role.

Technical Skills :

- Strong expertise in Hadoop and HDFS.

- Proficiency in Apache Spark for both batch and stream processing.

- Hands-on experience with Apache Hive and HBase.

- Knowledge of data ingestion tools like Apache Kafka and Apache Nifi.

- Experience with Apache Oozie or other workflow schedulers.

- Familiarity with Apache Sqoop for RDBMS integration.

- Understanding of YARN for cluster resource management.

- Proficiency in at least one programming language (e.g., Python, Scala).

- Familiarity with monitoring tools like Hue and New Relic.

- Experience with Informatica is a plus.

Soft Skills :

- Excellent problem-solving and analytical skills.

- Strong communication and collaboration abilities.

- Ability to work in a fast-paced, agile environment.

Education :

- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.

