Senior Kafka Data Engineer - Cloudera/Hadoop Ecosystem

Virtusa Consulting Services Pvt Ltd

Multiple Locations

5 - 10 Years

Kafka, Data Engineering, Data Pipeline, Cloudera, Apache Spark, ETL, Azure Databricks, Hadoop, Azure Data Factory, Python

Posted on: 06/11/2025

Job Description

Role Overview

We are seeking a highly skilled Senior Kafka Data Engineer to design, build, and manage robust data pipelines that power both batch and real-time data processing across our enterprise data ecosystem. This role requires deep technical expertise in Cloudera, Kafka, Azure Databricks, and related cloud-based data platforms. The ideal candidate is passionate about building scalable, high-performing data solutions, ensuring data quality, and enabling data-driven decision-making across the organization.

Key Responsibilities :

Data Pipeline Design & Development :

- Design, develop, test, and maintain end-to-end batch and streaming data pipelines using Cloudera, Apache Spark, Kafka, and Azure data services such as Azure Data Factory (ADF), Databricks, and Cosmos DB.

- Build efficient ETL and ELT frameworks to transform raw data into structured, usable formats for downstream analytics and reporting.

- Implement data ingestion frameworks from multiple structured and unstructured sources (APIs, databases, streams, files, etc.).

- Automate and orchestrate complex data workflows using Azure Data Factory and, where applicable, Airflow.

Performance Optimization & Data Quality :

- Optimize data pipelines for scalability, performance, reliability, and cost efficiency.

- Implement data validation, monitoring, and error-handling mechanisms to ensure high-quality data delivery.

- Perform root cause analysis on data issues and propose long-term solutions for stability and consistency.

Collaboration & Solution Design :

- Collaborate with Data Architects, Analysts, and Data Scientists to design data models that align with business requirements.

- Partner with business stakeholders to translate requirements into technical data pipeline solutions.

- Contribute to the development and implementation of data governance, metadata management, and lineage tracking practices.

Innovation & Continuous Improvement :

- Evaluate and integrate emerging technologies and tools in the data ecosystem (e.g., Delta Lake, Iceberg, Lakehouse architectures).

- Advocate for and implement DevOps and CI/CD practices for data pipelines using tools like Git, Azure DevOps, Jenkins, or similar.

- Contribute to data platform modernization initiatives, including migration to cloud-native or Lakehouse architectures.

Mentorship & Leadership :

- Provide technical leadership and mentorship to junior data engineers, ensuring adherence to best practices in coding, testing, and deployment.

- Review code and ensure compliance with established engineering and data management standards.

Qualifications & Skills :

Required Technical Skills :

- 8+ years of IT experience, with 5+ years in Data Engineering and cloud-based data platforms.

- Strong hands-on experience with Cloudera / Hadoop Ecosystem, Apache Spark, and Kafka (Confluent or Apache) for batch and streaming data.

- Expertise in Azure data services: Data Factory (ADF), Databricks, Cosmos DB, and Synapse Analytics.

- Strong programming proficiency in Python or Scala, with advanced SQL skills.

- In-depth knowledge of NoSQL databases (Cosmos DB, MongoDB), including data modeling, indexing, and query optimization.

- Experience in building Lakehouse/Data Lake architectures and managing data across distributed storage environments.

- Familiarity with data security, compliance, and governance frameworks.

Preferred Skills :

- Knowledge of containerization and orchestration tools (Docker, Kubernetes).

- Familiarity with streaming frameworks such as Spark Structured Streaming, Apache Flink, or Apache Storm.

- Experience with data cataloging tools (e.g., Purview, Collibra, or Alation).

- Working knowledge of CI/CD pipelines and infrastructure-as-code (Terraform, ARM templates).

Soft Skills :

- Strong analytical and problem-solving abilities with a focus on optimization and data flow efficiency.

- Excellent communication and collaboration skills to work cross-functionally with engineering, analytics, and business teams.

- Demonstrated ability to mentor junior engineers and lead by example in an agile, fast-paced environment.

- Proactive mindset with a passion for continuous learning and innovation.