
Job Description

Role : Kafka Developer

Location : Pune, India (with travel to onsite)

Experience : 10-15 years

Experience Required :

10+ years overall, with 5+ years in Kafka-based data streaming development. Must have delivered production-grade Kafka pipelines integrated with real-time data sources and downstream analytics platforms.

Overview :

We are looking for a Kafka Developer to design and implement real-time data ingestion pipelines using Apache Kafka. The role involves integrating with upstream flow record sources, transforming and validating data, and streaming it into a centralized data lake for analytics and operational intelligence.

Key Responsibilities :

- Develop Kafka producers to ingest flow records from upstream systems such as flow record exporters (e.g., IPFIX-compatible probes).

- Build Kafka consumers to stream data into Spark Structured Streaming jobs and downstream data lakes.

- Define and manage Kafka topic schemas using Avro and Schema Registry for schema evolution.

- Implement message serialization, transformation, enrichment, and validation logic within the streaming pipeline.

- Ensure exactly-once processing, checkpointing, and fault tolerance in streaming jobs (see the sketch after this list).

- Integrate with downstream systems such as HDFS or Parquet-based data lakes, ensuring compatibility with ingestion standards.

- Collaborate with Kafka administrators to align topic configurations, retention policies, and security protocols.

- Participate in code reviews, unit testing, and performance tuning to ensure high-quality deliverables.

- Document pipeline architecture, data flow logic, and operational procedures for handover and support.
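
For illustration, the kind of consumer these responsibilities describe might look roughly like the following PySpark Structured Streaming sketch. It is only a sketch: the broker address, topic name, schema fields, and HDFS paths are hypothetical placeholders, and a production pipeline would typically decode Avro payloads via the Schema Registry rather than parsing JSON.

```python
# Minimal sketch of a Structured Streaming consumer, assuming a hypothetical
# topic ("flow-records"), broker list, and data-lake paths.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("flow-record-ingestion").getOrCreate()

# Illustrative schema for a simplified flow record; real IPFIX/NetFlow
# records carry many more fields.
flow_schema = StructType([
    StructField("src_ip", StringType()),
    StructField("dst_ip", StringType()),
    StructField("bytes", LongType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
       .option("subscribe", "flow-records")                 # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Parse, validate, and enrich the payload (assumed JSON here for brevity).
flows = (raw.selectExpr("CAST(value AS STRING) AS json")
         .select(from_json(col("json"), flow_schema).alias("r"))
         .select("r.*")
         .filter(col("bytes") > 0))

# Checkpointing plus Spark's transactional file sink gives fault tolerance
# and exactly-once output for the Parquet data-lake path.
query = (flows.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/flow_records")            # placeholder path
         .option("checkpointLocation", "hdfs:///checkpoints/flow_records")
         .outputMode("append")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```

The same checkpoint-plus-append pattern extends to Delta or Hive-backed tables on a Cloudera cluster; only the sink format and path change.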

Required Skills & Qualifications :

- Proven experience in developing Kafka producers and consumers for real-time data ingestion pipelines.

- Strong hands-on expertise in Apache Kafka, Kafka Connect, Kafka Streams, and Schema Registry.

- Proficiency in Apache Spark (Structured Streaming) for real-time data transformation and enrichment.

- Solid understanding of IPFIX, NetFlow, and network flow data formats; experience integrating with nProbe Cento is a plus.

- Experience with Avro, JSON, or Protobuf for message serialization and schema evolution (see the sketch after this list).

- Familiarity with Cloudera Data Platform components such as HDFS, Hive, YARN, and Knox.

- Experience integrating Kafka pipelines with data lakes or warehouses using Parquet or Delta formats.

- Strong programming skills in Scala, Java, or Python for stream processing and data engineering tasks.

- Knowledge of Kafka security protocols including TLS/SSL, Kerberos, and access control via Apache Ranger.

- Experience with monitoring and logging tools such as Prometheus, Grafana, and Splunk.

- Understanding of CI/CD pipelines, Git-based workflows, and containerization (Docker/Kubernetes).
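
To ground the serialization and security items above, a producer combining Avro encoding via the Schema Registry with TLS transport might be sketched as follows. This assumes the confluent-kafka Python client; the topic name, registry URL, certificate path, and record schema are hypothetical.

```python
# Hedged sketch of a producer publishing Avro-encoded flow records.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

# Illustrative Avro schema; a real flow record has many more fields.
FLOW_SCHEMA = """
{
  "type": "record",
  "name": "FlowRecord",
  "fields": [
    {"name": "src_ip", "type": "string"},
    {"name": "dst_ip", "type": "string"},
    {"name": "bytes",  "type": "long"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "https://schema-registry:8081"})
value_serializer = AvroSerializer(schema_registry, FLOW_SCHEMA)

producer = SerializingProducer({
    "bootstrap.servers": "broker1:9093",          # placeholder broker
    # TLS encryption; Kerberos (SASL/GSSAPI) would be layered on similarly.
    "security.protocol": "SSL",
    "ssl.ca.location": "/etc/kafka/certs/ca.pem",  # placeholder cert path
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": value_serializer,
    # Idempotence avoids duplicate records on producer retries.
    "enable.idempotence": True,
})

record = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "bytes": 1500}
producer.produce(topic="flow-records", key=record["src_ip"], value=record)
producer.flush()
```

By default the Avro serializer registers the schema with the registry on first use, which is what allows producers and consumers to evolve the topic schema in a controlled way.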
