Posted on: 25/08/2025
Kafka Administration & Management:
- Experience with and understanding of cloud-based messaging services such as AWS MSK, Confluent Cloud, Azure Event Hubs, or GCP Pub/Sub.
- Configure and tune Kafka brokers, ZooKeeper, topics, partitions, and replication to ensure optimal performance and scalability (a topic-provisioning sketch follows this list).
- Implement Kafka security best practices, including SSL/TLS encryption, Kerberos authentication, ACLs, and role-based access control (RBAC).
- Monitor Kafka clusters using tools like Prometheus, Grafana, Confluent Control Center, and OpenTelemetry to track system health, resource utilization, and performance metrics.
- Troubleshoot Kafka broker failures, consumer/producer lag, replication issues, and other system bottlenecks.
- Manage Kafka Schema Registry, Kafka Connect, and Kafka Streams for seamless data integration across various platforms.
- Implement disaster recovery strategies, including cross-cluster replication with MirrorMaker 2.0.
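For illustration only, a minimal sketch of the topic-provisioning work described above, using the confluent-kafka Python AdminClient; the topic name, sizing values, and bootstrap address are assumptions, not details from this posting:

    # Sketch: provision a Kafka topic with explicit partition/replication
    # settings via the confluent-kafka AdminClient. All values are illustrative.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed address

    topic = NewTopic(
        "orders",                      # hypothetical topic name
        num_partitions=6,              # sized for expected consumer parallelism
        replication_factor=3,          # tolerate the loss of up to two brokers
        config={"min.insync.replicas": "2"},  # pair with acks=all producers
    )

    # create_topics() returns a dict of topic -> future; wait on each result.
    for name, future in admin.create_topics([topic]).items():
        try:
            future.result()
            print(f"Created topic {name}")
        except Exception as err:
            print(f"Failed to create {name}: {err}")

Setting min.insync.replicas to 2 alongside a replication factor of 3 lets acks=all producers survive a single broker failure without losing acknowledged writes.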
Real-Time & Batch Data Processing (Flink/Spark):
- Implement complex event processing (CEP) and windowed aggregations in Flink for business analytics.
- Build and optimize Apache Spark jobs for batch data processing, ensuring high performance and cost efficiency.
- Integrate Flink/Spark applications with Kafka, Hadoop, S3, Snowflake, and NoSQL databases (a minimal Kafka-to-Spark sketch follows this list).
- Tune Flink and Spark performance parameters, including checkpointing, parallelism, and memory management.
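As a sketch of the streaming work above (a windowed aggregation over Kafka with checkpointing), here is a minimal Spark Structured Streaming job in Python; the topic, bootstrap address, and checkpoint path are assumptions, and the spark-sql-kafka connector package must be on the Spark classpath:

    # Sketch: windowed aggregation over a Kafka stream with Spark Structured
    # Streaming. Topic, servers, and paths are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("orders-windowed-agg").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # assumed address
        .option("subscribe", "orders")                        # hypothetical topic
        .load()
    )

    # Count events per 5-minute window, tolerating 1 minute of late data.
    counts = (
        events
        .withWatermark("timestamp", "1 minute")
        .groupBy(window(col("timestamp"), "5 minutes"))
        .count()
    )

    # The checkpoint location makes the query restartable after failure.
    query = (
        counts.writeStream.outputMode("update")
        .format("console")
        .option("checkpointLocation", "/tmp/checkpoints/orders")  # illustrative
        .start()
    )
    query.awaitTermination()

The watermark bounds how long state for late events is retained, which is the main lever for keeping streaming state (and memory) in check.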
Data Infrastructure & Governance:
- Implement observability using distributed tracing, logging, and monitoring tools like Jaeger, ELK Stack, and Splunk (a minimal tracing sketch follows this list).
- Collaborate with DevOps teams to automate Kafka infrastructure using Terraform, Ansible, or CloudFormation.
- Ensure data governance, lineage, and compliance (GDPR, HIPAA, SOC 2) by integrating Apache Atlas, Confluent RBAC, or Ranger.
- Work with data teams, ML engineers, and business stakeholders to build scalable and efficient data solutions.
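As a sketch of the tracing side of the observability item above, a minimal OpenTelemetry setup in Python; the console exporter stands in for a real backend such as Jaeger, and the instrumentation, span, and attribute names are illustrative:

    # Sketch: emit a distributed-tracing span with the OpenTelemetry SDK.
    # A console exporter stands in for a production backend such as Jaeger.
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer("pipeline.demo")  # illustrative instrumentation name

    # Wrap a unit of pipeline work in a span; attributes become searchable tags.
    with tracer.start_as_current_span("consume-and-transform") as span:
        span.set_attribute("kafka.topic", "orders")  # hypothetical topic
        # ... consume, transform, and produce records here ...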
Required Qualifications:
- Expert-level knowledge of Apache Kafka internals, including broker configuration, producer/consumer tuning, and security.
- Strong hands-on experience in Kafka Streams, Kafka Connect, MirrorMaker 2.0, and Schema Registry.
- Proficiency in Apache Flink (or Spark Streaming) for stream processing.
- Strong programming skills in Python, Java, or Scala.
- Experience with SQL and NoSQL databases (e.g., PostgreSQL, Cassandra, MongoDB).
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and Kubernetes.
- Knowledge of monitoring tools (Prometheus, Grafana, Confluent Control Center, ELK).
Preferred Qualifications:
- Hands-on knowledge of Kafka Tiered Storage and cloud-native Kafka deployments.
- Knowledge of workflow orchestration tools like Apache Airflow, Prefect, or Dagster.
- Contributions to open-source Kafka/Flink/Spark projects.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1535667