hirist

Principal Data Engineer - Java/Spark/AWS

Info Edge
Bangalore
10 - 16 Years

Posted on: 20/07/2025


Job Description

Title: Principal Data Engineer

Keywords: Java | AWS | Spark | Kafka | MySQL | Elasticsearch

Office location: Bangalore, EGL - Domlur

Experience: 10 to 16 years

Responsibilities:

As a Principal Data Engineer, you will be responsible for:

- Leading the design and implementation of high-scale, cloud-native data pipelines for real-time and batch workloads.

- Collaborating with product managers, architects, and backend teams to translate business needs into secure and scalable data solutions.

- Integrating big data frameworks (like Spark, Kafka, Flink) with cloud-native services (AWS/GCP/Azure) to support security analytics use cases.

- Driving CI/CD best practices, infrastructure automation, and performance tuning across distributed environments.

- Evaluating and piloting the use of AI/LLM technologies in data pipelines (e.g., anomaly detection, metadata enrichment, automation).

- Evaluating and integrating LLM-based automation and AI-enhanced observability into engineering workflows.

- Ensuring data security and privacy compliance.

- Mentoring engineers, ensuring high engineering standards, and promoting technical excellence across teams.

What We're Looking For (Minimum Qualifications):

- 10-16 years of experience in big data architecture and engineering, including deep proficiency with the AWS cloud platform.

- Expertise in distributed systems and frameworks such as Apache Spark, Scala, Kafka, Flink, and Elasticsearch, with experience building production-grade data pipelines.

- Strong programming skills in Java for building scalable data applications.

- Hands-on experience with ETL tools and orchestration systems.

- Solid understanding of data modeling across both relational (PostgreSQL, MySQL) and NoSQL (HBase) databases and performance tuning.

What Will Make You Stand Out:

- Experience integrating AI/ML or LLM frameworks (e.g., LangChain, LlamaIndex) into data workflows.

- Experience implementing CI/CD pipelines with Kubernetes, Docker, and Terraform.

- Knowledge of modern data warehousing (e.g., BigQuery, Snowflake) and data governance principles (GDPR, HIPAA).

- Strong ability to translate business goals into technical architecture and mentor teams through delivery.

- Familiarity with visualization tools (Tableau, Power BI) to communicate data insights, even if not a primary responsibility.

