hirist

Fanatics - Senior Data Engineer - Scala/PySpark

FANATICS E-COMMERCE (INDIA) LLP
Hyderabad
5 - 7 Years

Posted on: 23/12/2025

Job Description

Key Responsibilities :

- Lead the design and development of scalable, high-performance data architectures on AWS, leveraging services such as S3, EMR, Glue, Redshift, Lambda, and Kinesis.


- Architect and manage Data Lakes for handling structured, semi-structured, and unstructured data.


- Design and build complex data pipelines using Apache Spark (Scala & PySpark), Kafka Streams (Java), and cloud-native technologies for batch and real-time data processing.


- Optimize these pipelines for high performance, scalability, and cost-effectiveness.


- Develop and optimize real-time data streaming applications using Kafka Streams in Java.


- Build reliable, low-latency streaming solutions to handle high-throughput data, ensuring smooth data flow from sources to sinks in real-time.


- Manage Snowflake for cloud data warehousing, ensuring seamless data integration, optimization of queries, and advanced analytics.


- Implement Apache Iceberg in Data Lakes for managing large-scale datasets with ACID compliance, schema evolution, and versioning.


- Design and maintain highly scalable Data Lakes on AWS using S3, Glue, and Apache Iceberg.


- Ensure data is easily accessible, stored in optimal formats, and well-integrated with downstream analytics systems.


- Work with business stakeholders to create actionable insights using Tableau.


- Build data models and dashboards that drive key business decisions, ensuring that data is easily accessible and interpretable.


- Continuously monitor and optimize Spark jobs, Kafka Streams processing, and other cloud-based data systems for performance, scalability, and cost.


- Implement best practices for stream processing, batch processing, and cloud resource management.


- Lead and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence.


- Ensure high-quality code delivery, adherence to best practices, and optimal use of resources.


- Work closely with Data Scientists, Product Managers, and DevOps teams to understand business needs and deliver impactful data solutions.


- Participate in technical discussions, from system design to data governance.


- Ensure that data pipelines, architectures, and systems are thoroughly documented and follow coding and design best practices.


- Promote knowledge-sharing across the team to maintain high standards for quality and scalability.

Required Skills & Qualifications :

Education :

- Bachelor's or Master's degree in Computer Science or a related field (or equivalent work experience).

Experience :


- 5+ years of experience in Data Engineering or a related field, with a proven track record of designing, implementing, and maintaining large-scale distributed data systems.


- Proficiency in Apache Spark (Scala & PySpark) for distributed data processing and real-time analytics.


- Hands-on experience with Kafka Streams using Java for real-time data streaming applications.


- Strong experience in Data Lake architectures on AWS, using services like S3, Glue, and EMR, and data management platforms like Apache Iceberg.


- Proficiency in Snowflake for cloud-based data warehousing, data modeling, and query optimization.


- Expertise in SQL for querying relational and NoSQL databases, and experience with database design and optimization.


Technical Skills :



- Strong experience in building and maintaining ETL pipelines using Spark (Scala & PySpark).


- Proficiency in Java, particularly in the context of building and optimizing Kafka Streams applications for real-time data processing.


- Experience with AWS services (e.g., Lambda, Redshift, Athena, Glue, S3) and managing cloud infrastructure.


- Expertise with Apache Iceberg for handling large-scale, transactional data in Data Lakes, supporting versioning, schema evolution, and partitioning.


- Experience with Tableau for business intelligence, dashboard creation, and data visualization.


- Knowledge of CI/CD tools and practices, particularly in data engineering environments.


- Familiarity with containerization tools like Docker and Kubernetes for managing cloud-based services.

Soft Skills :


- Excellent problem-solving skills, with a strong ability to debug and optimize large-scale distributed systems.


- Strong communication skills to engage with both technical and non-technical stakeholders.


- Proven leadership ability, including mentoring and guiding junior engineers.


- A collaborative mindset and the ability to work across teams to deliver integrated solutions.


Preferred Qualifications :



- Experience with stream processing frameworks like Apache Flink or Apache Beam.

- Knowledge of machine learning workflows and integration of ML models in data pipelines.



- Familiarity with data governance, security, and compliance practices in cloud environments.


- Experience with DevOps practices and infrastructure automation tools such as Terraform or CloudFormation.

