hirist

Fanatics - Senior Data Engineer - Scala/PySpark

FANATICS E-COMMERCE (INDIA) LLP
Hyderabad
5 - 7 Years

Posted on: 23/12/2025

Job Description

Key Responsibilities :

- Lead the design and development of scalable, high-performance data architectures on AWS, leveraging services such as S3, EMR, Glue, Redshift, Lambda, and Kinesis.


- Architect and manage Data Lakes for handling structured, semi-structured, and unstructured data.


- Design and build complex data pipelines using Apache Spark (Scala & PySpark), Kafka Streams (Java), and cloud-native technologies for batch and real-time data processing.


- Optimize these pipelines for high performance, scalability, and cost-effectiveness.


- Develop and optimize real-time data streaming applications using Kafka Streams in Java.


- Build reliable, low-latency streaming solutions to handle high-throughput data, ensuring smooth data flow from sources to sinks in real-time.


- Manage Snowflake for cloud data warehousing, ensuring seamless data integration, optimization of queries, and advanced analytics.


- Implement Apache Iceberg in Data Lakes for managing large-scale datasets with ACID compliance, schema evolution, and versioning.


- Design and maintain highly scalable Data Lakes on AWS using S3, Glue, and Apache Iceberg.


- Ensure data is easily accessible, stored in optimal formats, and well-integrated with downstream analytics systems.


- Work with business stakeholders to create actionable insights using Tableau.


- Build data models and dashboards that drive key business decisions, ensuring that data is easily accessible and interpretable.


- Continuously monitor and optimize Spark jobs, Kafka Streams processing, and other cloud-based data systems for performance, scalability, and cost.


- Implement best practices for stream processing, batch processing, and cloud resource management.


- Lead and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence.


- Ensure high-quality code delivery, adherence to best practices, and optimal use of resources.


- Work closely with Data Scientists, Product Managers, and DevOps teams to understand business needs and deliver impactful data solutions.


- Participate in technical discussions, from system design to data governance.


- Ensure that data pipelines, architectures, and systems are thoroughly documented and follow coding and design best practices.


- Promote knowledge-sharing across the team to maintain high standards for quality and scalability.

Required Skills & Qualifications :

Education :

- Bachelor's or Master's degree in Computer Science or a related field (or equivalent work experience).

Experience :


- 5+ years of experience in Data Engineering or a related field, with a proven track record of designing, implementing, and maintaining large-scale distributed data systems.


- Proficiency in Apache Spark (Scala & PySpark) for distributed data processing and real-time analytics.


- Hands-on experience with Kafka Streams using Java for real-time data streaming applications.


- Strong experience in Data Lake architectures on AWS, using services like S3, Glue, and EMR, and data management platforms like Apache Iceberg.


- Proficiency in Snowflake for cloud-based data warehousing, data modeling, and query optimization.


- Expertise in SQL for querying relational and NoSQL databases, and experience with database design and optimization.


Technical Skills :



- Strong experience in building and maintaining ETL pipelines using Spark (Scala & PySpark).


- Proficiency in Java, particularly in the context of building and optimizing Kafka Streams applications for real-time data processing.


- Experience with AWS services (e.g., Lambda, Redshift, Athena, Glue, S3) and managing cloud infrastructure.


- Expertise with Apache Iceberg for handling large-scale, transactional data in Data Lakes, supporting versioning, schema evolution, and partitioning.


- Experience with Tableau for business intelligence, dashboard creation, and data visualization.


- Knowledge of CI/CD tools and practices, particularly in data engineering environments.


- Familiarity with containerization tools like Docker and Kubernetes for managing cloud-based services.

Soft Skills :


- Excellent problem-solving skills, with a strong ability to debug and optimize large-scale distributed systems.


- Strong communication skills to engage with both technical and non-technical stakeholders.


- Proven leadership ability, including mentoring and guiding junior engineers.


- A collaborative mindset and the ability to work across teams to deliver integrated solutions.


Preferred Qualifications :



- Experience with stream processing frameworks like Apache Flink or Apache Beam.

- Knowledge of machine learning workflows and integration of ML models in data pipelines.



- Familiarity with data governance, security, and compliance practices in cloud environments.


- Experience with DevOps practices and infrastructure automation tools such as Terraform or CloudFormation.

