
Snapmint - Data Engineer - Python/Java/Scala

Snapmint
3 - 5 Years
Bangalore

Posted on: 07/04/2026

Job Description

Description: Snapmint is looking for a skilled Data Engineer with 3-5 years of experience to design, build, and manage real-time data pipelines using technologies like Kafka, Flink, and Spark Streaming. The role involves optimizing scalable, fault-tolerant pipelines, performing real-time transformations, and collaborating with data scientists on feature development. The ideal candidate will have strong programming skills in Python, Java, or Scala, along with solid SQL expertise and a good understanding of data modeling, data warehousing, and OLTP vs. OLAP systems. Experience with CDC tools, data lake/lakehouse architectures (Databricks), open table formats (Delta Lake, Iceberg, Hudi), and orchestration tools like Airflow is essential.


Roles and Responsibilities:


- Design, build, and manage real-time data pipelines using tools like Apache Kafka, Apache Flink, and Apache Spark Streaming.


- Optimize data pipelines for performance, scalability, and fault-tolerance.


- Perform real-time transformations, aggregations, and joins on streaming data (a minimal sketch follows this list).


- Collaborate with data scientists to onboard new features and ensure they're discoverable, documented, and versioned.


- Optimize feature retrieval latency for real-time inference use cases.


- Ensure strong data governance: lineage, auditing, schema evolution, and quality checks using tools such as dbt and OpenLineage.
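
To illustrate the streaming work described above, here is a minimal PySpark Structured Streaming sketch that reads from a hypothetical order_events Kafka topic and computes a windowed aggregation. The broker address, topic name, and event schema are placeholder assumptions for illustration, not details from the role description.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

    # Requires the spark-sql-kafka connector package on the Spark classpath.
    spark = SparkSession.builder.appName("order-events-stream").getOrCreate()

    # Hypothetical event schema; a real topic's layout would come from the team's data contracts.
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
        .option("subscribe", "order_events")                   # hypothetical topic
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Windowed aggregation: total order value per 5-minute window, tolerating late events.
    totals = (
        events.withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"))
        .agg(F.sum("amount").alias("total_amount"))
    )

    query = totals.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()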


Requirements:


- Bachelor's degree in Engineering from a premier institute (IIT/NIT/BIT).


- 3-5 years of experience in an Indian startup/tech company.


- Strong programming skills in Python, Java, or Scala and proficient in SQL.


- Solid understanding of data modeling, data warehousing concepts, and the differences between OLTP and OLAP workloads.


- Experience ingesting and processing various data formats, including semi-structured (JSON, Avro), unstructured, and document-based data from sources like NoSQL databases (e.g., MongoDB), APIs, and event tracking platforms (e.g., PostHog).


- Hands-on experience with Change Data Capture (CDC) tools such as Debezium or AWS DMS for replicating data from transactional databases.


- Proven experience designing and building scalable data lakes or lakehouse architectures on platforms like Databricks.


- Hands-on experience with modern open table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.


- Hands-on experience with real-time streaming technologies like Kafka, Flink, and Spark Streaming.


- Proficiency with data pipeline orchestration tools like Apache Airflow (see the sketch after this list).


- Exposure to event-driven microservices architecture.


- Strong written and verbal communication skills.
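
As a concrete example of the orchestration skills listed above, the following is a minimal sketch of an Apache Airflow DAG with two dependent tasks. The DAG id, schedule, and task bodies are illustrative placeholders rather than an actual Snapmint pipeline.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_orders(**context):
        # Placeholder: pull an incremental batch from the transactional source.
        pass


    def load_to_lake(**context):
        # Placeholder: write the extracted batch to a lakehouse table.
        pass


    with DAG(
        dag_id="orders_incremental_load",  # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@hourly",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
        load = PythonOperator(task_id="load_to_lake", python_callable=load_to_lake)

        # Run the extract task before the load task.
        extract >> load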


Good to have:


- Familiarity with cloud data warehouse systems like BigQuery or Snowflake.


- Experience with real-time analytical databases like ClickHouse.


- Familiarity with designing, building, and maintaining feature store infrastructure to support machine learning use cases.
