Octro - Data Engineer - Python/Apache Spark

Octro Inc
Noida
2-4 Years

Posted on: 02/02/2026

Job Description

As a Data Engineer, the candidate will design and maintain scalable data pipelines and analytics systems. The ideal candidate will have 2-4 years of experience with Apache Spark, Scala/Python, Trino/Presto, Hadoop, Kafka, and data lake technologies such as Delta Lake.

Experience with Elasticsearch, streaming data, and modern analytics platforms is preferred.

Mandatory Skills Requirements:

- Proficient in Python and/or Scala, with strong experience developing and optimizing data processing applications using Apache Spark.
- Extensive experience with Apache Spark Structured Streaming for near real-time and streaming data processing.
- Strong hands-on experience with Apache Kafka, including integration with Spark for reliable real-time data ingestion and event-driven pipelines (see the sketch after this list).
- Experience working with analytical and distributed data stores such as ClickHouse and Trino/Presto, and with data lake technologies (Delta Lake or equivalent).
- Solid understanding of data modeling and metric design for large-scale analytics systems, including fact/dimension modeling and event-based schemas.
- Proven ability to design and implement ETL/ELT pipelines for data ingestion, transformation, aggregation, and performance optimization using Spark.
- Demonstrated experience writing efficient, scalable, and maintainable code for large-scale data processing workloads.
- Experience operating on-prem or hybrid data platforms, with a working understanding of cluster resource management, performance tuning, and capacity planning.
- Familiarity with Elasticsearch for search, observability, or analytical use cases is a plus.
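
To make the streaming requirements above concrete, here is a minimal PySpark sketch of the kind of pipeline the role describes: reading JSON events from Kafka with Structured Streaming, deriving an event-time metric, and appending the aggregate to a Delta table. The broker address, topic name, schema, and paths are illustrative assumptions, not details from this posting, and running it requires the spark-sql-kafka and delta-spark packages on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("event-ingest-sketch").getOrCreate()

# Assumed shape of each Kafka message value (JSON); hypothetical fields.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),  # event time, epoch milliseconds
])

# 1. Ingest: subscribe to a Kafka topic with Structured Streaming.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
    .option("subscribe", "game_events")                   # assumed topic
    .option("startingOffsets", "latest")
    .load()
)

# 2. Transform: parse the JSON payload and derive an event-time column.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_time", (F.col("ts") / 1000).cast("timestamp"))
)

# 3. Aggregate: a simple event-based metric -- events per type per minute,
#    with a watermark so late data is bounded and old state can be dropped.
counts = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"), "event_type")
    .count()
)

# 4. Load: append finalized windows to a Delta table; the checkpoint gives
#    the sink exactly-once semantics across restarts.
query = (
    counts.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/event_counts")  # assumed path
    .start("/tmp/delta/event_counts")                               # assumed path
)

query.awaitTermination()

In Scala the structure would be identical; the choice of append output mode here emits each window once it is finalized by the watermark, the usual trade-off against update mode's running partial results.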

Preferred Skills Requirements:

- Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
- Strong familiarity with version control systems, particularly Git, and collaborative development workflows.
- Working knowledge of cloud platforms such as AWS, Azure, or Google Cloud, primarily for data services, storage, or hybrid deployments.
- Understanding of distributed data systems and database administration principles, including performance tuning, reliability, and scaling of analytical or NoSQL databases (e.g., ClickHouse, Elasticsearch, HBase, or similar).

