Octro - Data Engineer - Python/Apache Spark

Octro Inc
Noida
2-4 Years

Posted on: 02/02/2026

Job Description

As a Data Engineer, the candidate will design and maintain scalable data pipelines and analytics systems. The ideal candidate will have 2-4 years of experience with Apache Spark, Scala/Python, Trino/Presto, Hadoop, Kafka, and data lake technologies such as Delta Lake.

Experience with Elasticsearch, streaming data, and modern analytics platforms is preferred.

Mandatory Skills Requirements:

- Proficient in Python and/or Scala, with strong experience developing and optimizing data processing applications using Apache Spark.
- Extensive experience with Apache Spark Structured Streaming for near real-time and streaming data processing.
- Strong hands-on experience with Apache Kafka, including integration with Spark for reliable real-time data ingestion and event-driven pipelines (see the sketch after this list).
- Experience working with analytical and distributed data stores such as ClickHouse and Trino/Presto, and with data lake technologies (Delta Lake or equivalent).
- Solid understanding of data modeling and metric design for large-scale analytics systems, including fact/dimension modeling and event-based schemas.
- Proven ability to design and implement ETL/ELT pipelines for data ingestion, transformation, aggregation, and performance optimization using Spark.
- Demonstrated experience writing efficient, scalable, and maintainable code for large-scale data processing workloads.
- Experience operating on-prem or hybrid data platforms, with a working understanding of cluster resource management, performance tuning, and capacity planning.
- Familiarity with Elasticsearch for search, observability, or analytical use cases is a plus.
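
To make the streaming requirements above concrete, here is a minimal PySpark sketch of the kind of pipeline the role describes: reading JSON events from Kafka with Structured Streaming, deriving an event-time metric, and appending the aggregate to a Delta table. The broker address, topic name, schema, and paths are illustrative assumptions, not details from this posting, and running it requires the spark-sql-kafka and delta-spark packages on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("event-ingest-sketch").getOrCreate()

# Assumed shape of each Kafka message value (JSON); hypothetical fields.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),  # event time, epoch milliseconds
])

# 1. Ingest: subscribe to a Kafka topic with Structured Streaming.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
    .option("subscribe", "game_events")                   # assumed topic
    .option("startingOffsets", "latest")
    .load()
)

# 2. Transform: parse the JSON payload and derive an event-time column.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_time", (F.col("ts") / 1000).cast("timestamp"))
)

# 3. Aggregate: a simple event-based metric -- events per type per minute,
#    with a watermark so late data is bounded and old state can be dropped.
counts = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"), "event_type")
    .count()
)

# 4. Load: append finalized windows to a Delta table; the checkpoint gives
#    the sink exactly-once semantics across restarts.
query = (
    counts.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/event_counts")  # assumed path
    .start("/tmp/delta/event_counts")                               # assumed path
)

query.awaitTermination()

In Scala the structure would be identical; the choice of append output mode here emits each window once it is finalized by the watermark, the usual trade-off against update mode's running partial results.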

Preferred Skills Requirements:

- Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
- Strong familiarity with version control systems, particularly Git, and collaborative development workflows.
- Working knowledge of cloud platforms such as AWS, Azure, or Google Cloud, primarily for data services, storage, or hybrid deployments.
- Understanding of distributed data systems and database administration principles, including performance tuning, reliability, and scaling of analytical or NoSQL databases (e.g., ClickHouse, Elasticsearch, HBase, or similar).

