Posted on: 08/10/2025
Job Description:
Function: Software Engineering / Backend Development
Skills: Spark, Spark Streaming
Cognite is revolutionising industrial data management through our flagship product, Cognite Data Fusion - a state-of-the-art SaaS platform that transforms how industrial companies leverage their data. We're seeking a Senior Data Platform Engineer who excels at building high-performance distributed systems and thrives in a fast-paced startup environment. You'll be working on cutting-edge data infrastructure challenges that directly impact how Fortune 500 industrial companies manage their most critical operational data.
Responsibilities:
High-Performance Data Systems:
- Design and implement robust data processing pipelines using Apache Spark, Flink, and Kafka for terabyte-scale industrial datasets.
- Build efficient APIs and services that serve thousands of concurrent users with sub-second response times.
- Optimise data storage and retrieval patterns for time-series, sensor, and operational data.
- Implement advanced caching strategies using Redis and in-memory data structures.
Distributed Processing Excellence:
- Engineer Spark applications with a deep understanding of the Catalyst optimiser, partitioning strategies, and performance tuning.
- Develop real-time streaming solutions processing millions of events per second with Kafka and Flink.
- Design efficient data lake architectures using S3/GCS with optimised partitioning and file formats (Parquet, ORC).
- Implement query optimisation techniques for OLAP datastores like ClickHouse, Pinot, or Druid.
Scalability and Performance:
- Scale systems to 10K+ QPS while maintaining high availability and data consistency.
- Optimise JVM performance through garbage collection tuning and memory management.
- Implement comprehensive monitoring using Prometheus, Grafana, and distributed tracing.
- Design fault-tolerant architectures with proper circuit breakers and retry mechanisms.
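The retry mechanisms mentioned in the bullet above could be sketched roughly as follows. This is a minimal, hypothetical illustration of exponential backoff (all names invented for this sketch), not code from any actual Cognite system:

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Hypothetical helper: retries `task` up to `maxAttempts` times,
    // doubling the delay between attempts (exponential backoff).
    static <T> T withRetry(Callable<T> task, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Delay grows as baseDelayMs * 2^(attempt-1).
                    Thread.sleep(baseDelayMs * (1L << (attempt - 1)));
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // A flaky task that fails twice, then succeeds on the third call.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A production circuit breaker would additionally track recent failure rates and short-circuit calls entirely while the downstream dependency is unhealthy; libraries such as Resilience4j provide both patterns.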
Technical Innovation:
- Contribute to open-source projects in the big data ecosystem (Spark, Kafka, Airflow).
- Research and prototype new technologies for industrial data challenges.
- Collaborate with product teams to translate complex requirements into scalable technical solutions.
- Participate in architectural reviews and technical design discussions.
Requirements:
Distributed Systems Experience (4-6 years):
- Production Spark experience - built and optimised large-scale Spark applications with an understanding of their internals.
- Streaming systems proficiency - implemented real-time data processing using Kafka, Flink, or Spark Streaming.
- JVM Language expertise - strong programming skills in Java, Scala, or Kotlin with performance optimisation experience.
Data Platform Foundations (3+ years):
- Big data storage systems - hands-on experience with data lakes, columnar formats, and table formats (Iceberg, Delta Lake).
- OLAP query engines - worked with Presto/Trino, ClickHouse, Pinot, or similar high-performance analytical databases.
- ETL/ELT pipeline development - built robust data transformation pipelines using tools like dbt, Airflow, or custom frameworks.
Infrastructure and Operations:
- Kubernetes production experience - deployed and operated containerised applications in production environments.
- Cloud platform proficiency - hands-on experience with AWS, Azure, or GCP data services.
- Monitoring and observability - implemented comprehensive logging, metrics, and alerting for data systems.
Technical Depth Indicators:
Performance Engineering:
- System optimisation experience - delivered measurable performance improvements (2x+ throughput gains).
- Resource efficiency - optimised systems for cost while maintaining performance requirements.
- Concurrency expertise - designed thread-safe, high-concurrency data processing systems.
Data Engineering Best Practices:
- Data quality frameworks - implemented validation, testing, and monitoring for data pipelines.
- Schema evolution - managed backwards-compatible schema changes in production systems.
- Data modelling expertise - designed efficient schemas for analytical workloads.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1557850