Posted on: 04/11/2025
About Cognite:
Cognite is revolutionizing industrial data management through our flagship product, Cognite Data Fusion, a state-of-the-art SaaS platform that transforms how industrial companies leverage their data.
We're seeking a Senior Data Platform Engineer who excels at building high-performance distributed systems and thrives in a fast-paced, startup-style environment. You'll work on cutting-edge data infrastructure challenges that directly impact how Fortune 500 industrial companies manage their most critical operational data.
Key Responsibilities:
1. High-Performance Data Systems:
- Design and implement scalable data processing pipelines using Apache Spark, Flink, and Kafka for terabyte-scale datasets (a short sketch follows this list).
- Build efficient APIs and backend services supporting thousands of concurrent users with sub-second latency.
- Optimize data storage and retrieval for time-series, sensor, and operational datasets.
- Implement advanced caching strategies using Redis and in-memory data structures.
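To illustrate the flavor of this work, here is a minimal Spark sketch in Scala: an hourly roll-up of raw sensor readings, partitioned so downstream scans can prune files. The bucket paths, column names, and dataset layout are hypothetical, not a Cognite schema.

```scala
// Minimal sketch: hourly roll-up of raw sensor readings stored as Parquet.
// Paths and column names are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SensorRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sensor-hourly-rollup")
      .getOrCreate()

    val readings = spark.read.parquet("s3a://example-bucket/raw/sensor_readings/")

    // Aggregate to one row per sensor per hour; keying the shuffle on
    // (sensor_id, hour) keeps partitions balanced for typical fleets.
    val hourly = readings
      .groupBy(col("sensor_id"), window(col("event_time"), "1 hour"))
      .agg(
        avg("value").as("avg_value"),
        max("value").as("max_value"),
        count("*").as("sample_count")
      )

    // Partition output by date so downstream OLAP queries can prune files.
    hourly
      .withColumn("date", to_date(col("window.start")))
      .write
      .partitionBy("date")
      .mode("overwrite")
      .parquet("s3a://example-bucket/curated/sensor_hourly/")

    spark.stop()
  }
}
```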
2. Distributed Processing Excellence:
- Engineer optimized Spark applications with deep knowledge of the Catalyst optimizer and partitioning strategies.
- Develop real-time streaming solutions processing millions of events per second using Kafka and Flink (see the streaming sketch after this list).
- Design efficient data lake architectures on S3/GCS using formats like Parquet and ORC.
- Implement query optimization for OLAP datastores such as ClickHouse, Pinot, or Druid.
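As a sketch of the streaming side, the following Spark Structured Streaming job consumes a Kafka topic and lands micro-batches as Parquet. The broker address, topic name, and checkpoint path are illustrative assumptions.

```scala
// Minimal sketch: Kafka -> Parquet via Spark Structured Streaming.
// Broker, topic, and paths are hypothetical.
import org.apache.spark.sql.SparkSession

object EventIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-event-ingest")
      .getOrCreate()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")
      .option("subscribe", "sensor-events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

    // The checkpoint location gives exactly-once file output across restarts.
    val query = events.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/landing/sensor_events/")
      .option("checkpointLocation", "s3a://example-bucket/checkpoints/sensor_events/")
      .start()

    query.awaitTermination()
  }
}
```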
3. Scalability & Performance:
- Scale systems to 10K+ QPS while maintaining high availability and data consistency.
- Tune JVM performance through garbage-collection tuning and memory optimization.
- Establish comprehensive monitoring using Prometheus, Grafana, and distributed tracing.
- Design fault-tolerant architectures with circuit breakers and retry mechanisms (a minimal sketch follows).
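Here is a minimal sketch of one such fault-tolerance primitive, retry with exponential backoff, in plain Scala. The attempt count and delays are illustrative only, not a recommended policy.

```scala
// Minimal sketch: retry an operation with exponential backoff.
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Retry {
  @tailrec
  def withBackoff[T](attempts: Int, delayMs: Long)(op: => T): Try[T] =
    Try(op) match {
      case s @ Success(_) => s
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delayMs)
        // Double the delay on each failed attempt (exponential backoff).
        withBackoff(attempts - 1, delayMs * 2)(op)
      case f @ Failure(_) => f
    }
}

// Usage: retry a flaky downstream call up to 5 times, starting at 100 ms.
// val result = Retry.withBackoff(5, 100)(callDownstreamService())
```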
4. Technical Innovation:
- Contribute to open-source projects within the big data ecosystem (Spark, Kafka, Airflow).
- Research and prototype emerging technologies for industrial data challenges.
- Collaborate with product teams to deliver scalable and reliable technical solutions.
- Participate in architecture reviews and technical design discussions.
Requirements:
1. Distributed Systems Experience (4-6 years):
- Proven production experience with Spark (building and optimizing large-scale applications).
- Strong proficiency with Kafka, Flink, or Spark Streaming for real-time data processing.
- Expertise in JVM languages (Java, Scala, or Kotlin) with performance tuning experience.
2. Data Platform Foundations (3+ years):
- Hands-on experience with data lakes, columnar formats, and table formats such as Iceberg and Delta Lake (see the sketch after this list).
- Worked with OLAP query engines like Presto/Trino, ClickHouse, or Pinot.
- Built robust ETL/ELT pipelines using Airflow, dbt, or custom frameworks.
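For a concrete taste of the table-format work, here is a minimal Delta Lake append with schema evolution enabled, assuming the delta-spark dependency is on the classpath. The table path and staging location are hypothetical.

```scala
// Minimal sketch: append curated data to a Delta Lake table with
// schema evolution. Paths and columns are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object DeltaAppend {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-append")
      // Standard Delta Lake session configuration.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    val updates = spark.read.parquet("s3a://example-bucket/staging/daily_batch/")

    // mergeSchema lets new columns land without a manual migration.
    updates.write
      .format("delta")
      .mode("append")
      .option("mergeSchema", "true")
      .save("s3a://example-bucket/warehouse/readings_delta/")

    spark.stop()
  }
}
```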
Technical Depth Indicators:
- Delivered measurable performance improvements (2x+ throughput gains).
- Optimized resource utilization and cost efficiency.
- Designed thread-safe, high-concurrency data processing systems.
- Implemented data quality frameworks and schema evolution management.
- Designed efficient schemas for analytical workloads.
Collaboration & Growth:
- Worked cross-functionally with PMs, ML engineers, and data scientists.
- Maintained high code quality through thoughtful reviews and documentation.
- Adapted quickly to new tools and frameworks.
- Demonstrated systematic debugging and problem-solving in distributed systems.
Startup Mindset:
- Delivered high-quality features under tight deadlines.
- Balanced technical debt, speed, and system reliability.
- Took end-to-end ownership from design to production.
- Thrived amid evolving requirements and ambiguity.
- Made customer-centric technical decisions.
Bonus Points:
- Contributions to Apache open-source projects (Spark, Kafka, Airflow).
- Public speaking or technical blogging experience.
- Industrial domain knowledge (IoT, manufacturing, operational systems).
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1568985