
Cognite - Senior Data Platform Engineer

Cognite
6 - 10 Years
Bangalore

Posted on: 13/02/2026

Job Description

Cognite is revolutionizing industrial data management through our flagship product, Cognite Data Fusion, a state-of-the-art SaaS platform that transforms how industrial companies leverage their data.

We're seeking a Senior Data Platform Engineer who excels at building high-performance distributed systems and thrives in a fast-paced startup environment. You'll work on cutting-edge data infrastructure challenges that directly impact how Fortune 500 industrial companies manage their most critical operational data.

The core responsibilities of this role include:

High-Performance Data Systems:

- Design and implement robust data processing pipelines using Apache Spark, Flink, and Kafka for terabyte-scale industrial datasets.
- Build efficient APIs and services that serve thousands of concurrent users with sub-second response times.
- Optimize data storage and retrieval patterns for time-series, sensor, and operational data.
- Implement advanced caching strategies using Redis and in-memory data structures (see the sketch below).
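
As a rough illustration of the read-through caching pattern named in the last bullet, here is a minimal Scala sketch using the Jedis Redis client. The Redis host, the TTL, and the loadFromStore loader are hypothetical placeholders, not details from this posting.

    import redis.clients.jedis.Jedis

    // Minimal read-through cache: check Redis first, fall back to the
    // backing store on a miss, then populate the cache with a TTL.
    object ReadThroughCache {
      private val jedis = new Jedis("localhost", 6379) // assumed local Redis
      private val ttlSeconds = 300                     // assumed expiry

      // Placeholder standing in for a real database or service call.
      private def loadFromStore(key: String): String = s"value-for-$key"

      def get(key: String): String = {
        val cached = jedis.get(key)
        if (cached != null) cached // cache hit
        else {
          val fresh = loadFromStore(key)      // cache miss: load from source
          jedis.setex(key, ttlSeconds, fresh) // write back with expiry
          fresh
        }
      }
    }

A production version would typically add connection pooling (JedisPool), value serialization, and protection against cache stampedes.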

Distributed Processing Excellence:

- Engineer Spark applications with a deep understanding of the Catalyst optimizer, partitioning strategies, and performance tuning.
- Develop real-time streaming solutions processing millions of events per second with Kafka and Flink.
- Design efficient data lake architectures using S3/GCS with optimized partitioning and file formats (Parquet, ORC); see the sketch below.
- Implement query optimization techniques for OLAP datastores such as ClickHouse, Pinot, or Druid.
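
To give the partitioning and file-format bullet some concrete shape, here is a minimal Spark sketch in Scala that writes date- and site-partitioned Parquet; the bucket URIs and column names are illustrative assumptions only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object SensorLakeWriter {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("sensor-lake-writer")
          .getOrCreate()

        // Hypothetical raw sensor readings with event_date and site_id columns.
        val readings = spark.read.parquet("s3a://example-bucket/raw/readings/")

        readings
          .repartition(col("event_date"))       // co-locate rows for each partition value
          .write
          .partitionBy("event_date", "site_id") // directory-level partition pruning
          .mode("overwrite")
          .parquet("s3a://example-bucket/curated/readings/")

        spark.stop()
      }
    }

Repartitioning on the partition column before the write keeps each output directory to a small number of well-sized files rather than one fragment per task.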

Scalability and Performance:

- Scale systems to 10K+ QPS while maintaining high availability and data consistency.
- Optimize JVM performance through garbage collection tuning and memory management.
- Implement comprehensive monitoring using Prometheus, Grafana, and distributed tracing.
- Design fault-tolerant architectures with proper circuit breakers and retry mechanisms (see the sketch below).
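
The last bullet is easiest to picture in code. Below is a dependency-free Scala sketch of retries with exponential backoff behind a simple failure-count circuit breaker; the thresholds are arbitrary assumptions, and a production system would normally reach for a library such as resilience4j instead.

    import scala.util.{Failure, Success, Try}

    // Naive circuit breaker: opens after maxFailures consecutive failures
    // and rejects calls until resetMillis have elapsed.
    class CircuitBreaker(maxFailures: Int, resetMillis: Long) {
      private var failures = 0
      private var openedAt = 0L

      def run[A](op: => A): Try[A] = synchronized {
        if (failures >= maxFailures && System.currentTimeMillis() - openedAt < resetMillis)
          Failure(new IllegalStateException("circuit open"))
        else Try(op) match {
          case ok @ Success(_) => failures = 0; ok
          case err @ Failure(_) =>
            failures += 1
            if (failures >= maxFailures) openedAt = System.currentTimeMillis()
            err
        }
      }
    }

    object Retry {
      // Retry with exponential backoff: the delay doubles after every failure.
      def withBackoff[A](attempts: Int, delayMillis: Long)(op: => A): Try[A] =
        Try(op) match {
          case ok @ Success(_) => ok
          case Failure(_) if attempts > 1 =>
            Thread.sleep(delayMillis)
            withBackoff(attempts - 1, delayMillis * 2)(op)
          case err => err
        }
    }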

Technical Innovation:

- Contribute to open-source projects in the big data ecosystem (Spark, Kafka, Airflow).
- Research and prototype new technologies for industrial data challenges.
- Collaborate with product teams to translate complex requirements into scalable technical solutions.
- Participate in architectural reviews and technical design discussions.

Requirements:

- Distributed Systems Experience (2-6 years): production Spark experience, having built and optimized large-scale Spark applications with an understanding of their internals; streaming systems proficiency, having implemented real-time data processing using Kafka, Flink, or Spark Streaming; JVM language expertise, with strong programming skills in Java, Scala, or Kotlin and performance optimization experience.
- Data Platform Foundations (3+ years): big data storage systems, with hands-on experience in data lakes, columnar formats, and table formats (Iceberg, Delta Lake); OLAP query engines, having worked with Presto/Trino, ClickHouse, Pinot, or similar high-performance analytical databases; ETL/ELT pipeline development, having built robust data transformation pipelines using tools like dbt, Airflow, or custom frameworks.
- Infrastructure and Operations: Kubernetes production experience, having deployed and operated containerized applications in production environments; cloud platform proficiency, with hands-on experience in AWS, Azure, or GCP data services.
- Monitoring and Observability: implemented comprehensive logging, metrics, and alerting for data systems (see the sketch below).
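
As an example of the metrics side of this requirement, here is a minimal sketch using the Prometheus Java simpleclient from Scala; the metric name, label, and scrape port are assumptions for illustration.

    import io.prometheus.client.Counter
    import io.prometheus.client.exporter.HTTPServer

    object PipelineMetrics {
      // Counts records processed, labeled by pipeline stage.
      val recordsProcessed: Counter = Counter.build()
        .name("records_processed_total")
        .help("Records processed by the data pipeline")
        .labelNames("stage")
        .register()

      def main(args: Array[String]): Unit = {
        // Expose /metrics for Prometheus to scrape (port is an assumption).
        new HTTPServer(9095)
        recordsProcessed.labels("ingest").inc()
        recordsProcessed.labels("transform").inc(42.0)
      }
    }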

Technical Depth Indicators:

- Performance Engineering: system optimization experience; delivered measurable performance improvements (2x+ throughput gains).
- Resource efficiency: optimized systems for cost while maintaining performance requirements.
- Concurrency expertise: designed thread-safe, high-concurrency data processing systems.
- Data Engineering Best Practices: data quality frameworks; implemented validation, testing, and monitoring for data pipelines.
- Schema evolution: managed backward-compatible schema changes in production systems (see the sketch below).
- Data modeling expertise: designed efficient schemas for analytical workloads.
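
To show what "backward-compatible" means for the schema-evolution bullet above, here is a small Scala sketch using Apache Avro's compatibility checker; the Reading schemas are invented for the example. Adding a field with a default value is the canonical safe change: new readers can still decode records written under the old schema.

    import org.apache.avro.{Schema, SchemaCompatibility}

    object SchemaEvolutionCheck {
      // v1: the writer schema already in production.
      val v1: Schema = new Schema.Parser().parse(
        """{"type":"record","name":"Reading","fields":[
          |  {"name":"sensor_id","type":"string"},
          |  {"name":"value","type":"double"}
          |]}""".stripMargin)

      // v2: adds a field WITH a default, so records written as v1 still decode.
      val v2: Schema = new Schema.Parser().parse(
        """{"type":"record","name":"Reading","fields":[
          |  {"name":"sensor_id","type":"string"},
          |  {"name":"value","type":"double"},
          |  {"name":"unit","type":"string","default":"celsius"}
          |]}""".stripMargin)

      def main(args: Array[String]): Unit = {
        val result = SchemaCompatibility.checkReaderWriterCompatibility(v2, v1)
        println(result.getType) // COMPATIBLE: v2 readers can read v1 data
      }
    }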

Collaboration and Growth:

- Technical Collaboration: cross-functional partnership; worked effectively with product managers, ML engineers, and data scientists.
- Code review excellence: provided thoughtful technical feedback and maintained high code quality standards.
- Documentation and knowledge sharing: created technical documentation and participated in knowledge transfer.
- Continuous Learning: technology adoption; quickly learned and applied new technologies to solve business problems.
- Industry awareness: stayed current with big data ecosystem developments and best practices.
- Problem-solving approach: demonstrated a systematic approach to debugging complex distributed system issues.

Startup Mindset:

- Execution Excellence: rapid delivery; consistently shipped high-quality features within aggressive timelines.
- Technical pragmatism: made smart trade-offs between technical debt, velocity, and system reliability.
- End-to-end ownership: took responsibility for features from design through production deployment and monitoring.
- Ambiguity comfort: thrived in environments with evolving requirements and unclear specifications.
- Technology flexibility: adapted to new tools and frameworks based on project needs.
- Customer focus: understood how technical decisions impact user experience and business metrics.

Bonus Points:

- Open-source contributions to major Apache projects in the data space (e.g., Apache Spark or Kafka) are a big plus.
- Conference speaking or technical blog writing experience.
- Industrial domain knowledge: previous experience with IoT, manufacturing, or operational technology systems.

Primary Technologies (Technical Stack):

- Languages: Kotlin, Scala, Python, and Java.
- Big Data: Apache Spark, Apache Flink, Apache Kafka.
- Storage: PostgreSQL, ClickHouse, Elasticsearch, S3-compatible systems.
- Infrastructure: Kubernetes, Docker, Terraform.

Technologies You May Work With:

- Table Formats: Apache Iceberg, Delta Lake, Apache Hudi.
- Query Engines: Trino/Presto, Apache Pinot, DuckDB.
- Orchestration: Apache Airflow, Dagster.
- Monitoring: Prometheus, Grafana, Jaeger, and ELK Stack.

