Principal Data Engineer Role Overview

We are looking for a Principal Data Engineer to lead the architecture and technical direction of next-generation data and knowledge platforms that power intelligent automation, advanced analytics, and AI-driven products. This role is pivotal in building a strong data foundation to support scalable, secure, and AI-ready systems.

You'll architect robust data platforms, build secure pipelines, support real-time and batch processing, and enforce best practices in data governance, privacy, and operational excellence.

Key Responsibilities:

- Architect and optimize scalable data platforms for analytics, AI/ML, and unified knowledge access.

- Design and implement high-throughput data pipelines and data lakes for both batch and real-time workloads.

- Set technical standards for data modeling, quality, metadata management, and lineage tracking with a focus on AI-readiness.

- Develop secure, extensible connectors for customer data integration.

- Build systems to process, enrich, and contextualize data for higher-order intelligence and timeline reconstruction.

- Collaborate with data scientists and ML engineers to operationalize machine learning workflows.

- Evaluate and adopt modern tools from the AI/ML data stack (e.g., feature stores, vector databases, orchestration tools, ML pipelines).

- Lead data governance, automation, and continuous improvement initiatives.

- Mentor engineers and provide thought leadership across teams.

- Ensure compliance with data privacy, security, and regulatory standards in multi-tenant environments.

Must-Have Skills:

- Strong experience in cloud-native data engineering (AWS preferred), data lakes, warehouses, and streaming architectures.

- Proficiency in frameworks such as Spark, Kafka, Airflow, and dbt.

- Hands-on experience with ML data workflows, feature engineering, and pipeline orchestration.

- Familiarity with tools such as Feast, Pinecone, Weaviate, MLflow, and Tecton.

- Experience with open table formats such as Apache Iceberg, Delta Lake, or Hudi.

- In-depth knowledge of data privacy frameworks (e.g., GDPR), anonymization techniques, RBAC, encryption, and compliance.

Good to Have:

- Experience with metadata management, semantic layers, or graph data architectures.

- Exposure to SaaS and multi-cloud environments.

- Background in AI agent integration and AI-driven automation.

- Experience working with telemetry data (e.g., AWS CloudTrail, CloudWatch) for real-time analytics and monitoring.

Education & Experience:

- 10+ years of experience in data engineering, distributed systems, or related fields.

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (preferred).