Role Overview :

As Principal Data Engineer, you will drive the architecture and technical direction for the organisation's next-generation data and knowledge platforms, enabling intelligent automation, advanced analytics, and AI-driven products for a wide range of users.

You will play a pivotal role in shaping the data foundation for AI-driven systems, ensuring our platform is robust, scalable, and ready to support state-of-the-art AI workflows. You will also lead efforts to maintain stringent data security standards, safeguarding sensitive information throughout data pipelines and platforms.

Key Responsibilities :

- Architect and optimize scalable data platforms that support advanced analytics, AI/ML capabilities, and unified knowledge access.

- Lead the design and implementation of high-throughput data pipelines and data lakes for both batch and real-time workloads.

- Set technical standards for data modeling, data quality, metadata management, and lineage tracking, with a strong focus on AI-readiness.

- Design and implement secure, extensible data connectors and frameworks for integrating customer-provided data streams.

- Build robust systems for processing and contextualizing data, including reconstructing event timelines and enabling higher-order intelligence.

- Partner with data scientists, ML engineers, and cross-functional stakeholders to operationalize data for machine learning and AI-driven insights.

- Evaluate and adopt best-in-class tools from the modern AI data stack (e.g., feature stores, orchestration frameworks, vector databases, ML pipelines).

- Drive innovation and continuous improvement in data engineering practices, data governance, and automation.

- Provide mentorship and technical leadership to the broader engineering team.

- Champion security, compliance, and privacy best practices in multi-tenant, cloud-native environments.

Desired Skills :

Must Have :

- Deep expertise in cloud-native data engineering (AWS preferred), including large-scale data lakes, warehouses, and event-driven/data streaming architectures.

- Hands-on experience building and maintaining data pipelines with modern frameworks (e.g., Spark, Kafka, Airflow, dbt).

- Strong track record of enabling AI/ML workflows, including data preparation, feature engineering, and ML pipeline operationalization.

- Familiarity with modern AI/ML data stack components such as feature stores (e.g., Feast, Tecton), vector databases (e.g., Pinecone, Weaviate), orchestration tools (e.g., Airflow, Prefect), and MLOps tools (e.g., MLflow).

- Experience working with modern open table formats such as Apache Iceberg, Delta Lake, or Hudi for scalable data lake and lakehouse architectures.

- Experience implementing compliance with data privacy regulations such as GDPR and supporting data anonymization for diverse use cases.

- Strong understanding of data privacy, RBAC, encryption, and compliance in multi-tenant platforms.

Good to Have :

- Experience with metadata management, semantic layers, or knowledge graph architectures.

- Exposure to SaaS and multi-cloud environments serving both internal and external consumers.

- Background in supporting AI agents or AI-driven automation in production environments.

- Experience processing high-volume cloud infrastructure telemetry, including AWS CloudTrail, CloudWatch logs, and other event-driven data sources, to support real-time monitoring, anomaly detection, and operational analytics.

Experience :

- 10+ years of experience in data engineering, distributed systems, or related fields.

Education :

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (preferred).