Posted on: 18/02/2026
Description :
Databricks (Spark) :
- Develop scalable ETL/ELT pipelines using PySpark (RDD/DataFrame APIs), Delta Lake, Auto Loader (cloudFiles), and Structured Streaming (see the sketch after this list).
- Optimize jobs : partitioning, bucketing, Z-Ordering, OPTIMIZE + VACUUM, broadcast joins, AQE, checkpointing.
- Manage Unity Catalog : catalogs/schemas/tables, data lineage, permissions, secrets, tokens, and cluster policies.
- CI/CD for Databricks assets : notebooks, Jobs, Repos, MLflow artifacts.
- Build Medallion Architecture (Bronze/Silver/Gold) with Delta Live Tables (DLT) and expectations for data quality.
- Event-driven ingestion : Kafka/Kinesis → Databricks Structured Streaming.
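A minimal Auto Loader sketch of the kind of Bronze ingestion described above; the paths, schema/checkpoint locations, and the bronze.orders table are illustrative placeholders and assume a Databricks cluster with Delta Lake and Unity Catalog (or a Hive metastore) available.

# Minimal Auto Loader (cloudFiles) sketch: incrementally ingest JSON files
# from cloud storage into a Bronze Delta table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.readStream
    .format("cloudFiles")                                   # Auto Loader source
    .option("cloudFiles.format", "json")                    # incoming file format
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/orders")
    .load("s3://bucket/landing/orders/")
)

(
    raw.writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/orders_bronze")
    .trigger(availableNow=True)                             # incremental, batch-style run
    .toTable("bronze.orders")                               # Bronze Delta table
)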
Snowflake (DW & ELT) :
- Model and implement star/snowflake schemas, data marts, and secure views.
- Performance tuning : clustering keys, micro-partitions, result caching, warehouse sizing, query profile analysis.
- Implement Task/Stream patterns for CDC; external tables for data lakes (S3); Snowpipe for near-real-time ingestion.
- Python/Snowpark for transformations and UDFs; SQL best practices (CTEs, window functions); see the sketch after this list.
- Security : Row Level Security (RLS), Column Masking, OAuth/SCIM, network policies, data sharing (reader accounts).
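A minimal Snowpark for Python sketch of the transformation work listed above; the connection parameters, warehouse, and table names are placeholders, and key-pair auth or a secrets manager would replace the inline password in practice.

# Minimal Snowpark (Python) sketch: push a simple aggregation down to Snowflake.
from snowflake.snowpark import Session
from snowflake.snowpark import functions as F

# Placeholder connection parameters; use key-pair auth / a secrets manager in practice.
session = Session.builder.configs({
    "account": "xy12345",
    "user": "etl_user",
    "password": "***",
    "warehouse": "TRANSFORM_WH",
    "database": "ANALYTICS",
    "schema": "STAGING",
}).create()

orders = session.table("RAW_ORDERS")
daily = (
    orders.filter(F.col("STATUS") == "COMPLETED")
          .group_by("ORDER_DATE")
          .agg(F.sum("AMOUNT").alias("TOTAL_AMOUNT"))
)
daily.write.save_as_table("DAILY_ORDER_TOTALS", mode="overwrite")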
AWS Data Engineering :
- Storage & compute : S3 (lifecycle, encryption, partitioning), EMR (where needed), Lambda, Glue (ETL/Schema Registry), Athena, Kinesis (Data Streams/Firehose), RDS/Aurora, Step Functions.
- Orchestration : MWAA/Airflow or Step Functions (error handling, retries, backfills, SLA alerts); see the DAG sketch after this list.
- Infra-as-code : Terraform/CloudFormation for reproducible environments (Databricks workspace, IAM, S3, networking).
- Security/compliance : IAM least privilege, KMS, VPC endpoints/PrivateLink, Secrets Manager, CloudTrail/CloudWatch, GuardDuty.
- Observability : CloudWatch metrics/logs, structured logging, Datadog/Prometheus (optional), cost monitoring (tags/budgets).
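A minimal Airflow DAG sketch illustrating the retry, SLA, and backfill behaviour called out in the orchestration bullet; the DAG id, schedule, and callable are placeholders.

# Minimal Airflow DAG sketch with retries, a task-level SLA, and catchup enabled
# so past partitions can be backfilled.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_backfillable_load(**context):
    # Use the logical date so reruns/backfills process the right partition.
    ds = context["ds"]
    print(f"Loading partition for {ds}")


with DAG(
    dag_id="daily_orders_load",
    start_date=datetime(2026, 1, 1),
    schedule_interval="@daily",
    catchup=True,                      # enables backfills
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=10),
        "sla": timedelta(hours=2),     # breaches surface as SLA misses/alerts
    },
) as dag:
    load = PythonOperator(
        task_id="load_orders",
        python_callable=run_backfillable_load,
    )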
Data Quality, Governance & Security :
- Implement unit/integration tests for pipelines (e.g., pytest + Great Expectations + DLT expectations); see the sketch after this list.
- Data contracts and schema evolution; SLA/SLO monitoring; data quality (DQ) dashboards (missingness, drift, freshness, completeness).
- PII handling : tokenization/pseudonymization, field-level encryption, adherence to KYB/KYC data-flow requirements; audit trails.
- Cataloging & lineage through Unity Catalog and/or OpenLineage/Purview (if applicable).
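A minimal pytest + Great Expectations sketch for the pipeline-testing bullet above, assuming the classic from_pandas API (newer Great Expectations releases use a different entry point); the DataFrame stands in for a real pipeline output.

# Minimal pytest + Great Expectations sketch for a data quality check.
import great_expectations as ge
import pandas as pd


def load_silver_orders() -> pd.DataFrame:
    # Placeholder for reading a real pipeline output (e.g., a Delta or Snowflake table).
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, 7.9]})


def test_orders_have_no_null_keys_and_positive_amounts():
    df = ge.from_pandas(load_silver_orders())
    assert df.expect_column_values_to_not_be_null("order_id").success
    assert df.expect_column_values_to_be_between("amount", min_value=0).success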
DevOps & CI/CD :
- Git workflows (branching, PR reviews), Databricks CLI/Terraform modules for jobs/clusters/Unity Catalog, Snowflake DevOps (object versioning via schemachange or SQL-based migrations).
- Automated testing in pipelines; feature flags and canary releases for data jobs (sketched below); rollback strategies.
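An illustrative, framework-agnostic canary pattern for a data job, as referenced above: the candidate logic writes to a shadow table and is only promoted if a guardrail check against the production output passes. The table names, the new_transform callable, and the row-count check are placeholders.

# Sketch of a canary run for a PySpark data job.
def run_canary(spark, new_transform, prod_table: str, canary_table: str) -> bool:
    """Write the candidate output to a shadow table and compare it to production."""
    df_new = new_transform(spark)
    df_new.write.mode("overwrite").saveAsTable(canary_table)

    prod_count = spark.table(prod_table).count()
    canary_count = spark.table(canary_table).count()

    # Simple guardrail: row counts within 1% of each other. Real checks would also
    # compare schemas, key aggregates, and data quality expectations before promoting.
    return abs(canary_count - prod_count) <= 0.01 * max(prod_count, 1)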
Client-Facing PoCs & Delivery :
- Rapid PoC builds : clearly defined success metrics, cost/performance benchmarks, and a transition plan to production.
- Present architectural decisions, trade-offs (Spark vs. Snowflake ELT), and cost projections (Databricks DBUs, Snowflake credits, storage egress).
- Produce runbooks, operational playbooks, and knowledge transfer documents for client teams.
Required Technical Skillset :
- Databricks : PySpark, Delta Lake, Auto Loader, DLT, Jobs, Unity Catalog, MLflow basics.
- Snowflake : SQL, Snowpipe, Tasks/Streams, Snowpark (Python), warehouse sizing, performance tuning, security policies.
- Python : strong command of data engineering packages (pandas, pyarrow, pytest), robust error handling, typing, and packaging.
- Orchestration : Airflow DAGs (Sensors, Operators, XCom), Step Functions state machines.
- Streaming & CDC : Kafka/Kinesis, Debezium (nice-to-have), CDC patterns to Delta/Snowflake.
- AWS : S3, Glue, Lambda, Kinesis, IAM/KMS, VPC, CloudWatch; Terraform/CloudFormation.
- Data Modeling : 3NF/dimensional modeling, slowly changing dimensions (SCD Type 2), surrogate keys, surrogate vs. natural key trade-offs (see the sketch after this list).
- Security & Compliance : encryption at rest/in transit, tokenization, key rotation, audit logging, governance controls.
- Performance & Cost : Spark job tuning, Snowflake warehouse right-sizing, partitioning/clustering, object storage best practices.
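A minimal SCD Type 2 sketch using the Delta Lake Python API, as referenced in the data-modeling bullet; the table names, the hash_diff change-detection column, and the two-step merge-then-append pattern are illustrative assumptions rather than a prescribed implementation.

# Minimal SCD Type 2 sketch with Delta Lake: close out the current row, then
# append the new version as the current row.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

dim = DeltaTable.forName(spark, "gold.dim_customer")
changes = spark.table("silver.customer_changes")   # one row per changed customer

# Step 1: expire the current version of any customer whose attributes changed.
(
    dim.alias("d")
    .merge(changes.alias("c"), "d.customer_id = c.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.hash_diff <> c.hash_diff",    # only when something actually changed
        set={"is_current": "false", "valid_to": "c.change_ts"},
    )
    .execute()
)

# Step 2: append the new versions (and brand-new customers) as current rows.
new_rows = (
    changes.withColumn("valid_from", F.col("change_ts"))
           .withColumn("valid_to", F.lit(None).cast("timestamp"))
           .withColumn("is_current", F.lit(True))
           .drop("change_ts")
)
new_rows.write.format("delta").mode("append").saveAsTable("gold.dim_customer")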
Nice-to-Have :
- dbt (Snowflake) with tests & exposures; Great Expectations.
- Databricks SQL Warehouses and BI connectivity; Photon engine awareness.
- Lakehouse Federation (UC external locations); Delta Sharing; Iceberg experience.
- Kafka Connect/Debezium, NiFi or MuleSoft (for data integrations).
- Experience in financial services.
- Exposure to ISO/IEC 27001 controls in data platforms.
Education & Certifications :
- Bachelor's/Master's degree in CS/IT/EE or a related field.
- Certifications (a plus) : Databricks Data Engineer Associate/Professional, Snowflake SnowPro Core/Advanced, AWS Solutions Architect/Big Data/DP.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1613806