Posted on: 05/03/2026
Job Summary:
We are looking for an experienced Senior Databricks Developer to design, build, and optimize scalable data pipelines and analytics solutions on the Databricks platform. The role involves working with large-scale datasets, managing historical data, enabling advanced analytics, and integrating Databricks with multiple enterprise data sources to support business intelligence and AI/ML use cases.
Key Responsibilities:
- Design, develop, and maintain scalable ETL/ELT pipelines using Databricks (PySpark / Spark SQL)
- Build and manage Delta Lake tables with support for historical data tracking, versioning, and time travel
- Implement strategies to track and manage historical data (SCD Type 1/2, snapshots, audit tables)
- Optimize data pipelines for performance, reliability, and cost efficiency
- Integrate Databricks with various data sources including Snowflake, Oracle, PostgreSQL, SQL Server, and cloud object storage (S3/ADLS/GCS)
- Design and maintain Bronze, Silver, and Gold data layers following lakehouse best practices
- Support downstream analytics and reporting use cases (Power BI, Tableau, etc.)
- Ensure data quality, validation, and reconciliation across pipelines
- Implement security best practices including data access controls and governance
- Collaborate with analytics, AI/ML, and business teams to deliver high-quality data solutions
- Monitor, troubleshoot, and resolve pipeline failures and performance issues
Must-Have Skills:
- 5-7 years of experience in Data Engineering
- Strong hands-on experience with Databricks
- Expertise in PySpark and Spark SQL
- Deep understanding of Delta Lake concepts (ACID, time travel, schema evolution)
- Proven experience in historical data management (SCDs, incremental loads, CDC patterns)
- Experience connecting Databricks to Snowflake, Oracle, PostgreSQL, SQL Server, and other enterprise data sources
- Strong SQL skills for complex data transformations and performance tuning
- Experience working with large-scale datasets (millions to billions of records)
- Familiarity with cloud platforms (AWS / Azure / GCP)
- Experience with version control (Git) and CI/CD pipelines
Nice-to-Have Skills:
- Experience with Databricks Workflows, Jobs, and Auto Loader
- Knowledge of Power BI, including DirectQuery and incremental refresh patterns
- Exposure to real-time or streaming pipelines (Kafka, Event Hubs, Kinesis)
- Experience with data governance and security tools (Unity Catalog, data lineage, role-based access controls)
- Strong understanding of performance tuning techniques (partitioning, Z-Ordering, caching)
- Healthcare domain experience, including familiarity with healthcare data models, patient assistance programs (PAP), eBV workflows, claims, eligibility, or regulatory considerations such as HIPAA compliance
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1617856