
Data Engineer/Lead Engineer - Azure Databricks/PySpark

Firstcareercentre
7 - 10 Years
Anywhere in India/Multiple Locations

Posted on: 24/01/2026

Job Description

Job Summary:

We are looking for a highly skilled Databricks PySpark Engineer with 7+ years of experience in building scalable data engineering solutions. The ideal candidate must have strong hands-on expertise in Python, PySpark, SQL, Databricks, and modern data engineering tools. Experience with Databricks Unity Catalog, Delta Lake, Databricks Asset Bundles (DAB), and performance tuning is critical for this role.

Mandatory Skills:

- Python

- PySpark

- SQL

- Databricks

- Apache Airflow

- Git & GitHub

- Snowflake (good working exposure)

Key Responsibilities:

- Design, develop, and optimize large-scale data pipelines using PySpark on Databricks (see the pipeline sketch after this list)

- Build and manage end-to-end data workflows using Databricks Workflows and Apache Airflow

- Implement and manage Unity Catalog for data governance, security, and access control

- Design and maintain data lakes using Delta Lake (ACID transactions, schema evolution)

- Develop and manage Databricks Asset Bundles (DAB) for CI/CD and deployment automation

- Perform PySpark performance tuning and optimization for large datasets

- Optimize SQL queries and Spark jobs for cost and performance

- Work with Spark APIs (RDD, DataFrame, Dataset)

- Handle data ingestion from multiple sources (batch and streaming)

- Collaborate with stakeholders to understand business requirements and translate them into technical solutions

- Troubleshoot and resolve performance, data quality, and pipeline issues

- Follow best practices for coding standards, version control, and documentation
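
For illustration only, the sketch below shows what a small batch pipeline of the kind described above might look like on Databricks. The source path, table names, and columns (order_id, order_ts, amount) are assumptions made for the example, not details of this role's actual codebase.

```python
# Minimal PySpark batch pipeline sketch (paths and table names are hypothetical).
# On a Databricks cluster `spark` is preconfigured; getOrCreate() lets the
# snippet also run standalone.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_load").getOrCreate()

# Ingest a raw batch extract (source path is an assumption for illustration).
raw = spark.read.json("/mnt/raw/orders/2026-01-24/")

# Basic cleansing and derivation of a partition column.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Write to a Delta table, partitioned by date for downstream partition pruning.
(orders.write
       .format("delta")
       .mode("append")
       .partitionBy("order_date")
       .saveAsTable("analytics.orders_silver"))
```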

Technical Skills & Expertise:

- Strong experience with Apache Spark APIs (RDDs, DataFrames, Spark SQL)

- Deep understanding of Spark execution engine, partitions, shuffles, caching, and joins

- Hands-on experience with Delta Lake (MERGE, OPTIMIZE, Z-ORDER, VACUUM); see the maintenance sketch after this list

- Strong knowledge of Unity Catalog (metastore, catalogs, schemas, privileges)

- Experience in Query Optimization (broadcast joins, indexing, partitioning)

- Proficiency in Databricks Workflows and job scheduling

- Experience with Databricks Asset Bundles (DAB) for environment promotion

- Working knowledge of Snowflake integration with Databricks

- Strong SQL skills for analytics and transformation
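
As a hedged illustration of the Delta Lake operations listed above (MERGE, OPTIMIZE, Z-ORDER, VACUUM), the sketch below combines the DeltaTable API with Databricks SQL commands. The table names, the join key order_id, the Z-ORDER column, and the 168-hour retention window are assumptions for the example.

```python
# Delta Lake upsert and maintenance sketch (table and column names are assumptions).
# Requires the delta-spark package; on Databricks, DeltaTable is available by default.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
target = DeltaTable.forName(spark, "analytics.orders_silver")
updates = spark.table("staging.orders_updates")

# MERGE: upsert incoming records on the primary key.
(target.alias("t")
       .merge(updates.alias("s"), "t.order_id = s.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# OPTIMIZE + Z-ORDER: compact small files and co-locate data on a common filter column.
spark.sql("OPTIMIZE analytics.orders_silver ZORDER BY (customer_id)")

# VACUUM: remove unreferenced files older than the retention window (in hours).
spark.sql("VACUUM analytics.orders_silver RETAIN 168 HOURS")
```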

Performance Tuning (Must Have):

- PySpark code optimization and memory management

- Efficient partitioning and bucketing strategies

- Broadcast joins and shuffle optimization (see the tuning sketch after this list)

- Caching and persistence strategies

- Cluster configuration and autoscaling optimization

- Monitoring and debugging using Spark UI
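
The sketch below illustrates a few of these tuning techniques (broadcast join, repartitioning, caching) in PySpark. The table names, join key, and partition count of 200 are illustrative assumptions; in practice these values would come from inspecting the actual workload in the Spark UI.

```python
# Performance-tuning sketch: broadcast join, repartitioning, caching
# (dataset names and sizes are illustrative assumptions).
from pyspark.sql import SparkSession, functions as F
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()

facts = spark.table("analytics.orders_silver")   # large fact table
dims = spark.table("analytics.customer_dim")     # small dimension table

# Broadcast the small side to avoid shuffling the large fact table.
enriched = facts.join(F.broadcast(dims), "customer_id")

# Repartition on the aggregation key to control shuffle parallelism.
enriched = enriched.repartition(200, "customer_id")

# Persist only when the result is reused by several downstream actions.
enriched.persist(StorageLevel.MEMORY_AND_DISK)

daily = enriched.groupBy("order_date").agg(F.sum("amount").alias("revenue"))
daily.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_revenue")

enriched.unpersist()  # release cached blocks once downstream writes complete
```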

Good to Have Skills:

- Experience with streaming frameworks (Structured Streaming, Kafka)

- Knowledge of cloud platforms (Azure / AWS / GCP)

- Experience with CI/CD pipelines for data engineering

- Exposure to BI tools like Power BI or Tableau

- Databricks or cloud certifications

Soft Skills:

- Strong analytical and problem-solving skills

- Excellent communication and collaboration abilities

- Ability to work independently and in agile teams

- Ability to mentor junior engineers
