Posted on: 18/12/2025
Description :
Role : Databricks Developer
Experience : 4+ Years
Location : Remote
Employment Type : Full-time
Role Summary :
We are seeking a highly technical Databricks Developer to lead the architecture and implementation of scalable data solutions on our AWS-based Databricks platform.
In this role, you will be responsible for the end-to-end data lifecycle, converting raw data into actionable insights through high-performance ETL pipelines.
You will leverage Apache Spark and PySpark to process massive datasets while ensuring that the infrastructure remains cost-effective and secure.
This position requires a deep understanding of the Lakehouse paradigm and the ability to integrate diverse cloud services, including AWS and Azure Data Factory, into a unified data strategy.
Technical Responsibilities
- End-to-End Pipeline Engineering : Design and deploy sophisticated ETL/ELT workflows using PySpark, Spark SQL, and Python to ingest and transform structured and unstructured data into optimized formats.
- Lakehouse Architecture Implementation : Architect and manage the Medallion Architecture (Bronze, Silver, Gold layers) within Delta Lake to ensure data reliability, ACID transactions, and schema enforcement for downstream analytics (a brief PySpark sketch of a Bronze-to-Silver step follows this list).
- Spark Performance Tuning : Conduct deep-dive performance optimization of Spark jobs by tuning configurations, managing shuffle partitions, implementing broadcast joins, and leveraging caching strategies to reduce latency.
- Multi-Cloud Integration : Orchestrate data movement between AWS S3 and other environments using tools like Azure Data Factory (ADF) and AWS Glue to support a hybrid or multi-cloud data ecosystem.
- Data Transformation & Logic : Implement complex business logic, aggregations, and data cleansing routines within Databricks notebooks and jobs to maintain high data quality.
- Cluster Management & Cost Control : Monitor and troubleshoot Databricks clusters and jobs, implementing auto-scaling policies and optimizing resource allocation to balance performance with cloud spend.
- Security & Governance Implementation : Enforce fine-grained access control and data governance policies using Unity Catalog, ensuring compliance with organizational security standards and data lineage requirements.
- CI/CD & DevOps Integration : Collaborate with DevOps teams to integrate data workflows into automated CI/CD pipelines using Terraform, GitHub Actions, or Azure DevOps for seamless deployment.
- Technical Documentation : Maintain rigorous documentation of data models, pipeline architectures, and codebases to facilitate knowledge sharing and system maintainability.
- Stakeholder Collaboration : Partner with Data Architects and Business Analysts to translate high-level business requirements into robust, production-ready technical designs.
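To make the Lakehouse responsibilities above more concrete, the sketch below is a minimal, hypothetical Bronze-to-Silver PySpark job: the table paths, column names (event_id, customer_id, event_ts), and the customer dimension are illustrative assumptions, not details of our platform.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Read raw events and a small reference dimension (hypothetical Delta paths).
bronze_events = spark.read.format("delta").load("s3://example-lake/bronze/events")
dim_customers = spark.read.format("delta").load("s3://example-lake/silver/customers")

# Cleanse and de-duplicate, then enrich via a broadcast join so the small
# dimension is shipped to executors instead of triggering a full shuffle.
silver_events = (
    bronze_events
    .filter(F.col("event_ts").isNotNull())
    .dropDuplicates(["event_id"])
    .join(broadcast(dim_customers), on="customer_id", how="left")
)

# Persist to the Silver layer as a Delta table; Delta enforces the existing
# schema on write and provides ACID guarantees for downstream consumers.
(
    silver_events.write
    .format("delta")
    .mode("append")
    .save("s3://example-lake/silver/events")
)
```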
Required Technical Skills
- Core Frameworks : Expert-level proficiency in Apache Spark and PySpark is mandatory for this role.
- Storage Technologies : Hands-on experience with Delta Lake, including features like Time Travel, Z-Ordering, and Delta Live Tables (DLT); Time Travel and Z-Ordering are illustrated in the short sketch after this list.
- Cloud Infrastructure : Deep understanding of the AWS ecosystem (S3, IAM, EC2) and experience with Azure Data Factory for orchestration.
- Programming Languages : Advanced skills in Python and SQL; familiarity with Scala is highly desirable.
- Database Management : Strong knowledge of data warehousing concepts, dimensional modeling, and Star/Snowflake schemas.
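As a quick illustration of the Delta Lake features listed above, the snippet below assumes a table named silver.events with a customer_id column (both hypothetical names); it shows a Time Travel read and a Z-Ordering optimization.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time Travel: read the table as it existed at an earlier version, e.g. to
# audit a change or reproduce a previous report.
events_v42 = spark.sql("SELECT * FROM silver.events VERSION AS OF 42")

# Z-Ordering: rewrite files so rows are co-located by a frequently filtered
# column, improving data skipping on subsequent reads.
spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")
```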
Preferred Skills
- Data Governance : Experience implementing Unity Catalog for centralized discovery and access control across Databricks workspaces.
- Workflow Orchestration : Advanced experience with Airflow or Databricks Workflows for complex task dependency management.
- Stream Processing : Knowledge of Spark Structured Streaming for real-time data ingestion and processing (see the streaming sketch after this list).
- Infrastructure as Code : Proficiency in Terraform or CloudFormation to automate the provisioning of Databricks workspaces and clusters.
- Observability : Experience with Databricks monitoring tools and integrating with Prometheus or Grafana for pipeline health dashboards.
- Certifications : Databricks Certified Data Engineer Associate/Professional or AWS Certified Data Analytics.
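For the streaming item above, the sketch below assumes Databricks Auto Loader and hypothetical S3 paths; it incrementally ingests newly arrived JSON files into a Bronze Delta table, using a checkpoint to track progress across restarts.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incrementally discover new files under a raw landing prefix (hypothetical path).
raw_stream = (
    spark.readStream
    .format("cloudFiles")                     # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-lake/_schemas/raw_events")
    .load("s3://example-lake/raw/events")
)

# Append new records to the Bronze Delta table; the checkpoint records which
# files have already been processed so restarts do not re-ingest them.
(
    raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-lake/_checkpoints/bronze_events")
    .trigger(availableNow=True)
    .start("s3://example-lake/bronze/events")
)
```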
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1592529