Posted on: 18/12/2025
Description :
Role : Databricks Developer
Experience : 4+ Years
Location : Remote
Employment Type : Full-time
Role Summary :
We are seeking a highly technical Databricks Developer to lead the architecture and implementation of scalable data solutions on our AWS-based Databricks platform.
In this role, you will be responsible for the end-to-end data lifecycle, converting raw data into actionable insights through high-performance ETL pipelines.
You will leverage Apache Spark and PySpark to process massive datasets while ensuring that the infrastructure remains cost-effective and secure.
This position requires a deep understanding of the Lakehouse paradigm and the ability to integrate diverse cloud services, including AWS and Azure Data Factory, into a unified data strategy.
Technical Responsibilities
- End-to-End Pipeline Engineering : Design and deploy sophisticated ETL/ELT workflows using PySpark, Spark SQL, and Python to ingest and transform structured and unstructured data into optimized formats.
- Lakehouse Architecture Implementation : Architect and manage the Medallion Architecture (Bronze, Silver, Gold layers) within Delta Lake to ensure data reliability, ACID transactions, and schema enforcement for downstream analytics (a brief PySpark sketch of a Bronze-to-Silver step follows this list).
- Spark Performance Tuning : Conduct deep-dive performance optimization of Spark jobs by tuning configurations, managing shuffle partitions, implementing broadcast joins, and leveraging caching strategies to reduce latency.
- Multi-Cloud Integration : Orchestrate data movement between AWS S3 and other environments using tools like Azure Data Factory (ADF) and AWS Glue to support a hybrid or multi-cloud data ecosystem.
- Data Transformation & Logic : Implement complex business logic, aggregations, and data cleansing routines within Databricks notebooks and jobs to maintain high data quality.
- Cluster Management & Cost Control : Monitor and troubleshoot Databricks clusters and jobs, implementing auto-scaling policies and optimizing resource allocation to balance performance with cloud spend.
- Security & Governance Implementation : Enforce fine-grained access control and data governance policies using Unity Catalog, ensuring compliance with organizational security standards and data lineage requirements.
- CI/CD & DevOps Integration : Collaborate with DevOps teams to integrate data workflows into automated CI/CD pipelines using Terraform, GitHub Actions, or Azure DevOps for seamless deployment.
- Technical Documentation : Maintain rigorous documentation of data models, pipeline architectures, and codebases to facilitate knowledge sharing and system maintainability.
- Stakeholder Collaboration : Partner with Data Architects and Business Analysts to translate high-level business requirements into robust, production-ready technical designs.
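To make the Lakehouse responsibilities above more concrete, the sketch below is a minimal, hypothetical Bronze-to-Silver PySpark job: the table paths, column names (event_id, customer_id, event_ts), and the customer dimension are illustrative assumptions, not details of our platform.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Read raw events and a small reference dimension (hypothetical Delta paths).
bronze_events = spark.read.format("delta").load("s3://example-lake/bronze/events")
dim_customers = spark.read.format("delta").load("s3://example-lake/silver/customers")

# Cleanse and de-duplicate, then enrich via a broadcast join so the small
# dimension is shipped to executors instead of triggering a full shuffle.
silver_events = (
    bronze_events
    .filter(F.col("event_ts").isNotNull())
    .dropDuplicates(["event_id"])
    .join(broadcast(dim_customers), on="customer_id", how="left")
)

# Persist to the Silver layer as a Delta table; Delta enforces the existing
# schema on write and provides ACID guarantees for downstream consumers.
(
    silver_events.write
    .format("delta")
    .mode("append")
    .save("s3://example-lake/silver/events")
)
```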
Required Technical Skills
- Core Frameworks : Expert-level proficiency in Apache Spark and PySpark is mandatory for this role.
- Storage Technologies : Hands-on experience with Delta Lake, including features like Time Travel, Z-Ordering, and Delta Live Tables (DLT); Time Travel and Z-Ordering are illustrated in the short sketch after this list.
- Cloud Infrastructure : Deep understanding of the AWS ecosystem (S3, IAM, EC2) and experience with Azure Data Factory for orchestration.
- Programming Languages : Advanced skills in Python and SQL; familiarity with Scala is highly desirable.
- Database Management : Strong knowledge of data warehousing concepts, dimensional modeling, and Star/Snowflake schemas.
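As a quick illustration of the Delta Lake features listed above, the snippet below assumes a table named silver.events with a customer_id column (both hypothetical names); it shows a Time Travel read and a Z-Ordering optimization.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time Travel: read the table as it existed at an earlier version, e.g. to
# audit a change or reproduce a previous report.
events_v42 = spark.sql("SELECT * FROM silver.events VERSION AS OF 42")

# Z-Ordering: rewrite files so rows are co-located by a frequently filtered
# column, improving data skipping on subsequent reads.
spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")
```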
Preferred Skills
- Data Governance : Experience implementing Unity Catalog for centralized discovery and access control across Databricks workspaces.
- Workflow Orchestration : Advanced experience with Airflow or Databricks Workflows for complex task dependency management.
- Stream Processing : Knowledge of Spark Structured Streaming for real-time data ingestion and processing (see the streaming sketch after this list).
- Infrastructure as Code : Proficiency in Terraform or CloudFormation to automate the provisioning of Databricks workspaces and clusters.
- Observability : Experience with Databricks monitoring tools and integrating with Prometheus or Grafana for pipeline health dashboards.
- Certifications : Databricks Certified Data Engineer Associate/Professional or AWS Certified Data Analytics.
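For the streaming item above, the sketch below assumes Databricks Auto Loader and hypothetical S3 paths; it incrementally ingests newly arrived JSON files into a Bronze Delta table, using a checkpoint to track progress across restarts.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incrementally discover new files under a raw landing prefix (hypothetical path).
raw_stream = (
    spark.readStream
    .format("cloudFiles")                     # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-lake/_schemas/raw_events")
    .load("s3://example-lake/raw/events")
)

# Append new records to the Bronze Delta table; the checkpoint records which
# files have already been processed so restarts do not re-ingest them.
(
    raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-lake/_checkpoints/bronze_events")
    .trigger(availableNow=True)
    .start("s3://example-lake/bronze/events")
)
```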
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1592529