Skan.ai - Data Engineer - Azure Databricks/PySpark

Skan.ai
Anywhere in India/Multiple Locations
3 - 5 Years

Posted on: 14/07/2025

Job Description

Job Summary :


We are seeking a skilled Data Engineer with 3 to 5 years of experience building scalable data pipelines and solutions, and strong hands-on expertise in Databricks.


The ideal candidate should be proficient with large-scale data processing frameworks and have a solid understanding of Delta Lake, PySpark, and cloud-based data platforms.


Key Responsibilities :


- Design, build, and maintain robust ETL/ELT pipelines using Databricks (PySpark/SQL); a minimal sketch follows this list.


- Develop and optimize data workflows and pipelines on Delta Lake and Databricks Lakehouse architecture.


- Integrate data from multiple sources, ensuring data quality, reliability, and performance.


- Collaborate with data scientists, analysts, and business stakeholders to translate requirements into scalable data solutions.


- Monitor and troubleshoot production data pipelines; ensure performance and cost optimization.


- Work with DevOps teams for CI/CD integration and automation of Databricks jobs and notebooks.


- Maintain metadata, documentation, and versioning for data pipelines and assets.
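
For illustration, here is a minimal PySpark sketch of the kind of batch pipeline described above. The paths, column names, and deduplication key are hypothetical placeholders, not part of the role's actual stack.

from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders-etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: read raw CSV files landed in cloud storage (placeholder path).
raw = spark.read.option("header", "true").csv("/mnt/raw/orders/")

# Transform: basic typing, deduplication, and null filtering as examples
# of the data-quality work the responsibilities mention.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["order_id"])            # hypothetical business key
       .filter(F.col("order_id").isNotNull())
)

# Load: write a Delta Lake table; on Databricks the delta format is built in.
clean.write.format("delta").mode("overwrite").save("/mnt/curated/orders/")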


Required Skills :


- 3-5 years of experience in data engineering or big data development.


- Strong hands-on experience with Databricks (Notebooks, Jobs, Workflows).


- Proficiency in PySpark, Spark SQL, and Delta Lake.


- Experience working with Azure or AWS (preferably Azure Data Lake, Blob Storage, Synapse, etc.).


- Strong SQL skills for data manipulation and analysis (see the MERGE sketch after this list).


- Familiarity with Git, CI/CD pipelines, and job orchestration tools (e.g., Airflow, Databricks Workflows).


- Understanding of data modeling, data warehousing, and data governance best practices.
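
As a concrete (hypothetical) illustration of the SQL and Delta Lake skills above, a Delta MERGE performs an upsert from a staging table into a curated table. Table names and the join key are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-merge").getOrCreate()

# Upsert staged rows into the curated Delta table on the business key.
# UPDATE SET * / INSERT * copy all columns by matching name.
spark.sql("""
    MERGE INTO curated.orders AS t
    USING staging.orders AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")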


Preferred Qualifications :


- Databricks certification (e.g., Databricks Certified Data Engineer Associate).


- Experience with Power BI, Snowflake, or Synapse Analytics is a plus.


- Exposure to streaming data pipelines (e.g., using Kafka, Structured Streaming); see the streaming sketch after this list.


- Understanding of cost optimization and performance tuning in Databricks.
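
A minimal Structured Streaming sketch, assuming Kafka as the source and Delta as the sink. The broker address, topic, and paths are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Read a Kafka topic as an unbounded streaming DataFrame.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

# Write to Delta with a checkpoint so the stream can recover after failures.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events/")
    .start("/mnt/curated/events/")
)
query.awaitTermination()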

