
PySpark Developer

Corpxcel Consulting
Multiple Locations
8 - 12 Years

Posted on: 26/08/2025

Job Description

Location : Chennai/Bangalore/Hyderabad/Coimbatore/Pune

WFO : 3 days mandatory from one of the above-mentioned locations.

Role Summary :

We are seeking a highly skilled PySpark Developer with hands-on experience in Databricks to join the company's IT Systems Development unit in an offshore capacity. This role focuses on designing, building, and optimizing large-scale data pipelines and processing solutions on the Databricks Unified Analytics Platform. The ideal candidate will have expertise in big data frameworks, distributed computing, and cloud platforms, with a deep understanding of Databricks architecture. This is an excellent opportunity to work with cutting-edge technologies in a dynamic, fast-paced environment.

Role Responsibilities :

Data Engineering and Processing :

- Develop and manage data pipelines using PySpark on Databricks (see the sketch after this list).

- Implement ETL/ELT processes to process structured and unstructured data at scale.

- Optimize data pipelines for performance, scalability, and cost-efficiency in Databricks.
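
As a minimal sketch of the read-transform-write pattern such pipelines typically follow (the paths, table layout, and column names below are hypothetical, not from this posting):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw JSON events (source path is an assumption)
raw = spark.read.json("/mnt/raw/orders/")

# Transform: fix types, derive a partition column, drop bad rows
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
)

# Load: write a partitioned Delta table (target path is an assumption)
(orders.write.format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .save("/mnt/curated/orders/"))

Partitioning by date and writing to Delta are common performance and cost choices on Databricks, though the right layout always depends on the downstream query patterns.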

Databricks Platform Expertise :

- Experience performing design, development, and deployment using Azure services (Data Factory, Databricks, PySpark, SQL).

- Develop and maintain scalable data pipelines and build new data source integrations to support increasing data volume and complexity.

- Leverage the Databricks Lakehouse architecture for advanced analytics and machine learning workflows.

- Manage Delta Lake for ACID transactions and data versioning (a sketch follows this list).

- Develop notebooks and workflows for end-to-end data solutions.
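
A minimal sketch of the Delta Lake points above, reusing the hypothetical curated table from the earlier sketch; the MERGE condition and paths are illustrative assumptions:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch of changed rows (source path is hypothetical)
updates = spark.read.json("/mnt/raw/orders_increment/")

# ACID upsert: MERGE applies updates and inserts in one atomic commit
target = DeltaTable.forPath(spark, "/mnt/curated/orders/")
(target.alias("t")
       .merge(updates.alias("u"), "t.order_id = u.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# Data versioning: time-travel back to an earlier version of the table
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/curated/orders/")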

Cloud Platforms and Deployment :

- Deploy and manage Databricks on Azure (e.g., Azure Databricks).

- Use Databricks Jobs, Clusters, and Workflows to orchestrate data pipelines (see the sketch after this list).

- Optimize resource utilization and troubleshoot performance issues on the Databricks platform.
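
One way to create such an orchestrated job is the Databricks Jobs REST API (2.1); the host, token, notebook path, cluster spec, and schedule below are placeholders for illustration, not a recommended configuration:

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

job_spec = {
    "name": "orders-etl-nightly",
    "tasks": [{
        "task_key": "run_etl",
        "notebook_task": {"notebook_path": "/Repos/data/orders_etl"},
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",  # Azure VM type
            "num_workers": 2,
        },
    }],
    # Nightly run at 02:00 UTC
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=job_spec)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])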

CI/CD and Testing :

- Build and maintain CI/CD pipelines for Databricks workflows using tools like Azure DevOps, GitHub Actions, or Jenkins.

- Write unit and integration tests for PySpark code using frameworks like Pytest or unittest (a sample test follows).
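
A minimal Pytest sketch under these assumptions: the function under test (add_order_date) is hypothetical, and a local SparkSession stands in for a cluster:

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def add_order_date(df):
    # Hypothetical transformation under test
    return df.withColumn("order_date", F.to_date(F.to_timestamp("order_ts")))

@pytest.fixture(scope="session")
def spark():
    # local[2] keeps the test self-contained; no cluster is needed
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_add_order_date(spark):
    df = spark.createDataFrame([("o1", "2025-08-26 10:00:00")],
                               ["order_id", "order_ts"])
    out = add_order_date(df)
    assert out.first()["order_date"].isoformat() == "2025-08-26"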

Collaboration and Documentation :

- Work closely with data scientists, data analysts, and IT teams to deliver robust data solutions.

- Document Databricks workflows, configurations, and best practices for internal use.

Technical Qualifications :

Experience :

- 4+ years of experience in data engineering or distributed systems development.

- Strong programming skills in Python and PySpark.

- Hands-on experience with Databricks and its ecosystem, including Delta Lake and Databricks SQL.

- Knowledge of big data frameworks like Hadoop, Spark, and Kafka.

Databricks Expertise :

- Proficiency in setting up and managing Databricks Workspaces, Clusters, and Jobs.

- Familiarity with Databricks MLflow for machine learning workflows is a plus.

Cloud Platforms :

- Expertise in deploying Databricks solutions on Azure (e.g., Data Lake, Synapse).

- Knowledge of Kubernetes for managing containerized workloads is advantageous.

Database Knowledge :

- Experience with both SQL (e.g., PostgreSQL, SQL Server) and NoSQL databases (e.g., MongoDB, Cosmos DB).

General Qualifications :

- Strong analytical and problem-solving skills.

- Ability to manage multiple tasks in a high-intensity, deadline-driven environment.

- Excellent communication and organizational skills.

- Experience in regulated industries like insurance is a plus.

Education Requirements :

- A Bachelor's degree in Computer Science, Data Engineering, or a related field is preferred.

- Relevant certifications in Databricks, PySpark, or cloud platforms are highly desirable.
