
Job Description

Responsibilities:


- Set up workflows and orchestration processes to streamline data pipelines and ensure efficient data movement within the Azure ecosystem.


- Create and configure compute resources within Databricks, including All-Purpose Clusters, SQL Warehouses, and Job Clusters, to support data processing and analysis.


- Set up and manage Azure Data Lake Storage (ADLS) Gen2 accounts and establish seamless integration with the Databricks Workspace for data ingestion and processing.


- Create and manage Service Principals and Key Vaults to securely authenticate and authorize access to Azure resources (see the sketch after this list).


- Utilize ETL (Extract, Transform, Load) techniques to design and implement data warehousing solutions and ensure compliance with data governance policies.


- Develop highly automated ETL scripts for data processing.


- Scale infrastructure resources based on workload requirements, optimizing performance and cost-efficiency.


- Profile new data sources in different formats, including CSV, JSON, etc.


- Apply problem-solving skills to address complex business and technical challenges, such as data quality issues, performance bottlenecks, and system failures.


- Demonstrate excellent soft skills and the ability to effectively communicate and collaborate with clients, stakeholders, and cross-functional teams.


- Implement Continuous Integration/Continuous Deployment (CI/CD) practices to automate the deployment and testing of data pipelines and infrastructure changes.


- Deliver tangible value rapidly, collaborating with diverse teams of varying backgrounds and disciplines.


- Codify best practices for future reuse in the form of accessible, reusable patterns, templates, and code bases.


- Manage timely, appropriate communication and relationships with clients, partners, and other stakeholders.


- Create and manage periodic reports on project execution status and other trackers in standard, accepted formats.
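
As a concrete illustration of the Key Vault and ADLS Gen2 integration described above, here is a minimal PySpark sketch of the service-principal OAuth pattern Databricks documents for ADLS Gen2 access. The secret scope ("kv-scope"), secret key, storage account ("mydatalake"), container ("raw"), and the application/tenant ID placeholders are all hypothetical; dbutils and spark are globals provided by the Databricks notebook runtime.

    # Pull the service principal's secret from a Key Vault-backed secret scope
    # (scope and key names are hypothetical).
    client_secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")

    account = "mydatalake"  # hypothetical ADLS Gen2 storage account

    # Configure Spark to authenticate to ADLS Gen2 with the service principal.
    spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(
        f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    )
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<application-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", client_secret)
    spark.conf.set(
        f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    )

    # Read raw JSON landed in the lake into a DataFrame for downstream processing.
    df = spark.read.json(f"abfss://raw@{account}.dfs.core.windows.net/events/")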


Requirements:


- Experience in the Data Engineering domain: 2+ years.


- Skills: SQL, Python, PySpark, Spark, Distributed Systems.


- Azure Databricks, Azure Data Factory, ADLS Gen2, Blob Storage.


- Key Vaults, Azure DevOps.


- ETL, Building Data Pipelines, Data Warehousing, Data Modelling, and Governance.


- Agile practices, SDLC, and multi-year experience with the Azure Databricks ecosystem and PySpark.


- Ability to write clean, concise, and organized PySpark code (see the sketch after this list).


- Ability to break a project down into executable steps, prepare a data flow diagram (DFD), and execute it.


- Propose innovative data engineering solutions to achieve business objectives; quick on their feet, technically strong, and able to communicate complex logic clearly.


- Good knowledge of ADF and Docker/containerization.
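
For a sense of what "clean, concise, and organized PySpark" means in practice, a small self-contained sketch: a single-purpose transformation written as a testable function. The orders schema (status, created_at, amount) is invented for illustration.

    from pyspark.sql import DataFrame, functions as F

    def daily_revenue(orders: DataFrame) -> DataFrame:
        """Aggregate completed orders into revenue per day (hypothetical schema)."""
        return (
            orders
            .filter(F.col("status") == "COMPLETED")          # keep only completed orders
            .withColumn("order_date", F.to_date("created_at"))  # truncate timestamp to date
            .groupBy("order_date")
            .agg(F.sum("amount").alias("revenue"))
            .orderBy("order_date")
        )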


Good to Have:


- Event Hubs, Logic Apps.


- Power BI.


- Competitive coding; knows most PySpark syntax by heart.

