Posted on: 01/12/2025
JOB DESCRIPTION:
Responsibilities:
- Collaborate with Data Engineers and Data Scientists to integrate and process structured and unstructured datasets and turn them into actionable insights.
- Optimize PySpark jobs and data pipelines for performance, scalability, and reliability (an illustrative sketch follows this list).
- Conduct regular financial risk assessments to identify potential vulnerabilities in data processing workflows.
- Ensure data quality and integrity throughout all stages of data processing.
- Develop and implement strategies to mitigate financial risks associated with data transformation and aggregation.
- Troubleshoot and debug issues related to data pipelines and processing.
- Ensure compliance with regulatory requirements and industry standards in all data processing activities.
- Implement best practices for data security, compliance, and privacy within the Azure environment.
- Document technical specifications, data flows, and solution architecture.
- Design and build reusable components, frameworks, and libraries at scale to support analytics products.
- Design and implement product features in collaboration with business and technology stakeholders.
- Design and develop scalable data pipelines using Azure Databricks and PySpark.
- Transform raw data into actionable insights through advanced data engineering techniques.
- Build, deploy, and maintain machine learning models using MLlib, TensorFlow, and MLflow (see the model-tracking sketch after this list).
- Optimize data integration workflows from Azure Blob Storage, Data Lake, and SQL/NoSQL sources.
- Execute large-scale data processing using Spark Pools, fine-tuning configurations for performance and cost-efficiency.
- Collaborate with data scientists, analysts, and business stakeholders to deliver robust data solutions.
- Maintain and enhance Databricks notebooks and Delta Lake architectures.
- Architect, build, and maintain scalable and reliable data pipelines from diverse data sources.
- Design effective data storage, retrieval mechanisms, and data models to support analytics and business needs.
- Implement data validation, transformation, and quality monitoring processes.
- Collaborate with cross-functional teams to deliver impactful, data-driven solutions.
- Proactively identify bottlenecks and optimize existing workflows and processes.
- Provide guidance and mentorship to junior engineers on the team.
- Anticipate, identify, and solve data management issues to improve data quality.
- Clean, prepare, and optimize data at scale for ingestion and consumption.
- Drive the implementation of new data management projects and the restructuring of the current data architecture.
- Implement complex automated workflows and routines using workflow scheduling tools (see the Airflow sketch after this list).
- Build continuous integration, test-driven development, and production deployment frameworks.
- Drive collaborative reviews of designs, code, test plans, and dataset implementations by other data engineers to maintain data engineering standards.
- Analyze and profile data to design scalable solutions.
- Troubleshoot complex data issues and perform root cause analysis to proactively resolve product and operational issues.
- Mentor and develop other data engineers in adopting best practices.
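
As an illustration of the PySpark pipeline and optimization work described above, here is a minimal sketch; the storage path, column names, and table name are placeholders rather than details from this posting:

# Illustrative sketch only: a small PySpark ingest-validate-write pipeline.
# The lake path, columns, and table name below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Ingest raw data from a (placeholder) Azure Data Lake path.
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/")

# Basic validation and transformation: drop malformed rows, derive a column.
clean = (
    raw.dropna(subset=["order_id", "amount"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Repartition on the write key to control file sizes and mitigate skew.
clean = clean.repartition("order_date")

# Persist as a partitioned Delta table for downstream analytics.
(clean.write.format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("analytics.orders_clean"))

Repartitioning on the write key and partitioning the Delta table are two of the simpler levers for controlling file counts and downstream query performance.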
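A minimal model-tracking sketch with MLflow follows; it uses scikit-learn in place of MLlib or TensorFlow purely for brevity, and the run name, parameters, and data are hypothetical:

# Illustrative sketch only: logging a model run with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real feature set.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf_baseline"):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # Track parameters, metrics, and the model artifact for later deployment.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")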
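And a small workflow-scheduling sketch using Apache Airflow (assuming Airflow 2.4+ for the schedule argument); the DAG id, schedule, and task callables are hypothetical:

# Illustrative sketch only: a two-task daily DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw files from source")

def transform():
    print("clean and aggregate with Spark")

with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2  # run transform only after extract succeeds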
Qualifications:
- 3+ years of experience developing scalable Big Data applications or solutions on distributed platforms.
- Able to partner with others in solving complex problems by taking a broad perspective to identify innovative solutions.
- Strong skills in building positive relationships across Product and Engineering.
- Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders.
- Able to quickly pick up new programming languages, technologies, and frameworks.
- Experience working in Agile and Scrum development processes.
- Experience working in a fast-paced, results-oriented environment.
- Experience with Amazon Web Services (AWS) or other cloud platforms and tools.
- Experience working with data warehousing tools, including DynamoDB, SQL, Amazon Redshift, and Snowflake.
- Experience architecting data products on streaming, serverless, and microservices architectures and platforms.
- Experience working with data platforms, including EMR, Databricks, etc.
- Experience working with distributed technology tools, including Spark, Presto, Scala, Python, Databricks, and Airflow.
- Working knowledge of data warehousing, data modeling, governance, and data architecture.
- Working knowledge of reporting and analytics tools such as Tableau and Amazon QuickSight.
- Demonstrated experience in learning new technologies and skills.
- Bachelor's degree in Computer Science, Information Systems, Business, or another relevant subject area.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1583127