Airflow Data Engineer - Big Data

Virtusa
Multiple Locations
7 - 12 Years

Posted on: 05/11/2025

Job Description

We are looking for an experienced Senior Data Engineer to design, develop, and optimize scalable data pipelines and data processing frameworks that support rapidly growing data volumes and complexity. The ideal candidate will have deep expertise in Big Data technologies, Azure cloud services, and pipeline orchestration and streaming tools such as Airflow and Kafka. This individual will play a key role in ensuring the accuracy, reliability, and accessibility of enterprise data, enabling advanced analytics and data-driven decision-making across the organization.

Key Responsibilities:

Data Pipeline Development & Maintenance:

- Design, build, and maintain scalable, efficient, and high-performance ETL/ELT data pipelines to support batch and streaming data workflows (see the sketch after this list).

- Develop solutions that handle large-scale, complex data processing using Apache Spark, Python, and SQL within Hadoop and Azure environments.

- Implement data ingestion frameworks from multiple internal and external sources (APIs, databases, data lakes, streaming platforms).

- Continuously improve existing pipelines for performance optimization, scalability, and data quality assurance.
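
As a rough illustration of the batch side of such pipelines, the sketch below shows a minimal PySpark extract-transform-load job. The storage paths, column names, and casts are illustrative assumptions, not details taken from this posting.

```python
# Minimal PySpark batch ETL sketch. Paths and column names (orders_path,
# curated_path, order_ts, amount) are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

orders_path = "abfss://raw@yourlake.dfs.core.windows.net/orders/"       # hypothetical
curated_path = "abfss://curated@yourlake.dfs.core.windows.net/orders/"  # hypothetical

# Extract: read raw JSON landed by an upstream ingestion job.
raw = spark.read.json(orders_path)

# Transform: standardize types, drop obviously bad records, stamp a load date.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Load: write partitioned Parquet for downstream analytics.
clean.write.mode("overwrite").partitionBy("load_date").parquet(curated_path)
```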

Data Modeling & Architecture:

- Collaborate with business analysts and data architects to design and improve data models that support business intelligence (BI) and analytics tools.

- Apply advanced data modeling techniques, including Kimball star schema and dimensional modeling, to support analytical and reporting use cases (see the sketch after this list).

- Design and implement data integration strategies between disparate systems, ensuring consistency and traceability across the data ecosystem.
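
For the dimensional-modeling point above, the sketch below derives a simple customer dimension and an orders fact in PySpark. The table and column names (curated.orders, dim_customer, fact_orders) are assumptions for illustration only.

```python
# Illustrative star-schema build in PySpark: a customer dimension plus an
# order-grain fact table. All table/column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_star_schema").getOrCreate()
orders = spark.table("curated.orders")  # hypothetical curated table

# Dimension: one row per customer, with a deterministic surrogate key.
dim_customer = (
    orders.select("customer_id", "customer_name", "customer_segment")
          .dropDuplicates(["customer_id"])
          .withColumn("customer_sk", F.sha2(F.col("customer_id").cast("string"), 256))
)

# Fact: measures at order grain, keyed to the dimension via the surrogate key.
fact_orders = (
    orders.join(dim_customer.select("customer_id", "customer_sk"), "customer_id")
          .select("order_id", "customer_sk", "order_ts", "amount")
)

dim_customer.write.mode("overwrite").saveAsTable("analytics.dim_customer")
fact_orders.write.mode("overwrite").saveAsTable("analytics.fact_orders")
```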

Data Quality, Governance & Monitoring:

- Develop and implement data quality frameworks to ensure accuracy, completeness, and consistency of data across environments.

- Establish data validation and reconciliation mechanisms for critical data pipelines (see the sketch after this list).

- Implement monitoring and alerting systems for production data pipelines to ensure timely detection and resolution of data issues.

- Contribute to data governance and metadata management efforts by documenting lineage, data flows, and transformation logic.
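
A minimal example of the validation and reconciliation idea above: compare row counts between source and target and check key columns for nulls, failing the job so the orchestrator can alert. The table names and thresholds are assumptions.

```python
# Simple data quality / reconciliation checks in PySpark. Table names are
# hypothetical; the pattern is the point, not the specific checks.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_dq_checks").getOrCreate()

source_count = spark.table("curated.orders").count()   # hypothetical source
target = spark.table("analytics.fact_orders")           # hypothetical target
target_count = target.count()

null_keys = target.filter(F.col("customer_sk").isNull()).count()

# Raise so the orchestrator (e.g. Airflow) marks the run as failed and
# downstream alerting can pick it up.
if target_count != source_count:
    raise ValueError(f"Row count mismatch: source={source_count}, target={target_count}")
if null_keys > 0:
    raise ValueError(f"{null_keys} fact rows have a null customer_sk")
```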

Cloud & Platform Engineering:

- Leverage Azure cloud services including Azure Data Factory (ADF), Azure Databricks, and Azure Blob Storage / Data Lake Storage for scalable data processing.

- Configure and manage workflow orchestration tools such as Apache Airflow, including DAG creation, dependency management, and scheduling (see the sketch after this list).

- Integrate data pipelines with data streaming platforms such as Apache Kafka to support real-time data ingestion and analytics use cases.

- Utilize CI/CD pipelines for automated data pipeline deployment, testing, and monitoring, ensuring robust data operations practices.
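
The sketch below shows how Spark jobs like the ones above might be wired together as an Airflow DAG with a daily schedule, retries, and explicit task dependencies. It assumes Airflow 2.4+ and hypothetical script paths; in an Azure setup the tasks could equally be ADF or Databricks operators.

```python
# Sketch of an Airflow DAG orchestrating the extract, modeling, and quality
# steps. Script paths and schedule are illustrative assumptions.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="orders_daily_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="0 2 * * *",   # daily at 02:00 (Airflow 2.4+ "schedule" argument)
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(
        task_id="ingest_orders",
        bash_command="spark-submit /opt/jobs/orders_daily_etl.py",    # hypothetical path
    )
    build_star_schema = BashOperator(
        task_id="build_star_schema",
        bash_command="spark-submit /opt/jobs/orders_star_schema.py",  # hypothetical path
    )
    dq_checks = BashOperator(
        task_id="dq_checks",
        bash_command="spark-submit /opt/jobs/orders_dq_checks.py",    # hypothetical path
    )

    # Dependency management: each task runs only after the upstream one succeeds.
    ingest >> build_star_schema >> dq_checks
```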

Collaboration & Documentation:

- Collaborate closely with Analytics, BI, and Data Science teams to provide reliable and timely data access for business insights.

- Write unit and integration tests to ensure pipeline reliability and maintainability (see the example after this list).

- Document code, workflows, and design patterns in the engineering wiki, contributing to internal knowledge sharing and process standardization.

- Participate in agile ceremonies and provide input on sprint planning, prioritization, and technical feasibility assessments.
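
A small example of the kind of unit test meant above, using pytest and a local SparkSession to exercise a pure transformation function. The function and columns are illustrative, not taken from this posting.

```python
# Example pytest unit test for a small PySpark transformation, run locally.
from pyspark.sql import SparkSession, functions as F


def add_load_date(df):
    """Transformation under test: stamp each row with the load date."""
    return df.withColumn("load_date", F.current_date())


def test_add_load_date_adds_column():
    spark = SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()
    df = spark.createDataFrame([(1, 100.0)], ["order_id", "amount"])

    result = add_load_date(df)

    assert "load_date" in result.columns
    assert result.count() == 1
    spark.stop()
```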

Qualifications & Technical Skills:

Required Technical Expertise:

- 7+ years of experience in Data Engineering, with a focus on Big Data and Cloud-based solutions.

- 6+ years of hands-on experience with the Hadoop ecosystem, including tools like Hive, Spark, and HDFS.

- Proficiency in Python (PySpark preferred), SQL, and data transformation scripting.

- Strong understanding of data modeling, ETL/ELT design patterns, and data warehousing best practices.

- Expertise in Azure cloud services: ADF, Databricks, Azure Data Lake, and Cosmos DB.

- Proficient in Airflow (DAG creation, scheduling, dependency management).

- Working knowledge of Kafka or similar data streaming technologies for real-time data ingestion and processing.

- Familiarity with CI/CD practices, version control (Git), and deployment automation tools.

- Experience working with relational (SQL) and NoSQL databases for analytical and operational workloads.

Preferred Skills:

- Experience in containerization and orchestration (Docker, Kubernetes).

- Familiarity with data governance and cataloging tools (e.g., Azure Purview, Collibra).

- Exposure to machine learning data pipelines and feature engineering frameworks.

- Knowledge of Delta Lake / Lakehouse architectures for unified batch and streaming data management.

