
Senior Data Engineer - Big Data/PySpark

Virtusa
Anywhere in India/Multiple Locations
5 - 8 Years

Posted on: 27/11/2025

Job Description

We are looking for a highly skilled Senior Data Engineer with strong experience in building scalable, resilient, and high-performance data pipelines across batch and streaming architectures. The ideal candidate will have deep expertise in cloud platforms (preferably Azure), big-data frameworks, CDC pipelines, and modern Lakehouse technologies such as Delta Lake and Databricks. You will collaborate with cross-functional teams, design end-to-end ingestion frameworks, optimize performance, and ensure data quality across all layers of the platform.


Key Responsibilities:

Data Ingestion & Processing:

- Design and implement streaming data ingestion pipelines using Apache Kafka, Confluent Cloud, and Delta Live Tables.
- Build and maintain batch ingestion workflows leveraging Databricks, PySpark, and cloud-native orchestration tools.
- Use Confluent Kafka Connectors for CDC pipelines, including configuration, monitoring, troubleshooting, and optimization.
- Implement Change Data Capture (CDC) solutions using Debezium, SQL Server CDC, Oracle GoldenGate, or equivalent tools.
- Develop efficient PySpark jobs for large-scale data transformation, enrichment, validation, and upsert workloads (a sketch of a typical upsert follows this list).
- Tune performance, including Spark job optimization, partitioning strategies, shuffle minimization, and resource utilization.
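As a rough illustration of the upsert workloads described above, the sketch below applies a batch of CDC-style change records to a Delta table with a MERGE. It is a minimal example only: the paths, table layout, and column names (customer_id, op, event_ts) are assumptions for illustration, not details from this role.

```python
# Minimal upsert sketch: apply CDC-style change records to a Delta table.
# All paths, table names, and column names below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical CDC batch already landed in the Bronze layer.
changes = spark.read.format("delta").load("/mnt/bronze/customers_cdc")

# Keep only the latest change per key so the MERGE sees at most one row per key.
latest = (
    changes
    .withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("event_ts").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

target = DeltaTable.forPath(spark, "/mnt/silver/customers")

(
    target.alias("t")
    .merge(latest.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'delete'")
    .whenMatchedUpdateAll(condition="s.op <> 'delete'")
    .whenNotMatchedInsertAll(condition="s.op <> 'delete'")
    .execute()
)
```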


Data Quality, Monitoring & Operations:

- Implement robust data validation, deduplication, and reconciliation processes to ensure high-quality, reliable data loads (see the reconciliation sketch after this list).
- Analyze and document data load performance metrics, including elapsed time, throughput, and scalability benchmarks.
- Monitor pipelines for partial loads, duplicate records, and schema evolution, and handle operational exceptions.
- Build alerting, logging, and operational dashboards using tools such as Datadog, Azure Monitor, or CloudWatch.
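As one concrete (and deliberately simplified) example of the validation and reconciliation work above, the sketch below compares source and target row counts after a load and flags duplicate business keys. The Bronze/Silver paths and the order_id key column are hypothetical, chosen only to make the example runnable.

```python
# Minimal post-load quality check: row-count reconciliation plus duplicate
# detection on a business key. Paths and the key column are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source_count = spark.read.format("delta").load("/mnt/bronze/orders").count()

target = spark.read.format("delta").load("/mnt/silver/orders")
target_count = target.count()

# Count business keys that appear more than once in the target table.
duplicate_keys = (
    target.groupBy("order_id")
    .count()
    .filter(F.col("count") > 1)
    .count()
)

if target_count != source_count or duplicate_keys > 0:
    # In a real pipeline this would raise an alert (Datadog, Azure Monitor, ...)
    # rather than only failing the job.
    raise ValueError(
        f"Reconciliation failed: source={source_count}, "
        f"target={target_count}, duplicate keys={duplicate_keys}"
    )
```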


Architecture & Modelling:

- Design scalable Lakehouse architectures using Delta Lake, Unity Catalog, and medallion patterns (Bronze/Silver/Gold).
- Develop data models, partitioning strategies, and table optimization techniques such as Z-Ordering, VACUUM, and OPTIMIZE (a maintenance sketch follows this list).
- Implement best practices for metadata management, governance, and lineage.
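For illustration, a routine Delta table maintenance job covering the OPTIMIZE, Z-Ordering, and VACUUM techniques above might look like the sketch below. The silver.customers table, the Z-Order column, and the retention window are assumptions; real retention settings depend on your time-travel and audit requirements.

```python
# Minimal Delta table maintenance sketch: compact small files, co-locate rows
# by a commonly filtered column, and clean up unreferenced data files.
# Table name, Z-Order column, and retention period are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and apply Z-Ordering for better data skipping on reads.
spark.sql("OPTIMIZE silver.customers ZORDER BY (customer_id)")

# Remove files no longer referenced by the table, keeping 7 days of history
# so time travel and long-running readers are not broken.
spark.sql("VACUUM silver.customers RETAIN 168 HOURS")
```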


Cloud Platform Expertise:

- Hands-on experience with Azure (ADLS Gen2, ADF, Event Hub, Azure Functions, Synapse, Key Vault, Azure DevOps).
- Exposure to AWS services such as S3, Glue, Lambda, Kinesis, and IAM is a plus.
- Build and optimize compute clusters using Databricks (cluster configs, job clusters, autoscaling, DBX CLI).


Programming & Tools:

- Strong programming experience with Python, Scala, or Java, focusing on data transformation and distributed processing.
- Advanced SQL skills for analytics, transformations, and performance optimization.
- Familiarity with Git, CI/CD pipelines, code reviews, and automated deployments.
- Experience with infrastructure-as-code tools like Terraform, ARM templates, or CloudFormation is advantageous.


Collaboration & Leadership:

- Work with product owners, data modellers, and business stakeholders to interpret requirements and translate them into scalable technical solutions.
- Mentor junior engineers and establish engineering best practices across the team.
- Communicate complex technical concepts clearly to both technical and non-technical stakeholders.


Required Qualifications:

- 5+ years of experience designing and building data pipelines using Apache Spark, Databricks, or equivalent big-data frameworks.
- Strong hands-on knowledge of Kafka, Confluent Cloud, Event Hub, and messaging/streaming ecosystems.
- Expertise in CDC pipelines, relational databases (SQL Server, Oracle, PostgreSQL), and event-driven processing.
- Experience with Azure or AWS cloud services for data ingestion, compute, and orchestration.
- Solid understanding of data warehousing, Lakehouse architectures, Delta Lake, and data modelling.
- Proficiency with DevOps practices, Git-based workflows, and CI/CD frameworks.
- Strong analytical, problem-solving, and communication skills.


Preferred Qualifications:

- Experience with Delta Live Tables (DLT), Databricks Workflows, and Unity Catalog.
- Knowledge of RabbitMQ, Azure Event Hub, or other messaging platforms.
- Exposure to ML pipelines or feature engineering frameworks (nice to have).
- Certifications in Azure, Databricks, or Confluent.

