Capco - Senior QA/Data Tester - ETL/Data Warehouse Testing

Capco Technologies Pvt Ltd
Multiple Locations
5 - 9 Years

Posted on: 22/12/2025

Job Description

Location : Pune / Bangalore

Experience : 5 - 9 Years

Role Summary :

The Senior QA/Data Tester is responsible for ensuring the technical integrity, accuracy, and performance of enterprise-scale data pipelines. This role goes beyond traditional functional testing, requiring a deep understanding of Big Data architectures to validate complex ETL/ELT processes.


You will be the gatekeeper of data quality, working closely with Data Engineers to verify Spark applications, Hive schemas, and large-scale data movements across Hadoop and object storage environments.

Responsibilities :

- Design and execute comprehensive data validation strategies for complex ETL/ELT pipelines, ensuring 100% data accuracy from source to target.

- Develop and maintain automated testing frameworks using Python and Shell scripting to validate large datasets in Hadoop and Object Storage environments.

- Perform advanced SQL validation (Oracle/Hive) to verify data transformations, aggregations, and business logic across distributed databases.

- Validate Spark/Scala applications by performing unit, integration, and system testing on large-scale distributed workloads.

- Conduct end-to-end testing of data flows orchestrated via Apache NiFi, ensuring proper data routing, transformation, and error handling.

- Analyze complex data structures and schemas to identify data anomalies, missing records, and performance bottlenecks within the pipeline.

- Implement data reconciliation scripts to verify parity between legacy Data Warehouses and modern Big Data lakes (see the sketch after this list).

- Collaborate with Agile delivery teams to define "Definition of Done" for data features and ensure quality is integrated into the CI/CD pipeline.

- Document detailed test plans, test cases, and defect reports, providing clear root-cause analysis for data-related failures.

- Manage test data environments and generate synthetic datasets to simulate high-volume production scenarios.

- Coordinate with cross-functional stakeholders to manage environment dependencies and ensure seamless deployment across SIT and UAT environments.
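To make the reconciliation responsibility concrete, below is a minimal PySpark sketch of a source-to-target parity check. The paths, table names, and the "balance" column are hypothetical placeholders for illustration, not details of any actual Capco pipeline.

```python
# Minimal source-to-target reconciliation sketch (PySpark).
# All paths, table names, and columns are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("recon-check").getOrCreate()

# Hypothetical sources: a legacy warehouse extract and its data-lake copy.
source = spark.read.parquet("/data/legacy/customers")   # assumed path
target = spark.table("lake.customers")                  # assumed Hive table

# 1. Row-count parity.
src_count, tgt_count = source.count(), target.count()
assert src_count == tgt_count, f"Row count mismatch: {src_count} vs {tgt_count}"

# 2. Record-level parity: rows present in source but missing from target.
missing = source.exceptAll(target.select(*source.columns))
assert missing.count() == 0, "Records missing from target"

# 3. Aggregate checksum on a numeric column to catch silent value drift.
src_sum = source.agg(F.sum("balance")).first()[0]
tgt_sum = target.agg(F.sum("balance")).first()[0]
assert src_sum == tgt_sum, f"Balance checksum mismatch: {src_sum} vs {tgt_sum}"
```

In practice a script like this would be parameterized per table and scheduled alongside the pipeline runs it validates.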

Technical Requirements :

- 5 - 9 years of dedicated QA experience with a specialized focus on Data Warehousing, ETL, or Big Data Engineering projects.

- Expert-level proficiency in SQL, with the ability to write complex joins, subqueries, and analytical functions in Oracle or Hive.

- Hands-on experience testing Spark applications (Spark Core, Spark SQL) running on Hadoop or Cloud Object Storage (see the unit-test sketch after this list).

- Proficiency in Python and Unix/Shell scripting for automating repetitive testing tasks and building custom validation tools.

- Strong understanding of Big Data components such as HDFS, YARN, and Hive Metastore.

- Solid experience working in an Agile/Scrum environment, utilizing tools like JIRA and ALM for defect tracking and sprint management.

- Proven ability to perform white-box testing by reviewing code logic in Scala or Python to identify potential failure points.
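As one way of picturing the Spark testing and white-box requirements above, here is a minimal pytest sketch that exercises a small Spark SQL aggregation against in-memory data. The transformation under test and all sample figures are invented for illustration.

```python
# Minimal pytest sketch for white-box testing of a Spark SQL aggregation.
# The transformation under test and all sample data are hypothetical.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("qa-unit").getOrCreate()

def daily_totals(spark, df):
    # Transformation under test: total traded amount per date.
    df.createOrReplaceTempView("trades")
    return spark.sql(
        "SELECT trade_date, SUM(amount) AS total FROM trades GROUP BY trade_date"
    )

def test_daily_totals_aggregates_per_date(spark):
    df = spark.createDataFrame(
        [("2024-01-01", 100.0), ("2024-01-01", 50.0), ("2024-01-02", 10.0)],
        ["trade_date", "amount"],
    )
    result = {r["trade_date"]: r["total"] for r in daily_totals(spark, df).collect()}
    assert result == {"2024-01-01": 150.0, "2024-01-02": 10.0}
```

Running on local[2] keeps such tests fast enough to sit inside a CI/CD pipeline rather than requiring a full cluster.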

Preferred Skills :

- Hands-on experience with Apache NiFi for validating real-time data ingestion and flow management.

- Familiarity with Data Quality (DQ) tools like Great Expectations, Deequ, or Informatica DVO (see the sketch after this list).

- Experience with CI/CD tools (Jenkins/GitLab) and integrating automated data tests into deployment pipelines.

- Knowledge of NoSQL databases (HBase, Cassandra, or MongoDB) and their respective testing methodologies.

- Exposure to cloud data platforms (AWS Glue, EMR, or Snowflake) is a significant plus.

- Understanding of Data Privacy and Security testing, including data masking and PII validation.

- Strong analytical mindset with a proactive approach to identifying edge cases in massive datasets.
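For the Great Expectations familiarity mentioned above, the following is a minimal sketch using the library's legacy pandas-style API (newer releases restructure this around expectation suites and checkpoints). The CSV path and column names are hypothetical.

```python
# Minimal data-quality sketch using Great Expectations' legacy pandas API
# (older releases; newer versions expose a different, suite-based API).
# The CSV path and column names are hypothetical.
import great_expectations as ge

df = ge.read_csv("/data/extracts/customers.csv")  # assumed extract

checks = [
    df.expect_column_values_to_not_be_null("customer_id"),
    df.expect_column_values_to_be_unique("customer_id"),
    df.expect_column_values_to_be_between("balance", min_value=0),
]

failed = [c for c in checks if not c.success]
assert not failed, f"{len(failed)} data-quality expectations failed"
```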

