Posted on: 16/12/2025
Description:
- Design scalable batch and real-time data pipelines across structured and unstructured sources
- Integrate analytics systems with annotation tools and ML validation platforms for traceability
- Develop ETL/ELT workflows using Glue, PySpark, or Airflow with data quality controls (see the first sketch after this list)
- Implement observability pipelines and alerts for throughput, quality, and latency metrics (see the metrics sketch below)
- Build data models and queries powering dashboards via Athena, QuickSight, or Redash (see the Athena sketch below)
- Contribute to cloud deployments, CI/CD pipelines, and infrastructure-as-code practices
- 3+ years of experience in data engineering or backend development in data-intensive systems
- Strong proficiency in Python and SQL
- Hands-on experience with AWS services (S3, Lambda, Glue, Kinesis, Firehose, RDS)
- Experience with distributed data processing frameworks such as Spark or Hadoop
- Working knowledge of data lake and warehouse architectures (Delta Lake, Redshift, Snowflake)
- Experience building production-grade, resilient data pipelines
- Working knowledge of messaging frameworks such as Kafka or Kinesis Firehose
- Strong understanding of relational databases and database fundamentals
- Experience designing and consuming performant APIs
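For illustration, a minimal sketch of the kind of PySpark ETL step with a data quality gate that the workflow bullet describes. The bucket, prefixes, column names, and the 5% rejection threshold are all hypothetical:

```python
# Minimal PySpark ETL sketch with a data quality gate. All paths, column
# names, and the rejection threshold below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-etl-sketch").getOrCreate()

# Extract: raw JSON events from a hypothetical S3 landing prefix.
raw = spark.read.json("s3://example-bucket/landing/events/")

# Transform: drop rows missing a key field and normalize the timestamp.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Data quality control: abort the run if more than 5% of rows were rejected,
# so bad batches never reach the curated zone.
total, kept = raw.count(), clean.count()
if total and (total - kept) / total > 0.05:
    raise ValueError(f"DQ gate failed: {total - kept} of {total} rows rejected")

# Load: partitioned Parquet into the curated zone.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```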
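A companion sketch for the observability bullet: publishing per-run throughput and latency datapoints to CloudWatch with boto3. The namespace and metric names are made up for the example; the alerting half would be CloudWatch alarms defined on these metrics:

```python
# Sketch: emit pipeline throughput and latency metrics to CloudWatch.
# The namespace and metric names are hypothetical.
import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_run_metrics(records_processed: int, started_at: float) -> None:
    """Publish one throughput and one latency datapoint for a pipeline run."""
    cloudwatch.put_metric_data(
        Namespace="ExamplePipelines/Events",
        MetricData=[
            {"MetricName": "RecordsProcessed",
             "Value": float(records_processed), "Unit": "Count"},
            {"MetricName": "RunLatencySeconds",
             "Value": time.time() - started_at, "Unit": "Seconds"},
        ],
    )

# Usage: call emit_run_metrics(kept, run_start) at the end of a pipeline run.
```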
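And a sketch of driving a dashboard-backing query through Athena from Python, as the dashboards bullet suggests. The database, table, columns, and results location are placeholders:

```python
# Sketch: run a dashboard-backing aggregation on Athena via boto3.
# Database, table, columns, and the results bucket are hypothetical.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="""
        SELECT event_date, COUNT(*) AS events
        FROM analytics.events_curated
        GROUP BY event_date
        ORDER BY event_date DESC
        LIMIT 30
    """,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print("Started query:", response["QueryExecutionId"])
```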
Process:
- HR Screening
- Technical Round
- Assignment
- Hiring Manager Round
Functional Area: Data Engineering
Job Code: 1591268