hirist

Cactus Communications - Data Engineer - ETL/Data Warehousing

Cactus Communications Pvt. Ltd.
4 - 7 Years
Mumbai

Posted on: 08/04/2026

Job Description

Overview :


CACTUS is a remote-first organization, and we embrace an "accelerate from anywhere" culture. You may be required to travel to our Mumbai office based on business requirements or for company/team events.


Job Description :


We are looking for a Data Engineer to build and maintain the robust data foundations required for high-impact AI/ML projects. In this role, you will design scalable data pipelines, develop sophisticated ETL processes, and ensure the integrity of datasets sourced from diverse platforms. If you are passionate about optimizing data flow performance and implementing governance practices that align with established standards, this role offers the chance to play a vital part in shaping secure, data-driven solutions.


Responsibilities :

- Design, implement, and maintain robust data pipelines supporting AI/ML models.

- Develop ETL processes for ingesting data from multiple sources, including APIs, databases, and flat files.

- Ensure data integrity, lineage, and compliance with metadata standards.

- Collaborate with Data Science and AI/ML teams to optimize datasets for model consumption.

- Implement data versioning and quality validation routines.

- Monitor data flow performance and optimize for latency and throughput.

- Apply data governance practices aligned with responsible AI frameworks and practices.
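Much of the pipeline work above boils down to extract-validate-load routines. As a rough illustration only (the function and field names below are hypothetical and not drawn from the posting), a minimal Python sketch of that pattern:

```python
import csv
import json

def extract(rows):
    """Parse raw CSV lines into dicts (stand-in for a flat-file source)."""
    return list(csv.DictReader(rows))

def validate(records, required=("id", "amount")):
    """Split records into (good, bad) based on required fields."""
    good, bad = [], []
    for r in records:
        (good if all(r.get(k) for k in required) else bad).append(r)
    return good, bad

def load(records):
    """Stand-in for a warehouse write: serialize to JSON lines."""
    return "\n".join(json.dumps(r) for r in records)

raw = ["id,amount", "1,10.5", ",3.0", "2,7.25"]
good, bad = validate(extract(raw))
print(len(good), len(bad))  # 2 valid records, 1 rejected (missing id)
```

In production this logic would typically be orchestrated by a scheduler such as Apache Airflow and write to a warehouse like Snowflake or BigQuery rather than to JSON lines.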


Requirements :

- B.Tech / M.Tech in Computer Science, Information Systems, or Data Engineering.

- Certification in Big Data / Cloud Data Platforms (AWS, Azure, GCP) preferred.

- 4 to 7 years in designing and implementing scalable data pipelines and integration frameworks.

- Strong understanding of ETL, data quality, and schema design in distributed systems.

- Experience in integrating structured, semi-structured, and unstructured data for AI/ML projects.


Technical Competencies :

- Programming : Python, SQL, Scala.

- Data Tools : Apache Airflow, Kafka, Spark, NiFi.

- Databases : PostgreSQL, MongoDB, BigQuery, Snowflake.

- ETL & Warehousing : Talend, AWS Glue, Azure Data Factory.

- Data Management : Delta Lake, Databricks, Hive.

- Cloud Data : AWS (S3, RDS, Lambda), Azure (Data Factory, Storage), GCP (BigQuery, Cloud Storage).

- Tools : Docker, Git, data modelling tools, basic infrastructure automation.

- Streaming : Apache Kafka, AWS Kinesis, real-time data processing.

- Best Practices : Data validation, error handling, and pipeline observability.
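The last bullet, error handling with observability, is often implemented as a retry wrapper that logs every attempt. A minimal sketch, with all names hypothetical:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def with_retries(fn, attempts=3, delay=0.0):
    """Run fn, logging and retrying on failure; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)

calls = {"n": 0}

def flaky_extract():
    """Hypothetical source call that fails once, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient source error")
    return [{"id": 1}]

rows = with_retries(flaky_extract)
print(len(rows))  # 1 row recovered after one retry
```

The same wrapper pattern extends naturally to emitting metrics (attempt counts, latencies) for pipeline observability dashboards.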


