hirist

Cactus Communications - Data Engineer - ETL/Data Warehousing

Cactus Communications Pvt. Ltd.
4 - 7 Years
Mumbai

Posted on: 08/04/2026

Job Description

Overview :


CACTUS is a remote-first organization, and we embrace an "accelerate from anywhere" culture. You may be required to travel to our Mumbai office based on business requirements or for company/team events.


Job Description :


We are looking for a Data Engineer to build and maintain the robust data foundations required for high-impact AI/ML projects. In this role, you will design scalable data pipelines, develop sophisticated ETL processes, and ensure the integrity of datasets sourced from diverse platforms. If you are passionate about optimizing data flow performance and implementing governance practices that align with established standards, this role offers the chance to play a vital part in shaping secure, data-driven solutions.


Responsibilities :

- Design, implement, and maintain robust data pipelines supporting AI/ML models.

- Develop ETL processes for ingesting data from multiple sources, including APIs, databases, and flat files.

- Ensure data integrity, lineage, and compliance with metadata standards.

- Collaborate with Data Science and AI/ML teams to optimize datasets for model consumption.

- Implement data versioning and quality validation routines.

- Monitor data flow performance and optimize for latency and throughput.

- Apply data governance practices aligned with responsible AI frameworks and practices.
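Much of the pipeline work above boils down to extract-validate-load routines. As a rough illustration only (the function and field names below are hypothetical and not drawn from the posting), a minimal Python sketch of that pattern:

```python
import csv
import json

def extract(rows):
    """Parse raw CSV lines into dicts (stand-in for a flat-file source)."""
    return list(csv.DictReader(rows))

def validate(records, required=("id", "amount")):
    """Split records into (good, bad) based on required fields."""
    good, bad = [], []
    for r in records:
        (good if all(r.get(k) for k in required) else bad).append(r)
    return good, bad

def load(records):
    """Stand-in for a warehouse write: serialize to JSON lines."""
    return "\n".join(json.dumps(r) for r in records)

raw = ["id,amount", "1,10.5", ",3.0", "2,7.25"]
good, bad = validate(extract(raw))
print(len(good), len(bad))  # 2 valid records, 1 rejected (missing id)
```

In production this logic would typically be orchestrated by a scheduler such as Apache Airflow and write to a warehouse like Snowflake or BigQuery rather than to JSON lines.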


Requirements :

- B.Tech / M.Tech in Computer Science, Information Systems, or Data Engineering.

- Certification in Big Data / Cloud Data Platforms (AWS, Azure, GCP) preferred.

- 4 to 7 years in designing and implementing scalable data pipelines and integration frameworks.

- Strong understanding of ETL, data quality, and schema design in distributed systems.

- Experience in integrating structured, semi-structured, and unstructured data for AI/ML projects.


Technical Competencies :

- Programming : Python, SQL, Scala.

- Data Tools : Apache Airflow, Kafka, Spark, NiFi.

- Databases : PostgreSQL, MongoDB, BigQuery, Snowflake.

- ETL & Warehousing : Talend, AWS Glue, Azure Data Factory.

- Data Management : Delta Lake, Databricks, Hive.

- Cloud Data : AWS (S3, RDS, Lambda), Azure (Data Factory, Storage), GCP (BigQuery, Cloud Storage).

- Tools : Docker, Git, data modelling tools, basic infrastructure automation.

- Streaming : Apache Kafka, AWS Kinesis, real-time data processing.

- Best Practices : Data validation, error handling, and pipeline observability.
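The last bullet, error handling with observability, is often implemented as a retry wrapper that logs every attempt. A minimal sketch, with all names hypothetical:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def with_retries(fn, attempts=3, delay=0.0):
    """Run fn, logging and retrying on failure; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)

calls = {"n": 0}

def flaky_extract():
    """Hypothetical source call that fails once, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient source error")
    return [{"id": 1}]

rows = with_retries(flaky_extract)
print(len(rows))  # 1 row recovered after one retry
```

The same wrapper pattern extends naturally to emitting metrics (attempt counts, latencies) for pipeline observability dashboards.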


