
Zemoso Technologies - Senior Data Engineer - Python/PySpark

ZeMoSo Technologies Pvt. Ltd.
Multiple Locations
4 - 7 Years

Posted on: 30/12/2025

Job Description

About the Company :

Zemoso Technologies is a Software Product Market Fit Studio that brings Silicon Valley-style rapid prototyping and rapid application builds to entrepreneurs and corporate innovation teams. We offer innovation as a service: we work on ideas from scratch and take them to the product-market-fit stage using Design Thinking -> Lean Execution -> Agile Methodology.

We were featured among Deloitte's 50 fastest-growing tech companies in India three times (2016, 2018, and 2019). We were also featured in the Deloitte Technology Fast 500 Asia Pacific in both 2016 and 2018.

We are located in Hyderabad, India, and Dallas, US, and have recently opened another office in Waterloo, Canada. Our founders have past successes: they founded a decision management company acquired by SAP AG (now part of the SAP HANA big data stack and NetWeaver BPM), were part of the early engineering team at Zoho (a leading billion-dollar SaaS player), and bring private equity experience. Marquee customers, along with some exciting start-ups, are part of our clientele.

Location - Pune/Mumbai/Chennai/Hyderabad/Bangalore (Hybrid)

Role Overview :

We are looking for a Senior Data Engineer who can own and deliver complex data engineering solutions, working across data pipelines, transformations, and cloud-based data platforms.

This role bridges hands-on execution and solution design, ensuring reliability, scalability, and clarity in data systems used for analytics and AI-driven use cases.

Key Responsibilities :

Data Engineering & Solution Ownership :

- Design, build, and maintain robust and scalable data pipelines using Python, SQL, and modern data engineering frameworks.

- Own sub-systems or end-to-end data workflows, from ingestion to curated datasets.

- Work with structured and semi-structured data across data warehouses, data lakes, and cloud storage.

Pipeline Orchestration & Processing :

- Develop and manage data workflows using orchestration tools such as Apache Airflow or equivalent frameworks.

- Implement ETL / ELT pipelines using tools such as Databricks, PySpark, or Spark SQL, depending on project needs.

- Handle data migrations and integrations from multiple source systems.

Cloud & Platform Enablement :

- Build and deploy data pipelines in cloud environments (AWS / Azure / GCP).

- Work with cloud-native services such as object storage (S3 / ADLS), compute clusters, and containerized environments where applicable.

- Optimize pipelines and clusters for performance, reliability, and cost efficiency.

Data Quality & Reliability :

- Implement data validation, quality checks, and monitoring to ensure data accuracy and consistency.

- Troubleshoot and resolve pipeline failures and performance issues.

- Contribute to maintaining data models, schemas, and technical documentation.

Collaboration & Mentoring :

- Collaborate closely with data scientists, analysts, backend engineers, and product teams.

- Guide and mentor Data Engineers, providing code and design reviews.

- Clearly communicate design decisions, trade-offs, and technical concepts to stakeholders.

Must-Have Skills :

- Strong proficiency in Python and SQL for data processing and transformations.

- Solid experience with data modeling, data warehouses, and data lakes.

- Hands-on experience with Databricks, Spark / PySpark, or similar distributed data processing frameworks.

- Experience designing and managing ETL / ELT pipelines.

- Working knowledge of cloud platforms (AWS / Azure / GCP).

- Experience with Git-based version control and collaborative development workflows.

- Ability to work with APIs and external data sources.

Nice-to-Have Skills :

- Experience with workflow orchestration tools beyond Airflow.

- Exposure to Kubernetes, Docker, or containerized data workloads.

- Familiarity with data visualization tools (Tableau, Power BI, etc.).

- Understanding of big data ecosystems (Hadoop, HDFS, YARN) and cloud data warehouses such as Snowflake.

- Experience with data security, governance, or compliance-related data flows.

- Relevant certifications in data engineering or cloud technologies.

What Success Looks Like in This Role :

- You independently own and deliver complex data pipelines or data subsystems.

- You reduce ambiguity for junior engineers through clear design and guidance.

- Your pipelines are reliable, observable, and production-ready.

- You contribute to improving standards, practices, and documentation across the data engineering team.
