Posted on: 28/01/2026
About the Role:
- Design and build scalable ETL pipelines on GCP using Python (Cloud Run, Dataflow) for multiple data sources (APIs, CSV, XLS, JSON, SDMX); a minimal pipeline sketch follows this list.
- Perform schema mapping & data modeling using LLM-based auto-schematization; define Statistical Variables and generate MCF/TMCF (Meta Content Format / Template MCF).
- Implement entity resolution and standardized ID generation.
- Integrate curated data into the Knowledge Graph with proper versioning and governance.
- Develop and maintain REST & SPARQL APIs using Apigee.
- Ensure data quality, validation, and anomaly detection; troubleshoot ingestion issues.
- Drive automation and optimization by partnering with Automation and Managed Service PODs.
- Collaborate with cross-functional teams and stakeholders.
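To give candidates a flavor of the pipeline work described above, here is a minimal Apache Beam (Python) sketch of a CSV-to-BigQuery ingestion flow. This is an illustration only, not the team's actual pipeline; the bucket, project, dataset, table, and column names are hypothetical placeholders.

    # Minimal sketch: read a CSV from Cloud Storage, parse rows, and
    # append them to an existing BigQuery table. All resource names
    # below are hypothetical.
    import csv
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_row(line):
        # Assumed columns: name, value, date.
        name, value, date = next(csv.reader([line]))
        return {"name": name, "value": float(value), "date": date}

    def run():
        # Pass --runner=DataflowRunner --project=... to run on Dataflow.
        options = PipelineOptions()
        with beam.Pipeline(options=options) as p:
            (p
             | "Read" >> beam.io.ReadFromText(
                   "gs://example-bucket/input.csv", skip_header_lines=1)
             | "Parse" >> beam.Map(parse_row)
             | "Write" >> beam.io.WriteToBigQuery(
                   "example-project:example_dataset.observations",
                   create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

    if __name__ == "__main__":
        run()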
Mandatory Skills:
- Strong hands-on experience with Google Cloud Platform (GCP): Cloud Storage, Cloud SQL, Cloud Run, Dataflow/Apache Beam, Pub/Sub, BigQuery, Apigee.
- Python & SQL for data engineering and pipeline development (a sample validation sketch follows this list).
- Proven expertise in data wrangling, cleaning, and transformation across varied data formats.
- Experience with Git-based version control.
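As a purely illustrative example of the Python-plus-SQL work listed above, the sketch below runs a simple completeness check with the google-cloud-bigquery client; the project, dataset, table, and column names are assumptions.

    # Illustrative only: count rows failing a basic completeness check.
    from google.cloud import bigquery

    def count_bad_rows(table="example-project.example_dataset.observations"):
        client = bigquery.Client()
        query = f"""
            SELECT COUNT(*) AS bad_rows
            FROM `{table}`
            WHERE value IS NULL OR date IS NULL
        """
        result = client.query(query).result()  # waits for the job
        return next(iter(result)).bad_rows

    if __name__ == "__main__":
        print("rows failing validation:", count_bad_rows())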
Additional Skills:
- Knowledge of data modeling, schema design, and knowledge graphs (Schema.org, RDF, SPARQL, JSON-LD); a sample query sketch follows this list.
- Familiarity with CI/CD (Cloud Build) and Agile delivery.
- Strong problem-solving skills and attention to detail.
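For the knowledge-graph skills above, here is a small sketch of querying a SPARQL endpoint from Python over the standard SPARQL HTTP protocol. The endpoint URL and the Schema.org-typed query are assumptions for demonstration, not a reference to any specific graph.

    # Illustrative only: run a SELECT query against a SPARQL endpoint.
    import requests

    ENDPOINT = "https://example.org/sparql"  # placeholder endpoint

    QUERY = """
    PREFIX schema: <http://schema.org/>
    SELECT ?dataset ?name WHERE {
      ?dataset a schema:Dataset ;
               schema:name ?name .
    } LIMIT 10
    """

    def run_query():
        resp = requests.get(
            ENDPOINT,
            params={"query": QUERY},
            headers={"Accept": "application/sparql-results+json"},
        )
        resp.raise_for_status()
        for b in resp.json()["results"]["bindings"]:
            print(b["dataset"]["value"], b["name"]["value"])

    if __name__ == "__main__":
        run_query()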
Preferred Qualifications:
- Experience with LLM-based data automation / auto-schematization.
- Exposure to large-scale public or open dataset integrations.
- Experience handling multilingual datasets.
Posted by
Lishta Jain
Senior Talent Acquisition Partner at Cloudsufi India Private Limited
Posted in
Data Engineering
Functional Area
Big Data / Data Warehousing / ETL
Job Code
1606837