Posted on: 28/04/2026
Description:
- Design and build scalable data pipelines using PySpark, Python, and SQL for batch and real-time processing
- Architect modern data platforms including Data Warehouses, Data Lakes, and Lakehouse configurations on AWS, Azure, or GCP
- Develop and optimize ETL/ELT workflows with performance tuning, partitioning strategies, and data quality frameworks
- Orchestrate complex data workflows using Airflow DAGs, managing dependencies and monitoring at scale (see the illustrative sketch after this list)
- Implement data fabric architectures with robust data lineage, cataloging, and governance
- Build data quality frameworks with automated validation, profiling, and anomaly detection
- Work with platforms like Databricks, Snowflake, Redshift, DBT, and NoSQL databases to deliver optimized solutions
- Deploy and manage data infrastructure on cloud platforms (AWS Glue, Athena, S3, Redshift, Lambda, EMR)
- Establish CI/CD pipelines for data workflows using Git, Jenkins, and cloud-native deployment tools
- Lead architecture design discussions, propose technical solutions, and define development standards and best practices
- Create and enforce data engineering best practices including coding standards, testing frameworks, documentation, and deployment patterns
- Build reusable frameworks, templates, and libraries to accelerate team productivity
- Mentor data engineering teams on best practices for scalable data storage, processing, and data quality excellence
- Ensure strict security, compliance, and data privacy throughout all data solutions
- Collaborate with cross-functional teams including Data Scientists, Analytics Engineers, QA, and DevOps
- Deliver solutions in Agile environments with JIRA for project management
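The following is a minimal sketch of the kind of Airflow orchestration described above, assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, script path, connection id, and tuning value are hypothetical placeholders, not details of this role's actual stack:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    # Daily batch DAG that submits one PySpark job; the retry settings
    # illustrate the dependency-management and monitoring concerns above.
    with DAG(
        dag_id="daily_sales_batch",            # hypothetical pipeline name
        start_date=datetime(2026, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        SparkSubmitOperator(
            task_id="transform_sales",
            application="/opt/jobs/transform_sales.py",    # hypothetical script
            conn_id="spark_default",
            conf={"spark.sql.shuffle.partitions": "200"},  # example tuning knob
        )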
Preferred candidate profile:
- 12+ years building production-grade data engineering solutions
- Exceptional team leader who sets the stage for other data engineers to consistently execute using best practices
- Strong expertise in Python and PySpark for distributed data processing
- Advanced SQL proficiency including query optimization, window functions, CTEs, and performance tuning (see the SQL sketch after this list)
- Deep experience with batch and real-time/streaming data systems (Spark Streaming, Kafka, Kinesis)
- Hands-on experience with modern data platforms: Databricks, Snowflake, Redshift, BigQuery
- Expertise in data modeling techniques: dimensional modeling, star/snowflake schemas, data vault
- Strong knowledge of data warehousing and data lake architectures with hands-on implementation experience
- Proficiency with Airflow for workflow orchestration, DAG design, and operational monitoring
- Deep cloud platform experience (AWS, Azure, GCP) building scalable data solutions
- Experience with data transformation tools like DBT for analytics engineering
- Knowledge of NoSQL databases (DynamoDB, MongoDB, Cassandra) and when to use them
- Understanding of data quality frameworks, data validation, and data profiling techniques
- Experience with data lineage tools and metadata management (Apache Atlas, Collibra, DataHub)
- Proficiency with version control (Git, CodeCommit) and CI/CD pipelines (Jenkins, CodePipeline)
- Strong Unix/Linux and shell scripting skills for automation
- Data governance and compliance knowledge (GDPR, HIPAA, data privacy regulations)
- Performance optimization expertise including indexing, caching, and query tuning
- Experience establishing coding standards, testing strategies, and documentation practices
- Strong problem-solving skills with ability to diagnose issues and architect effective solutions
- Proven ability to mentor junior engineers, lead technical discussions, and drive engineering excellence
- Clear communicator who thrives in collaborative, Agile environments
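The SQL bullet above can be illustrated with a short sketch combining a CTE and a window function, run through PySpark so the example is self-contained; the table and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("window_demo").getOrCreate()

    # Tiny in-memory table standing in for a real warehouse table.
    spark.createDataFrame(
        [("2026-01-01", "A", 100.0), ("2026-01-02", "A", 120.0),
         ("2026-01-01", "B", 90.0)],
        ["order_date", "region", "revenue"],
    ).createOrReplaceTempView("orders")

    # CTE plus ROW_NUMBER window: most recent order per region.
    spark.sql("""
        WITH ranked AS (
            SELECT region, order_date, revenue,
                   ROW_NUMBER() OVER (
                       PARTITION BY region ORDER BY order_date DESC
                   ) AS rn
            FROM orders
        )
        SELECT region, order_date, revenue FROM ranked WHERE rn = 1
    """).show()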
Bonus Points:
- Life sciences or pharma domain knowledge
- Cloud certifications (AWS Data Analytics, Azure Data Engineer, GCP Data Engineer)
- Experience with streaming technologies such as Kafka, Flink, and Spark Structured Streaming (see the streaming sketch after this list)
- Knowledge of machine learning pipelines and MLOps practices
- Familiarity with data observability platforms (Monte Carlo, Great Expectations)
- Experience with containerization (Docker, Kubernetes) for data workloads
- Exposure to data mesh architecture patterns
- Knowledge of graph databases (Neo4j) or vector databases for AI applications
- Experience with reverse ETL and data activation platforms
- Terraform or Infrastructure-as-Code for data infrastructure
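As a rough sketch of the Spark Structured Streaming item above: the broker address, topic, and paths below are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("stream_demo").getOrCreate()

    # Read a Kafka topic as an unbounded stream of key/value records.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "events")                     # hypothetical topic
        .load()
        .select(col("value").cast("string").alias("payload"))
    )

    # Land micro-batches as Parquet; the checkpoint enables exactly-once recovery.
    query = (
        events.writeStream.format("parquet")
        .option("path", "/data/events")                    # hypothetical sink path
        .option("checkpointLocation", "/chk/events")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()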
Education:
- Bachelor's or Master's degree in Computer Science or related field
Posted by: Recruiter, HR at TRINITYPARTNERS INDIA LLP
Last Active: NA (this job was posted through a third-party tool)
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1631997