Trinity - Data Engineer - PySpark/Python/SQL

TRINITYPARTNERS INDIA LLP
12 - 20 Years
Bangalore

Posted on: 28/04/2026

Job Description

Responsibilities :


- Design and build scalable data pipelines using PySpark, Python, and SQL for batch and real-time processing (a minimal PySpark sketch follows this list)
- Architect modern data platforms including Data Warehouses, Data Lakes, and Lakehouse configurations on AWS, Azure, or GCP
- Develop and optimize ETL/ELT workflows with performance tuning, partitioning strategies, and data quality frameworks
- Orchestrate complex data workflows using Airflow DAGs, managing dependencies and monitoring at scale
- Implement data fabric architectures with robust data lineage, cataloging, and governance
- Build data quality frameworks with automated validation, profiling, and anomaly detection
- Work with platforms like Databricks, Snowflake, Redshift, DBT, and NoSQL databases to deliver optimized solutions
- Deploy and manage data infrastructure on cloud platforms (AWS Glue, Athena, S3, Redshift, Lambda, EMR)
- Establish CI/CD pipelines for data workflows using Git, Jenkins, and cloud-native deployment tools
- Lead architecture design discussions, propose technical solutions, and define development standards and best practices
- Create and enforce data engineering best practices including coding standards, testing frameworks, documentation, and deployment patterns
- Build reusable frameworks, templates, and libraries to accelerate team productivity
- Mentor data engineering teams on best practices for scalable data storage, processing, and data quality excellence
- Ensure strict security, compliance, and data privacy throughout all data solutions
- Collaborate with cross-functional teams including Data Scientists, Analytics Engineers, QA, and DevOps
- Deliver solutions in Agile environments with JIRA for project management
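
As a flavor of the pipeline and ETL work described above, here is a minimal PySpark batch sketch. It is an illustration only: the paths, table, and column names (raw_events, event_id, user_id, amount, event_ts) are hypothetical, not taken from this posting, and the aggregate doubles as an example of the CTE and window-function SQL the role calls for.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Extract: read raw JSON events from a hypothetical landing zone.
raw = spark.read.json("s3://example-bucket/landing/raw_events/")

# Transform: dedupe, filter, and derive a partition column.
events = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("user_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# A daily aggregate expressed in SQL, using a CTE and a window function.
events.createOrReplaceTempView("events")
daily = spark.sql("""
    WITH daily_totals AS (
        SELECT user_id, event_date, SUM(amount) AS daily_amount
        FROM events
        GROUP BY user_id, event_date
    )
    SELECT *,
           RANK() OVER (PARTITION BY event_date
                        ORDER BY daily_amount DESC) AS spend_rank
    FROM daily_totals
""")

# Load: write partitioned Parquet so downstream queries can prune by date.
(
    daily.write.mode("overwrite")
         .partitionBy("event_date")
         .parquet("s3://example-bucket/curated/daily_totals/")
)

Partitioning by event_date is one common pruning strategy; the right partition key depends on the dominant query patterns.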


Preferred candidate profile :


- 12+ years building production-grade data engineering solutions
- Exceptional team leader who sets the stage for other data engineers to execute consistently while leveraging best practices
- Strong expertise in Python and PySpark for distributed data processing
- Advanced SQL proficiency including query optimization, window functions, CTEs, and performance tuning
- Deep experience with batch and real-time/streaming data systems (Spark Streaming, Kafka, Kinesis)
- Hands-on experience with modern data platforms : Databricks, Snowflake, Redshift, BigQuery
- Expertise in data modeling techniques : dimensional modeling, star/snowflake schemas, data vault
- Strong knowledge of data warehousing and data lake architectures with hands-on implementation experience
- Proficiency with Airflow for workflow orchestration, DAG design, and operational monitoring (see the DAG sketch after this list)
- Deep cloud platform experience (AWS, Azure, GCP) building scalable data solutions
- Experience with data transformation tools like DBT for analytics engineering
- Knowledge of NoSQL databases (DynamoDB, MongoDB, Cassandra) and when to use them
- Understanding of data quality frameworks, data validation, and data profiling techniques
- Experience with data lineage tools and metadata management (Apache Atlas, Collibra, DataHub)
- Proficiency with version control (Git, CodeCommit) and CI/CD pipelines (Jenkins, CodePipeline)
- Strong Unix/Linux and shell scripting skills for automation
- Data governance and compliance knowledge (GDPR, HIPAA, data privacy regulations)
- Performance optimization expertise including indexing, caching, and query tuning
- Experience establishing coding standards, testing strategies, and documentation practices
- Strong problem-solving skills with the ability to diagnose issues and architect effective solutions
- Proven ability to mentor junior engineers, lead technical discussions, and drive engineering excellence
- Clear communicator who thrives in collaborative, Agile environments
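
The Airflow bullet above maps to DAG definitions along these lines. This is a minimal sketch assuming Airflow 2.4+ (for the schedule argument); the dag_id, schedule, and placeholder callables are hypothetical and exist only to show dependency wiring and monitoring-friendly task boundaries.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # hypothetical: pull from the source system

def transform():
    ...  # hypothetical: run the PySpark job, e.g. via spark-submit

def load():
    ...  # hypothetical: publish curated tables

with DAG(
    dag_id="daily_events_pipeline",   # assumed name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,                    # skip historical backfill
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Explicit dependencies: extract -> transform -> load.
    t_extract >> t_transform >> t_load

Keeping each task small and idempotent is what makes retries and backfills safe at scale.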


Bonus Points :


- Life sciences or pharma domain knowledge
- Cloud certifications (AWS Data Analytics, Azure Data Engineer, GCP Data Engineer)
- Experience with streaming technologies (Kafka, Flink, Spark Structured Streaming); see the streaming sketch after this list
- Knowledge of machine learning pipelines and MLOps practices
- Familiarity with data observability platforms (Monte Carlo, Great Expectations)
- Experience with containerization (Docker, Kubernetes) for data workloads
- Exposure to data mesh architecture patterns
- Knowledge of graph databases (Neo4j) or vector databases for AI applications
- Experience with reverse ETL and data activation platforms
- Terraform or Infrastructure-as-Code for data infrastructure
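
For the streaming bullet above, a minimal Spark Structured Streaming sketch reading from Kafka could look like this. The broker address, topic, schema, and sink paths are all assumptions, and the job needs the spark-sql-kafka connector package on the classpath.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Assumed message schema for the hypothetical "events" topic.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as an unbounded stream.
stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
)

# Kafka delivers bytes; parse the JSON payload into typed columns.
parsed = stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("e")
).select("e.*")

# Write micro-batches with checkpointing so the query restarts
# exactly where it left off after a failure.
query = (
    parsed.writeStream.format("parquet")
          .option("path", "s3://example-bucket/streaming/events/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()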


Education :


- Bachelor's or Master's degree in Computer Science or related field
