Trinity - Data Engineer - PySpark/Python/SQL

TRINITYPARTNERS INDIA LLP
12 - 20 Years
Bangalore

Posted on: 28/04/2026

Job Description

Responsibilities :


- Design and build scalable data pipelines using PySpark, Python, and SQL for batch and real-time processing (a minimal PySpark sketch follows this list)
- Architect modern data platforms including Data Warehouses, Data Lakes, and Lakehouse configurations on AWS, Azure, or GCP
- Develop and optimize ETL/ELT workflows with performance tuning, partitioning strategies, and data quality frameworks
- Orchestrate complex data workflows using Airflow DAGs, managing dependencies and monitoring at scale
- Implement data fabric architectures with robust data lineage, cataloging, and governance
- Build data quality frameworks with automated validation, profiling, and anomaly detection
- Work with platforms like Databricks, Snowflake, Redshift, DBT, and NoSQL databases to deliver optimized solutions
- Deploy and manage data infrastructure on cloud platforms (AWS Glue, Athena, S3, Redshift, Lambda, EMR)
- Establish CI/CD pipelines for data workflows using Git, Jenkins, and cloud-native deployment tools
- Lead architecture design discussions, propose technical solutions, and define development standards and best practices
- Create and enforce data engineering best practices including coding standards, testing frameworks, documentation, and deployment patterns
- Build reusable frameworks, templates, and libraries to accelerate team productivity
- Mentor data engineering teams on best practices for scalable data storage, processing, and data quality excellence
- Ensure strict security, compliance, and data privacy throughout all data solutions
- Collaborate with cross-functional teams including Data Scientists, Analytics Engineers, QA, and DevOps
- Deliver solutions in Agile environments with JIRA for project management
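
As a flavor of the pipeline and ETL work described above, here is a minimal PySpark batch sketch. It is an illustration only: the paths, table, and column names (raw_events, event_id, user_id, amount, event_ts) are hypothetical, not taken from this posting, and the aggregate doubles as an example of the CTE and window-function SQL the role calls for.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Extract: read raw JSON events from a hypothetical landing zone.
raw = spark.read.json("s3://example-bucket/landing/raw_events/")

# Transform: dedupe, filter, and derive a partition column.
events = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("user_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# A daily aggregate expressed in SQL, using a CTE and a window function.
events.createOrReplaceTempView("events")
daily = spark.sql("""
    WITH daily_totals AS (
        SELECT user_id, event_date, SUM(amount) AS daily_amount
        FROM events
        GROUP BY user_id, event_date
    )
    SELECT *,
           RANK() OVER (PARTITION BY event_date
                        ORDER BY daily_amount DESC) AS spend_rank
    FROM daily_totals
""")

# Load: write partitioned Parquet so downstream queries can prune by date.
(
    daily.write.mode("overwrite")
         .partitionBy("event_date")
         .parquet("s3://example-bucket/curated/daily_totals/")
)

Partitioning by event_date is one common pruning strategy; the right partition key depends on the dominant query patterns.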


Preferred candidate profile :


- 12+ years building production-grade data engineering solutions
- Exceptional team leader who sets the stage for other data engineers to execute consistently while leveraging best practices
- Strong expertise in Python and PySpark for distributed data processing
- Advanced SQL proficiency including query optimization, window functions, CTEs, and performance tuning
- Deep experience with batch and real-time/streaming data systems (Spark Streaming, Kafka, Kinesis)
- Hands-on experience with modern data platforms : Databricks, Snowflake, Redshift, BigQuery
- Expertise in data modeling techniques : dimensional modeling, star/snowflake schemas, data vault
- Strong knowledge of data warehousing and data lake architectures with hands-on implementation experience
- Proficiency with Airflow for workflow orchestration, DAG design, and operational monitoring (see the DAG sketch after this list)
- Deep cloud platform experience (AWS, Azure, GCP) building scalable data solutions
- Experience with data transformation tools like DBT for analytics engineering
- Knowledge of NoSQL databases (DynamoDB, MongoDB, Cassandra) and when to use them
- Understanding of data quality frameworks, data validation, and data profiling techniques
- Experience with data lineage tools and metadata management (Apache Atlas, Collibra, DataHub)
- Proficiency with version control (Git, CodeCommit) and CI/CD pipelines (Jenkins, CodePipeline)
- Strong Unix/Linux and shell scripting skills for automation
- Data governance and compliance knowledge (GDPR, HIPAA, data privacy regulations)
- Performance optimization expertise including indexing, caching, and query tuning
- Experience establishing coding standards, testing strategies, and documentation practices
- Strong problem-solving skills with the ability to diagnose issues and architect effective solutions
- Proven ability to mentor junior engineers, lead technical discussions, and drive engineering excellence
- Clear communicator who thrives in collaborative, Agile environments
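
The Airflow bullet above maps to DAG definitions along these lines. This is a minimal sketch assuming Airflow 2.4+ (for the schedule argument); the dag_id, schedule, and placeholder callables are hypothetical and exist only to show dependency wiring and monitoring-friendly task boundaries.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # hypothetical: pull from the source system

def transform():
    ...  # hypothetical: run the PySpark job, e.g. via spark-submit

def load():
    ...  # hypothetical: publish curated tables

with DAG(
    dag_id="daily_events_pipeline",   # assumed name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,                    # skip historical backfill
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Explicit dependencies: extract -> transform -> load.
    t_extract >> t_transform >> t_load

Keeping each task small and idempotent is what makes retries and backfills safe at scale.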


Bonus Points :


- Life sciences or pharma domain knowledge
- Cloud certifications (AWS Data Analytics, Azure Data Engineer, GCP Data Engineer)
- Experience with streaming technologies (Kafka, Flink, Spark Structured Streaming); see the streaming sketch after this list
- Knowledge of machine learning pipelines and MLOps practices
- Familiarity with data observability platforms (Monte Carlo, Great Expectations)
- Experience with containerization (Docker, Kubernetes) for data workloads
- Exposure to data mesh architecture patterns
- Knowledge of graph databases (Neo4j) or vector databases for AI applications
- Experience with reverse ETL and data activation platforms
- Terraform or Infrastructure-as-Code for data infrastructure
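
For the streaming bullet above, a minimal Spark Structured Streaming sketch reading from Kafka could look like this. The broker address, topic, schema, and sink paths are all assumptions, and the job needs the spark-sql-kafka connector package on the classpath.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Assumed message schema for the hypothetical "events" topic.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as an unbounded stream.
stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
)

# Kafka delivers bytes; parse the JSON payload into typed columns.
parsed = stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("e")
).select("e.*")

# Write micro-batches with checkpointing so the query restarts
# exactly where it left off after a failure.
query = (
    parsed.writeStream.format("parquet")
          .option("path", "s3://example-bucket/streaming/events/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()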


Education :


- Bachelor's or Master's degree in Computer Science or related field
