Posted on: 11/09/2025
The core responsibilities for this role include the following:
Data Pipeline Development:
- Design, develop, and maintain data pipelines to ingest, process, and transform data from various sources into usable formats (see the sketch below).
- Implement data integration solutions that connect disparate data systems, including databases, APIs, and third-party data sources.
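A minimal sketch of the kind of pipeline this covers: pulling JSON from an API and landing it in a warehouse staging table. The endpoint, connection string, and table name are hypothetical placeholders, not details from this posting.

import requests
import pandas as pd
from sqlalchemy import create_engine

API_URL = "https://api.example.com/v1/orders"  # hypothetical source endpoint
DB_URI = "postgresql://user:password@localhost:5432/analytics"  # placeholder connection string

def ingest_orders() -> None:
    # Extract: pull the latest records from the upstream API.
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    records = response.json()

    # Transform: flatten the JSON payload into a tabular shape.
    df = pd.json_normalize(records)

    # Load: append the rows to a staging table in the warehouse.
    engine = create_engine(DB_URI)
    df.to_sql("stg_orders", engine, if_exists="append", index=False)

if __name__ == "__main__":
    ingest_orders()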
Data Storage and Warehousing:
- Create and manage data storage solutions, such as data lakes, data warehouses, and NoSQL databases.
- Optimize data storage for performance, scalability, and cost-efficiency (see the sketch below).
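One common storage optimization is writing lake data as Parquet partitioned by a query-friendly column. The sketch below uses a made-up dataset and path; writing to an s3:// location would additionally need the s3fs package.

import pandas as pd

df = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "amount": [9.99, 24.50, 5.00],
        "event_date": ["2025-09-01", "2025-09-01", "2025-09-02"],
    }
)

# Partitioning by event_date keeps scans cheap: queries that filter on that
# column only read the matching directories.
df.to_parquet(
    "lake/orders",  # hypothetical data lake location
    engine="pyarrow",
    partition_cols=["event_date"],
    index=False,
)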
Data Quality and Governance:
- Establish data quality standards and implement data validation and cleansing processes (see the sketch below).
- Collaborate with data analysts and data scientists to ensure data consistency and accuracy.
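A small sketch of what a validation and cleansing step can look like. The column names and rules are illustrative assumptions, not standards from this posting.

import pandas as pd

def validate_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    # Validation: fail fast if required columns are missing or key fields are null.
    required = {"order_id", "customer_id", "amount"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if df["order_id"].isna().any():
        raise ValueError("order_id must not contain nulls")

    # Cleansing: drop exact duplicate orders and negative amounts.
    df = df.drop_duplicates(subset="order_id")
    df = df[df["amount"] >= 0]
    return df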
ETL (Extract, Transform, Load):
- Develop ETL processes to transform raw data into a structured and usable format (a minimal workflow sketch follows below).
- Monitor and troubleshoot ETL jobs to ensure data flows smoothly.
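A minimal sketch of an ETL job expressed as an Airflow DAG, assuming Airflow 2.4+ (where the schedule argument carries the cron preset). The task bodies and DAG name are placeholders; the retries and task ordering illustrate the monitoring side.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    ...  # placeholder: pull raw data from the source system

def transform(**context):
    ...  # placeholder: clean and reshape the extracted data

def load(**context):
    ...  # placeholder: write the result to the warehouse

with DAG(
    dag_id="daily_orders_etl",  # hypothetical pipeline name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load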
Data Security and Compliance:
- Implement data security measures and access controls to protect sensitive data (see the masking sketch below).
- Ensure compliance with data privacy regulations and industry standards (e.g., GDPR, HIPAA).
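One illustrative slice of this work is masking personally identifiable fields before data leaves a restricted zone. The column names are assumptions, and a real deployment would pull the salt from a managed secret store rather than passing it around.

import hashlib
import pandas as pd

def mask_pii(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    df = df.copy()
    # Pseudonymize email addresses: a salted hash lets analysts join on the
    # value without ever seeing the raw address.
    df["email"] = df["email"].map(
        lambda e: hashlib.sha256((salt + str(e)).encode("utf-8")).hexdigest()
    )
    # Redact phone numbers outright rather than hashing them.
    df["phone"] = "REDACTED"
    return df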
Performance Tuning:
- Optimize data pipelines and queries for improved performance and efficiency (see the sketch below).
- Identify and resolve bottlenecks in data processing.
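Two routine tuning moves, sketched below: streaming a large table in chunks so memory stays flat, and checking a query plan before adding an index on a hot filter column. The table, column, and connection details are made up for illustration.

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost:5432/analytics")  # placeholder

# 1. Process a large table in fixed-size chunks instead of loading it whole.
total_rows = 0
for chunk in pd.read_sql_query("SELECT * FROM events", engine, chunksize=50_000):
    total_rows += len(chunk)  # stand-in for the real per-chunk transform

# 2. Inspect the plan for a slow filter, then add an index on the hot column.
with engine.begin() as conn:
    plan = conn.execute(text("EXPLAIN SELECT * FROM events WHERE user_id = 42"))
    print("\n".join(row[0] for row in plan))
    conn.execute(text("CREATE INDEX IF NOT EXISTS idx_events_user_id ON events (user_id)"))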
Data Documentation:
- Maintain comprehensive documentation for data pipelines, schemas, and data dictionaries.
- Create and update data lineage and metadata documentation.
Scalability and Reliability:
- Design data infrastructure to scale with growing data volumes and business requirements.
- Implement data recovery and backup strategies to ensure data availability and resilience.
Collaboration:
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver data solutions.
Required Technical Skills:
- Programming Languages: Proficiency in Python is essential. Experience with SQL for database querying and manipulation is also required. Knowledge of Java or Scala is a plus.
- Big Data Technologies: Hands-on experience with big data frameworks such as Apache Spark, Hadoop, or Flink (see the PySpark sketch after this list).
- Cloud Platforms: Strong experience with at least one major cloud provider's data services (AWS, GCP, or Azure). This includes services like AWS S3, Redshift, Glue; GCP BigQuery, Dataflow, Cloud Storage; or Azure Synapse, Data Factory, Blob Storage.
- Databases and Warehousing: Deep expertise in relational databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, Cassandra). Experience with modern data warehouses (Snowflake, Redshift, BigQuery) is highly preferred.
- ETL/ELT Tools: Experience with ETL/ELT tools such as Apache Airflow for workflow orchestration, or similar.
- Data Governance and Security: Understanding of data governance principles and experience implementing security measures like encryption, access controls, and data masking.
- Version Control: Proficiency with Git for version control.
- Containerization: Experience with Docker and Kubernetes is a plus.
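For the big data requirement above, here is a minimal PySpark sketch, assuming hypothetical lake paths and column names: read raw events, roll them up by day, and write a partitioned result.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_event_rollup").getOrCreate()

# Read raw event data from the lake (placeholder path).
events = spark.read.parquet("s3a://example-lake/raw/events/")

# Aggregate events per day and type.
daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Write the curated result back, partitioned by day (placeholder path).
(daily_counts
    .write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-lake/curated/daily_event_counts/"))

spark.stop()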
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1544621