
Job Description

The core responsibilities for the job include the following :


Data Pipeline Development :


- Design, develop, and maintain data pipelines to ingest, process, and transform data from various sources into usable formats.


- Implement data integration solutions that connect disparate data systems, including databases, APIs, and third-party data sources.
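
As a rough, illustrative sketch of the ingestion work described above (not the employer's actual stack), the following Python snippet pulls records from a hypothetical REST endpoint and lands them as timestamped JSON files; the URL, field names, and landing path are assumptions invented for the example.

```python
import json
import pathlib
from datetime import datetime, timezone

import requests  # assumed available; any HTTP client would do

# Hypothetical source endpoint and landing directory -- placeholders, not a real system.
SOURCE_URL = "https://api.example.com/v1/orders"
LANDING_DIR = pathlib.Path("data/landing/orders")


def ingest_batch() -> pathlib.Path:
    """Pull one batch of records from the source API and write it to the landing zone."""
    response = requests.get(SOURCE_URL, params={"limit": 1000}, timeout=30)
    response.raise_for_status()
    records = response.json()

    LANDING_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = LANDING_DIR / f"orders_{stamp}.json"
    out_path.write_text(json.dumps(records))
    return out_path


if __name__ == "__main__":
    print(f"Wrote batch to {ingest_batch()}")
```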


Data Storage and Warehousing :


- Create and manage data storage solutions, such as data lakes, data warehouses, and NoSQL databases.


- Optimize data storage for performance, scalability, and cost-efficiency.


Data Quality and Governance :


- Establish data quality standards and implement data validation and cleansing processes.


- Collaborate with data analysts and data scientists to ensure data consistency and accuracy.


ETL (Extract, Transform, Load) :


- Develop ETL processes to transform raw data into a structured and usable format.


- Monitor and troubleshoot ETL jobs to ensure data flows smoothly.
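
A minimal sketch of the extract-transform-load flow described above, assuming a local CSV extract and a SQLite target purely for demonstration; the file names, columns, and cleansing rule are invented for illustration and are not taken from the role.

```python
import csv
import sqlite3

# Hypothetical file and table names used only to illustrate the three ETL steps.
SOURCE_CSV = "raw_sales.csv"
TARGET_DB = "warehouse.db"


def run_etl() -> int:
    # Extract: read raw rows from the source file.
    with open(SOURCE_CSV, newline="") as f:
        raw_rows = list(csv.DictReader(f))

    # Transform: drop rows missing an amount and normalise types and casing.
    clean_rows = [
        (row["order_id"], row["region"].strip().upper(), float(row["amount"]))
        for row in raw_rows
        if row.get("amount")
    ]

    # Load: write the structured rows into the target table.
    with sqlite3.connect(TARGET_DB) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales (order_id TEXT, region TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)
    return len(clean_rows)


if __name__ == "__main__":
    print(f"Loaded {run_etl()} rows")
```

In practice the same three steps would read from production sources and load into a warehouse such as those listed under the technical skills below.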


Data Security and Compliance :


- Implement data security measures and access controls to protect sensitive data.


- Ensure compliance with data privacy regulations and industry standards (e.g., GDPR, HIPAA).


Performance Tuning :


- Optimize data pipelines and queries for improved performance and efficiency.


- Identify and resolve bottlenecks in data processing.


Data Documentation :


- Maintain comprehensive documentation for data pipelines, schemas, and data dictionaries.


- Create and update data lineage and metadata documentation.


Scalability and Reliability :


- Design data infrastructure to scale with growing data volumes and business requirements.


- Implement data recovery and backup strategies to ensure data availability and resilience.


Collaboration :


- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver data solutions.


Required Technical Skills :


- Programming Languages : Proficiency in Python is essential. Experience with SQL for database querying and manipulation is also required. Knowledge of Java or Scala is a plus.


- Big Data Technologies : Hands-on experience with big data frameworks such as Apache Spark, Hadoop, or Flink.


- Cloud Platforms : Strong experience with at least one major cloud provider's data services (AWS, GCP, or Azure). This includes services like AWS S3, Redshift, Glue; GCP BigQuery, Dataflow, Cloud Storage; or Azure Synapse, Data Factory, Blob Storage.


- Databases and Warehousing : Deep expertise in relational databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, Cassandra). Experience with modern data warehouses (Snowflake, Redshift, BigQuery) is highly preferred.


- ETL/ELT Tools : Experience with workflow orchestration and ETL/ELT tools such as Apache Airflow or similar (a minimal orchestration sketch follows this list).


- Data Governance and Security : Understanding of data governance principles and experience with implementing security measures like encryption, access controls, and data masking.


- Version Control : Proficiency with Git for version control.


- Containerization : Experience with Docker and Kubernetes is a plus.
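
As referenced in the ETL/ELT Tools item above, here is a minimal Apache Airflow sketch of the kind of daily orchestration such a role involves; the DAG id, schedule, and task callables are illustrative assumptions, and import paths can differ between Airflow versions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def extract():
    # Placeholder: pull data from a source system.
    pass


def transform():
    # Placeholder: cleanse and reshape the extracted data.
    pass


def load():
    # Placeholder: write the transformed data to the warehouse.
    pass


with DAG(
    dag_id="example_daily_etl",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",      # "schedule" in newer Airflow releases
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Define run order: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```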

