hirist

Job Description

Responsibilities:


- Design, develop, and maintain scalable data pipelines using Apache Spark and Databricks.

- Implement data warehouse solutions on AWS, leveraging services like Redshift, Athena, and Glue.

- Lead the development of data models and schemas for both SQL and NoSQL databases.

- Implement and manage data governance and quality processes.

- Collaborate with data scientists and analysts to support their data needs.

- Implement CI/CD pipelines for data and ML workflows.

- Mentor and guide junior data engineers.


Qualifications:


- 8+ years of hands-on experience in data engineering, with at least 4 years in a lead or architect-level role.

- Deep expertise in Apache Spark, with proven experience developing large-scale distributed data processing pipelines.

- Strong experience with the Databricks platform and its ecosystem (e.g., Delta Lake, Unity Catalog, MLflow, job orchestration, Workspaces, Clusters, Lakehouse architecture).

- Extensive experience with workflow orchestration using Apache Airflow.

- Proficiency in both SQL and NoSQL databases (e.g., Postgres, DynamoDB, MongoDB, Cassandra) with a deep understanding of schema design, query tuning, and data partitioning.

- Proven background in building data warehouse/data mart architectures using AWS services like Redshift, Athena, Glue, Lambda, DMS, and S3.


- Familiarity with MLflow, Feature Store, and Databricks-native ML tooling is a plus.


- Strong grasp of CI/CD for data and ML pipelines, automated testing, and infrastructure-as-code (Terraform, CDK, etc.).
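As an illustration of the schema design and query tuning skills listed above, the following sketch (hypothetical table and index names, using SQLite purely for portability) shows how adding an index on a filtered column changes a full table scan into an index search:

```python
import sqlite3

# Hypothetical schema: an orders table frequently queried by date.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL,
        amount REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2024-01-%02d" % (i % 28 + 1), float(i)) for i in range(1000)],
)

query = "SELECT COUNT(*), SUM(amount) FROM orders WHERE order_date = ?"

# Without an index, the planner must scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, ("2024-01-05",)).fetchall()

# An index on the filter column lets the planner search it instead.
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, ("2024-01-05",)).fetchall()

print(plan_before[-1][-1])  # e.g. "SCAN orders"
print(plan_after[-1][-1])   # e.g. "SEARCH orders USING INDEX idx_orders_date (order_date=?)"
```

The same reasoning carries over to Postgres (`EXPLAIN ANALYZE`) and to partitioning strategies in warehouse engines such as Redshift or Athena, where the goal is likewise to prune the data read per query.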

