
Big Data Engineer - PySpark/SQL

Enexus Global Inc.
Pune
Experience: 4-8 Years

Posted on: 15/09/2025

Job Description

About the Role:

We are seeking a highly skilled Senior Data Engineer with expertise in designing, building, and optimizing large-scale data pipelines and platforms. The ideal candidate will have strong hands-on experience with big data technologies, AWS cloud services, and modern CI/CD automation frameworks. You will play a pivotal role in architecting robust data solutions, ensuring scalability, performance, and reliability, while collaborating closely with cross-functional teams across engineering, product, and operations.

Key Responsibilities:

Data Platform Engineering:

- Design, develop, and enhance data ingestion, transformation, and orchestration pipelines using open-source frameworks, AWS cloud services, and GitLab automation.

- Implement best practices in distributed data processing using PySpark, Python, and SQL.

Collaboration & Solutioning:

- Partner with product managers, data scientists, and technology stakeholders to design and validate scalable data platform capabilities.

- Translate business requirements into technical specifications and implement data-driven solutions.

Optimization & Automation:

- Identify, design, and implement process improvements, including automation of manual processes, pipeline optimization, and system scalability enhancements.

- Drive adoption of infrastructure-as-code and automated CI/CD pipelines for data workloads.

Monitoring & Reliability:

- Define, implement, and maintain robust monitoring, logging, and alerting mechanisms for data pipelines and services.

- Ensure data quality, availability, and reliability across the production environment.

Technical Enablement:

- Provide platform usage guidance, technical support, and best practices to teams consuming the data platform.

- Contribute to internal knowledge bases, playbooks, and engineering documentation.

Required Qualifications:

- Proven experience in building, maintaining, and optimizing large-scale data pipelines in distributed computing environments.

- Strong programming experience in Python and PySpark, with advanced working knowledge of SQL (4+ years).

- Expertise in working within Linux environments for data development and operations.

- Strong knowledge and experience with AWS services such as S3, EMR, Glue, Redshift, Lambda, and Step Functions.

- Hands-on experience with DevOps/CI/CD tools such as Git, Bitbucket, Jenkins, AWS CodeBuild, and CodePipeline.

- Familiarity with monitoring and alerting platforms (CloudWatch, Prometheus, Grafana, or equivalent).

- Knowledge of Palantir is a strong plus.

- Experience collaborating with cross-functional teams (engineering, product, operations) in a fast-paced environment.

Preferred Skills:

- Experience with containerized environments (Docker, Kubernetes).

- Exposure to data governance, lineage, and metadata management tools.

- Working knowledge of infrastructure-as-code tools (Terraform, CloudFormation).

- Familiarity with streaming technologies such as Kafka or Kinesis.

