Posted on: 15/04/2026
Job Title:
Data Lake Integration Engineer (Node.js/Python/AWS Data Lake)
Job Purpose:
The Data Lake Integration Engineer is responsible for designing, developing, and maintaining scalable data ingestion, transformation, and integration solutions, with a strong focus on cloud-based data lake architectures in AWS.
This role emphasizes event-driven and batch data pipelines, streaming ingestion, and analytical data access patterns over traditional API-centric integrations. Primary development is done in Node.js, with additional work in Python and Java as needed.
The ideal candidate has hands-on experience building and operating AWS data lakes using services such as Kinesis, S3, Athena, and related tooling, and understands medallion architecture (Bronze/Silver/Gold) concepts. They should demonstrate strong engineering judgment, system-design thinking, and the ability to deliver reliable data solutions across multiple concurrent initiatives.
Duties and Responsibilities:
- Design, develop, and maintain data ingestion and integration pipelines for both streaming and batch workloads
- Build and support AWS data lake solutions leveraging services such as Kinesis, S3, Athena, Glue, and Lambda
- Implement and maintain medallion architecture patterns (Bronze, Silver, Gold layers) to support analytics, reporting, and downstream consumers
- Develop reusable, scalable Node.js-based services to orchestrate data movement, validation, and transformation
- Integrate data from on-premises and cloud-based systems using a mix of event streams, file-based ingestion, and APIs where appropriate
- Optimize data pipelines for performance, scalability, cost efficiency, and reliability
- Partner with analytics, platform, and application teams to enable self-service data access and analytics use cases
- Manage development time effectively and communicate risks, tradeoffs, and technical complexity clearly
- Conduct code reviews and help enforce clean code, testing, and operational best practices
- Document data integration and data lake solutions from concept through implementation and production support
- Stay current with AWS platform updates, data engineering best practices, and emerging technologies
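To illustrate the kind of pipeline this role builds, here is a minimal Node.js sketch of the streaming-ingestion path: mapping raw Kinesis records into date-partitioned Bronze-layer S3 object keys. The record shape, key layout, and source name are illustrative assumptions, not a prescribed format, and the actual S3 write is omitted.

```javascript
// Minimal sketch: map raw Kinesis records to date-partitioned
// Bronze-layer S3 object keys. Record shape and key layout are
// illustrative assumptions, not a prescribed format.

// Decode a single Kinesis record (payloads arrive base64-encoded)
// and parse it as JSON.
function decodeRecord(record) {
  const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
  return JSON.parse(json);
}

// Build a Hive-style partitioned key for the Bronze (raw) layer,
// e.g. bronze/orders/year=2026/month=04/day=15/<sequenceNumber>.json
function bronzeKey(source, sequenceNumber, eventTime) {
  const d = new Date(eventTime);
  const pad = (n) => String(n).padStart(2, '0');
  return [
    'bronze',
    source,
    `year=${d.getUTCFullYear()}`,
    `month=${pad(d.getUTCMonth() + 1)}`,
    `day=${pad(d.getUTCDate())}`,
    `${sequenceNumber}.json`,
  ].join('/');
}

// Transform a whole Kinesis event batch into { key, body } pairs
// ready to be written to S3 (the PutObject call itself is omitted).
function toBronzeObjects(event, source) {
  return event.Records.map((r) => {
    const payload = decodeRecord(r);
    return {
      key: bronzeKey(source, r.kinesis.sequenceNumber, payload.eventTime),
      body: JSON.stringify(payload),
    };
  });
}
```

Hive-style `year=/month=/day=` partitioning keeps the Bronze layer directly queryable from Athena with partition projection, which is one reason this layout is common in practice.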
Qualifications Required:
- 10+ years of professional software development experience
- 3+ years of hands-on experience developing with Node.js
- Experience with Python or Java strongly preferred
- Strong experience building data lakes on AWS, including:
  - Amazon S3 as a primary storage layer
  - Streaming ingestion using Amazon Kinesis (Data Streams and/or Firehose)
  - Query and analytics tooling such as Athena, Presto, Trino, or similar
- Solid understanding and practical use of medallion architecture and modern data modeling approaches
- 3+ years working with messaging or streaming technologies (Kafka, Kinesis, SQS, MQ, Confluent, etc.)
- 4+ years of experience with relational databases (preferably MySQL) and strong SQL skills
- Experience designing and operating systems in cloud and serverless environments
- Proven ability to troubleshoot and resolve production data and pipeline issues, including root cause analysis
- Experience working in Agile / Scrum development environments
- Strong documentation, communication, and collaboration skills
- Self-driven, accountable, and able to deliver high-quality work within agreed timelines
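The medallion-architecture qualification above can be made concrete with a small, self-contained sketch: promoting Bronze records to the Silver layer typically means validating, normalizing, and deduplicating the raw data. Field names and rules here are hypothetical assumptions for illustration only.

```javascript
// Minimal sketch of a Bronze -> Silver promotion step: validate,
// normalize, and deduplicate raw records. Field names and rules
// are illustrative assumptions, not a prescribed schema.

// Keep only records that carry the fields downstream consumers need.
function isValid(record) {
  return Boolean(record && record.id) && !Number.isNaN(Date.parse(record.eventTime));
}

// Normalize a raw record into the Silver-layer shape.
function normalize(record) {
  return {
    id: String(record.id),
    eventTime: new Date(record.eventTime).toISOString(),
    amount: Number(record.amount ?? 0),
  };
}

// Deduplicate by id, keeping the record with the latest eventTime.
function toSilver(bronzeRecords) {
  const latest = new Map();
  for (const rec of bronzeRecords.filter(isValid).map(normalize)) {
    const prev = latest.get(rec.id);
    if (!prev || rec.eventTime > prev.eventTime) latest.set(rec.id, rec);
  }
  return [...latest.values()];
}
```

In a real pipeline this logic would run in Glue, Lambda, or a Node.js service over S3 objects; the point of the sketch is that Silver-layer promotion is validation plus idempotent deduplication, not just a copy.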
Preferred/Nice to Have:
- Experience with AWS Glue, data cataloging, and schema evolution
- Familiarity with data quality, observability, and lineage concepts
- Experience with REST-based APIs for data access or orchestration (SOAP experience not required)
- Exposure to data governance, security, and compliance in cloud environments
- Domain experience in transportation and logistics
- Bachelor's degree in Computer Science, Engineering, or a related field
Role Notes:
- This role is centered on data lakes and pipelines rather than on APIs
- SOAP is not required; REST APIs are used only where they support data ingestion or orchestration
- Success is measured by data reliability, scalability, and analytics readiness, not just service delivery
Posted in: Data Engineering
Functional Area: Big Data / Data Warehousing / ETL
Job Code: 1628586