Posted on: 26/03/2026
Description :
About the Role :
As a DevOps Engineer (Data / ML Platform) at BalanceHero, you will design, build, and operate shared infrastructure and platform environments for Data Engineering and Machine Learning teams.
You will be responsible for the end-to-end Kubernetes-based Data / ML platform infrastructure, leveraging your experience in AWS infrastructure design and operations to continuously evolve the platform.
With consideration for security requirements and operational stability in a fintech environment, you will enable data and ML services to run in a stable, scalable, and reliable manner.
In the AI agent era, DevOps is no longer just about operating infrastructure.
This role requires an engineer who understands DevOps as both a philosophy and a process, and who can define and continuously improve the DevOps processes and platform operating model in line with the company's technical direction.
About the Responsibilities :
- Design and operate infrastructure in AWS environments (IAM, networking, security, cost management, etc.)
- Configure infrastructure security including access control, network isolation, and data protection
- Build and operate production-grade Kubernetes (EKS) infrastructure for Data / ML platforms
- Design scalable and highly available infrastructure for large-scale data and ML workloads
- Design and build CI/CD pipeline architectures (infra / platform focused)
- Standardize and automate infrastructure using Infrastructure as Code (IaC)
- Build and improve platform operations based on GitOps workflows
- Design and operate observability systems including monitoring, logging, and alerting
- Handle incidents, improve performance, automate operations, and optimize costs
- Collaborate closely with Data Engineers and ML Engineers to continuously improve the platform
- Improve operational efficiency through AI/LLM-assisted automation
Requirements :
- 6+ years of experience as a DevOps, Cloud, or Platform Engineer
- Hands-on experience designing and operating AWS-based infrastructure
- Experience building and operating production-grade infrastructure on Kubernetes
- Experience managing and automating infrastructure using Infrastructure as Code (IaC) (e.g., Terraform)
- Experience designing and building CI/CD pipelines
- Strong Linux system operation and troubleshooting skills
- Strong collaboration and communication skills
- A proactive mindset toward automating repetitive operational tasks using AI-based tools
Preferred Qualifications :
- Experience operating and improving data applications using Kafka, Kinesis, and Flink, in both AWS-managed and on-premises environments
- Experience operating infrastructure in fintech or financial services
- Experience operating large-scale or high-traffic production systems
- Understanding of security requirements in fintech service environments
- Interest in, or experience with, collaborating on Data / ML platform environments
- Interest in cost-efficient infrastructure operations
- Ability to communicate effectively in English
- Willingness to travel internationally
Posted in
DevOps / SRE
Functional Area
ML / DL / AI Research
Job Code
1623800