Description :
Job Title : Principal Software Platform Engineer (SW Platform)
Location : Bangalore, India (R&D Center)
Experience : 10 - 16 years
Employment Type : Full-time
About the Team :
The Software Platform team builds and operates a cloud-native infrastructure platform that automates and scales HPC and AI workloads across AWS environments.
We design and develop internal platform tools and services - including CLI tools, CI/CD automation, platform services, Kubernetes operators, and observability systems - to empower engineering teams to deploy, manage, and monitor complex distributed systems efficiently.
The team owns the entire infrastructure lifecycle, from design and development to deployment, monitoring, and optimization.
Role Overview :
We are looking for a senior, hands-on platform engineer with deep expertise in Rust, Go, or Python who thrives on building scalable cloud-native systems. This role requires strong technical ownership, architectural thinking, and the ability to work on complex distributed platforms supporting HPC and AI workloads.
Key Responsibilities :
- Design and develop scalable cloud-native platform solutions to accelerate HPC and AI workloads.
- Build and maintain internal platform tools, services, and automation frameworks.
- Develop and operate Kubernetes-based infrastructure, including operators and controllers.
- Design, implement, and optimize CI/CD pipelines and automated deployment workflows.
- Implement GitOps and Infrastructure as Code (IaC) practices.
- Ensure platform scalability, reliability, security, and observability.
- Collaborate with cross-functional teams across geographies.
- Troubleshoot, debug, and optimize complex distributed systems.
Required Skills & Qualifications :
- 10 - 16 years of hands-on engineering experience.
- Strong proficiency in Rust, Go, or Python (hands-on coding is mandatory).
- Solid background in Linux environments.
- Extensive experience with cloud infrastructure and services (AWS preferred; Azure/GCP acceptable).
- Strong expertise in containers and orchestration : Docker & Kubernetes.
- Mandatory experience with Ansible and Kubernetes.
- Proven experience with :
1. CI/CD workflows (e.g., GitHub Actions)
2. Observability tools : OpenTelemetry, Prometheus, Grafana
3. GitOps and Infrastructure as Code (IaC)
- Experience designing and maintaining distributed systems at scale.
- Bachelor's degree in Computer Science or an equivalent discipline.
Nice-to-Have Skills :
- Experience with microservices and event-driven architectures.
- Background in HPC or AI infrastructure technologies.
- Hands-on experience building Kubernetes operators/controllers.
Ways to Stand Out :
- Proven experience building Kubernetes operators or custom controllers.
- Strong exposure to HPC, ML, or AI platform infrastructure.
- Demonstrated ownership of large-scale, production-grade platforms.