SRE DevOps Lead Engineer - AI SaaS Fintech Product Domain

TwinPacs Sdn Bhd
Hyderabad
8 - 15 Years

Posted on: 22/08/2025

Job Description

We have an exciting role in Hyderabad with an AI SaaS fintech product firm, as described below.

SRE DevOps Lead Engineer (SaaS) || 8-12 Years || Hyderabad (Hybrid) || Quick Starter

Key Responsibilities:

- Architect, design, and deploy end-to-end infrastructure solutions for a multi-tenant microservices-based SaaS application with a focus on AI/ML model integration.

- Ensure system reliability, scalability, performance, and security, specifically enhancing AI/ML processing pipelines and workflows.

- Utilize Terraform scripting for on-demand environment provisioning within the AWS cloud, optimized for AI/ML workloads.


- Implement and refine monitoring and alerting systems across application, network, and OS layers to support AI model operations and data processing.

- Diagnose, support, and resolve production issues and alerts, participating in a 24/7 on-call rotation to maintain seamless AI/ML service operations.

Scope of Work:

- Actively participate in the Scrum team, delivering test automation for sprint features and ensuring high-quality product increments by certifying new and regression features using automated test suites.

- Integrate automated tests into the CI/CD pipeline and schedule them to run periodically in product development environments.

- Identify defects, collaborate with development engineers to resolve them, and verify the fixes.

- Maintain continuous availability in alignment with startup culture, staying informed and up to date with communications across various channels and email threads.

- Focus on the primary goal of minimizing customer-reported bugs to near zero.

Required Qualifications:

- 8+ years of experience in Site Reliability Engineering (SRE) and DevOps roles with a track record of managing large-scale enterprise SaaS services in production, including 1+ year in AI/ML infrastructure.


- Demonstrated expertise with AWS public cloud technologies, including extensive experience in deploying and managing large-scale container clusters using AWS EKS.

- Skilled in Infrastructure as Code (IaC) using Terraform, and container technologies such as Docker and Kubernetes.

- Proficient in scripting and programming for automation (Python, Bash, etc.), with strong Linux OS and networking fundamentals relevant to AI/ML workloads.

- Experience in establishing monitoring systems to ensure high availability, performance, and security integrity, using tools like ELK Stack, CloudWatch, and others tailored for AI/ML monitoring.

- Hands-on experience managing microservices-architecture SaaS products, enabling RESTful web services, SSO integration (Okta, Auth0), and utilizing cloud databases such as Amazon RDS, MySQL, and Elasticsearch, especially in AI/ML deployments.

- Proficient in backup and disaster recovery strategies specific to AI/ML data resources like RDS and Elasticsearch.

- AWS Certified Solutions Architect is strongly preferred.

- Self-driven, proactive, and adaptable to thrive in an early-stage startup environment, with a keen interest in integrating AI/ML technologies into modern SaaS solutions.

- Applicants with a stable career (consistent employment) and a notice period of 0-30 days are strictly preferred.


The job is for:

Women candidates preferred
Differently-abled candidates preferred
Ex-defence personnel preferred
Women returning to the workforce