Posted on: 27/08/2025
Job Title : SRE Lead Engineer.
Location : Hyderabad, India.
We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core.
Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team.
This is an opportunity to design and operate large-scale SaaS systems that integrate cutting-edge AI/ML capabilities.
About the Role :
As the SRE Lead Engineer, you will be responsible for architecting, building, and maintaining infrastructure that powers a multi-tenant SaaS platform.
You'll drive reliability, scalability, and security while supporting AI/ML pipelines in production.
This is a hands-on role with significant ownership, requiring both technical depth and leadership in site reliability practices.
Key Responsibilities :
- Architect, design, and deploy end-to-end infrastructure for large-scale, microservices-based SaaS platforms.
- Ensure system reliability, scalability, and security for AI/ML model integrations and data pipelines.
- Automate environment provisioning and management using Terraform in AWS (EKS-focused).
- Implement full-stack observability across applications, networks, and operating systems.
- Lead incident management and participate in a 24/7 on-call rotation.
- Optimize SaaS reliability while enabling REST APIs, SSO integrations (Okta/Auth0), and cloud data services (RDS/MySQL, Elasticsearch).
- Define and maintain backup and disaster recovery plans for critical workloads.
Required Skills & Experience :
- 8+ years in SRE/DevOps roles, managing enterprise SaaS applications in production.
- Minimum 1 year of experience with AI/ML infrastructure or model-serving environments.
- Strong expertise in AWS cloud, particularly EKS, container orchestration, and Kubernetes.
- Hands-on experience with Infrastructure as Code (Terraform), Docker, and scripting (Python, Bash).
- Solid Linux OS and networking fundamentals.
- Experience in monitoring and observability with ELK, CloudWatch, or similar tools.
- Strong track record with microservices, REST APIs, SSO, and cloud databases.
Nice-to-Have Skills :
- Experience with MLOps and AI/ML pipeline observability.
- Cost optimization and security hardening in multi-tenant SaaS.
- Prior exposure to FinTech or enterprise finance solutions.
Qualifications :
- Bachelor's degree in Computer Science, Engineering, or a related discipline.
- AWS Certified Solutions Architect (strongly preferred).
- Experience in early-stage or high-growth startups is an advantage.
Why Join?
- Be at the forefront of AI/ML-powered SaaS innovation in FinTech.
- Work with a high-energy, entrepreneurial team building next-gen infrastructure.
- Take ownership of mission-critical reliability challenges.
- Grow your career in an environment that values impact, adaptability, and innovation.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1536810