
Job Description

Description :



Job Title : Senior Specialist Cloud SRE



Education : Bachelor's Degree



Experience : 8+ years



Location : Mumbai



As a Senior Specialist Cloud SRE, you will be responsible for ensuring the reliability, scalability, performance, and cost optimization of cloud services across AWS, Azure, and multi-cloud environments.



You will act as the primary technical lead for assigned customers, manage incident escalations, drive automation-first practices, and mentor junior engineers.



You will also collaborate closely with development teams to embed resilience and observability into applications.



Key Responsibilities :



Customer Leadership & Collaboration :



- Serve as the primary technical point of contact for assigned customer accounts.



- Provide regular updates and lead initiatives to improve customer environments.



- Know assigned accounts well enough to make tactical decisions without escalation.



- Collaborate with customer development teams to align infrastructure with application requirements.



Incident & Problem Management :



- Lead incident response and postmortems, ensuring corrective and preventive measures are identified and implemented.



- Serve as the Tier 3 escalation point for offshore/onshore SRE teams.



- Perform Root Cause Analysis (RCA) and validate the quality of work done by Tier 2 engineers.



- Develop and maintain incident response plans for security breaches and operational incidents.



Reliability Engineering :



- Define and maintain SLIs/SLOs, track error budgets, and monitor services against those targets.



- Participate in architecture discussions for high availability, disaster recovery, and scalability.



- Integrate resilience patterns such as circuit breakers, retries, and bulkheads.



- Use chaos engineering / fault injection practices where applicable.
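
As an illustration of the SLO and error budget tracking mentioned above, here is a minimal Python sketch. It assumes a request-based availability SLI (successful requests divided by total requests); the names `BudgetReport` and `error_budget` and the sample figures are hypothetical and are not taken from any tool named in this posting.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    sli: float               # observed fraction of successful requests
    allowed_failures: float  # failures the SLO tolerates in this window
    consumed: float          # fraction of the error budget already used
    remaining: float         # fraction of the error budget left (negative if overspent)


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> BudgetReport:
    """Compute error budget usage for a request-based availability SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    sli = 1.0 - failed_requests / total_requests
    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures
    return BudgetReport(sli, allowed_failures, consumed, 1.0 - consumed)


if __name__ == "__main__":
    # Hypothetical window: 99.9% SLO, one million requests, 250 failures.
    report = error_budget(0.999, 1_000_000, 250)
    print(f"SLI={report.sli:.5f} consumed={report.consumed:.1%} remaining={report.remaining:.1%}")
```

In this sketch a 99.9% target over one million requests tolerates roughly 1,000 failures, so 250 failures use about a quarter of the budget. A negative `remaining` value would signal that the SLO has been breached for the window.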



Automation & Infrastructure as Code :



- Automate infrastructure and operations tasks using Terraform, CloudFormation, and AWS CDK.



- Build and maintain CI/CD pipelines with canary deployments and blue/green strategies.



- Implement automation workflows with AWS Lambda, Step Functions, and Azure Functions.



Monitoring & Observability :



- Implement observability systems using Prometheus, Grafana, OpenTelemetry, ELK, and Jaeger.



- Configure proactive monitoring and alerts using AWS CloudWatch / Azure Monitor.



- Ensure visibility into metrics, traces, and logs for troubleshooting.



Cloud Infrastructure Management :



- Provision and manage VMs, storage, networking, VPNs, and ExpressRoute/Peering.



- Manage patching, backups, encryption, decryption, and image management.



- Optimize performance and cost via rightsizing, autoscaling, and reserved instances.



- Manage identity and access controls (AWS IAM, Azure AD, RBAC).



Security & Compliance :



- Implement and enforce security best practices across multi-cloud environments.



- Ensure compliance with GDPR, HIPAA, and other applicable industry regulations.



- Conduct regular audits and compliance reporting.



Mentoring & Knowledge Sharing :



- Coach and mentor Tier 2 and junior SREs.



- Conduct reliability-focused design reviews.



- Maintain up-to-date documentation, runbooks, and SOPs.

