We are looking for a highly skilled GCP Site Reliability Engineer (SRE) to support cloud infrastructure for a leading e-commerce company, ensuring high availability, scalability, and performance of critical systems.

This position requires deep expertise in Google Cloud Platform (GCP), SRE principles, and DevOps best practices. You will be responsible for designing and implementing infrastructure, improving observability, and maintaining SLAs for services running at scale.

Key Responsibilities:

- Manage and scale production systems hosted on Google Cloud Platform (GCP).

- Implement SRE best practices: monitoring, alerting, SLAs, SLOs, and error budgets.

- Automate operational tasks using Infrastructure as Code (IaC) tools like Terraform.

- Improve system reliability and reduce manual interventions through automation.

- Collaborate with development teams to ensure new services are production-ready.

- Handle incident response and conduct post-mortem analysis to prevent recurring issues.

- Design and implement CI/CD pipelines for rapid and safe deployments.

- Manage GCP resources: IAM, VPC, Compute Engine, GKE, Cloud Functions, Pub/Sub, BigQuery, etc.

- Ensure security, compliance, and cost optimization across the cloud infrastructure.

Required Skills & Qualifications:

- 5+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.

- Strong hands-on experience with Google Cloud Platform (GCP) services.

- Proficiency with Terraform or other IaC tools.

- Solid knowledge of Kubernetes (GKE), containerization, and microservices.

- Strong scripting skills in Python, Go, or Shell.

- Familiarity with incident response and post-mortem culture.

- Knowledge of networking, security, and cloud cost management.

Preferred Qualifications:

- GCP certifications (e.g., Professional Cloud DevOps Engineer).

- Prior experience working with e-commerce or high-scale platforms.

- Familiarity with SRE practices and tooling such as chaos engineering and service mesh (e.g., Istio).

Soft Skills:

- Strong communication and stakeholder management.

- Problem-solving mindset with a focus on reliability and automation.

- Ability to work independently in a distributed, outsourced team model.

Posted by: Akansha Gupta, Talent Acquisition Manager at Prismberry Technologies

Posted in: DevOps / SRE

Functional Area: Site Reliability Engineering

Job Code: 1551612
