
MetLife - Site Reliability Engineer - ELK Stack

MetLife Global Operations Support Center Pvt. Ltd.
Multiple Locations
3 - 6 Years
Rating: 4 (1,642+ Reviews)

Posted on: 26/08/2025

Job Description

Note: This role is part of MetLife's Hack4Job India (a hiring hackathon).

Only shortlisted candidates will be invited.

Department: Global Technology


Role Overview


MetLife is seeking an experienced Site Reliability Engineer (SRE) to ensure the availability, scalability, and performance of critical systems and services.


The role involves monitoring, automation, incident management, and collaboration with engineering teams to optimize system reliability and efficiency.


Key Responsibilities:


- System Reliability & Performance: Ensure system uptime, troubleshoot issues, and optimize performance.

- Service Design & Automation: Develop automation scripts and tools to streamline operations.

- Monitoring & Alerting: Implement observability solutions using ELK, Grafana, Splunk, and Azure Monitor.

- Incident Response & Management: Lead root cause analysis, post-mortems, and corrective actions.

- Collaboration: Work with engineering teams to align system performance with business goals.

- Documentation & Knowledge Sharing: Maintain accurate system documentation and promote best practices.


Qualifications & Skills:


- Experience: 3+ years as an SRE supporting hybrid cloud platforms (on-premises and Azure).

- Programming: Java, Python, Bash, PowerShell.

- Cloud & Containers: Azure services, Docker, Kubernetes, Terraform.

- Monitoring & Logging: ELK stack, Grafana, Splunk, Azure Application Insights.

- Database: Strong hands-on experience with SQL.

- Tools: Azure DevOps (Pipelines, Repos), ServiceNow.

- Soft Skills: Strong analytical, problem-solving, and communication skills.

- Language: Business proficiency in English; Japanese is a plus.

