
Job Description

Description :

We are looking for a Site Reliability Engineer (SRE) with experience in observability and monitoring who can help us keep our large, distributed systems reliable, fast, and running smoothly.

In this role, you will help improve how we build and maintain reliable systems, strengthen system stability, create automation standards, and guide engineering teams on SRE best practices.

You will work closely with platform, backend, security, and product teams to ensure our services are stable, easy to monitor, and always available.

You will use tools like Prometheus, Grafana, Elastic, and New Relic to improve system visibility, manage incidents, and boost overall performance.

Key Responsibilities :

- Implement monitoring, logging, and tracing for applications, services, and infrastructure.

- Build dashboards and alerts to monitor system health and performance.

- Support production systems and participate in incident response activities.

- Troubleshoot operational issues using logs, metrics, and system diagnostics.

- Work with engineering teams to onboard services into monitoring platforms.

- Assist in defining alert thresholds and reducing unnecessary alert noise.

- Maintain monitoring configurations and ensure operational documentation is up to date.

- Support post-incident reviews and implement improvements in monitoring coverage.

- Automate routine operational tasks where possible.

Required Experience :

- 3 to 5 years of experience in infrastructure operations, monitoring, or Site Reliability Engineering.

- Experience working with Infrastructure as Code tools such as Terraform.

- Familiarity with cloud platforms such as GCP or Azure.

- Understanding of APIs, service monitoring, and system logs.

- Experience supporting production environments and incident response processes.

- Strong written and verbal communication skills with the ability to collaborate across teams.

Preferred Experience :

- Experience with observability tools such as Grafana, Prometheus, Splunk, or New Relic.

- Experience supporting distributed systems or microservices.

- Exposure to automation or scripting for operational tasks.

- Experience working in Media, SaaS, or streaming environments.

