Senior Site Reliability Engineer - CloudOps/DevOps

BQE Software
Multiple Locations
5 - 8 Years
Rating: 4.3 (46+ Reviews)

Posted on: 22/12/2025

Job Description

Description :

We are seeking a Senior Site Reliability Engineer to lead reliability efforts across our application stack, focusing on high availability, performance, and scalability.

This role will own the health and uptime of our mission-critical application, cloud infrastructure, database systems, and monitoring infrastructure.

About Us :


At BQE, our mission is to transform the operational landscape of professional services firms, empowering them to achieve more and serve their customers better.

These firms play a crucial role in building infrastructure that significantly impacts global progress.

BQE CORE serves as the operational backbone for these firms, providing an all-in-one SaaS solution.

Our platform enables them to efficiently manage projects, improve budget tracking and profitability, and streamline processes through automation.

With a robust customer base, we are on a trajectory of continuous growth, constantly innovating to meet the evolving needs of our customers and the industries they influence.

Why Join Us :


- Work with a modern tech stack in a high-impact reliability role.

- Be a key part of our CloudOps and App Reliability strategy.

- A collaborative and supportive engineering culture.

Responsibilities :


- Ensure application uptime, performance, and scalability.

- Own incident management, including on-call rotations, root cause analysis, and incident reviews.

- Manage and monitor MS SQL Server clusters and high-availability configurations.

- Set up and improve monitoring, alerting, and observability using New Relic, Logz.io, CloudWatch, and other tools.

- Proactively identify system bottlenecks and improve system reliability and automation.

- Define and improve SLOs/SLAs across services.

- Drive disaster recovery testing and availability simulations.

- Collaborate with CloudOps and DevOps for infrastructure automation and enhancements.

- Work with Jira and JSM to manage operational tasks, incidents, and changes.

Qualifications & Experience :


- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

- 5-8 years of experience in Site Reliability Engineering, CloudOps, DevOps, or related roles.

Must Have Skills :


- Certifications in AWS, Microsoft, Windows, SQL Server, or SRE disciplines.

- Exposure to New Relic APM and IaC automation is a plus.

- Experience working in a 24x7 on-call rotation.

- Strong knowledge of the Windows OS ecosystem, IIS, and MS SQL Server administration, including clustering, performance tuning, and failover.

- Deep experience with monitoring and logging tools such as New Relic, Logz.io, and AWS CloudWatch.

- Experience with AWS (EC2, ASG, CloudWatch, CloudTrail, VPC) and infrastructure management.

- Good understanding of networking, DNS, load balancing, and security principles.

- Proficient in scripting languages such as PowerShell and Python.

- Strong understanding of incident response, change management, and postmortem culture.

- Experience using Jira and Jira Service Management for operational workflows.

- Ability to work independently and drive technical initiatives.
