We are seeking a Senior Site Reliability Engineer to lead reliability efforts across our application stack, focusing on high availability, performance, and scalability.

This role will own the health and uptime of our mission-critical application, cloud infrastructure, database system, and monitoring infrastructure.

About Us:

At BQE, our mission is to transform the operational landscape of professional services firms, empowering them to achieve more and serve their customers better.

These firms play a crucial role in building infrastructure that significantly impacts global progress.

BQE CORE serves as the operational backbone for these firms, providing an all-in-one SaaS solution.

Our platform enables them to efficiently manage projects, improve budget tracking and profitability, and streamline processes through automation.

With a robust customer base, we are on a trajectory of continuous growth, constantly innovating to meet the evolving needs of our customers and the industries they influence.

Why Join Us:

- Work with a modern tech stack in a high-impact reliability role.

- Be a key part of our CloudOps and App Reliability strategy.

- Join a collaborative and supportive engineering culture.

Responsibilities:

- Ensure application uptime, performance, and scalability.

- Own incident management, including on-call rotations, root cause analysis, and incident reviews.

- Manage and monitor MS SQL Server clusters and high-availability configurations.

- Set up and improve monitoring, alerting, and observability using New Relic, Logz.io, CloudWatch, and other tools.

- Proactively identify system bottlenecks and improve system reliability and automation.

- Define and improve SLOs/SLAs across services.

- Drive disaster recovery testing and availability simulations.

- Collaborate with CloudOps and DevOps on infrastructure automation and enhancements.

- Work with Jira and JSM to manage operational tasks, incidents, and changes.

Qualifications & Experience:

- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

- 5-8 years of experience in Site Reliability Engineering, CloudOps, DevOps, or related roles.

Must-Have Skills:

- Certifications in AWS, Microsoft, Windows, SQL Server, or SRE disciplines.

- Exposure to New Relic APM and IaC automation is a plus.

- Experience working in a 24x7 on-call rotation.

- Strong knowledge of the Windows OS ecosystem, IIS, and MS SQL Server administration, clustering, performance tuning, and failover.

- Deep experience with monitoring/logging tools such as New Relic, Logz.io, and AWS CloudWatch.

- Experience with AWS (EC2, ASG, CloudWatch, CloudTrail, VPC) and infrastructure management.

- Good understanding of networking, DNS, load balancing, and security principles.

- Proficiency in scripting languages such as PowerShell and Python.

- Strong understanding of incident response, change management, and postmortem culture.

- Experience using Jira and Jira Service Management for operational workflows.

- Ability to work independently and drive technical initiatives.

Posted by: Afeef Arif, Manager - Human Resources at BQE Software

Posted in: DevOps / SRE

Functional Area: Site Reliability Engineering

Job Code: 1593997
