Posted on: 22/12/2025
Description :
We are seeking a Senior Site Reliability Engineer to lead reliability efforts across our application stack, focusing on high availability, performance, and scalability.
This role will own the health and uptime of our mission-critical application, cloud infrastructure, database system, and monitoring infrastructure.
About Us :
At BQE, our mission is to transform the operational landscape of professional services firms, empowering them to achieve more and serve their customers better.
These firms play a crucial role in building infrastructure that significantly impacts global progress.
BQE CORE serves as the operational backbone for these firms, providing an all-in-one SaaS solution.
Our platform enables them to efficiently manage projects, improve budget tracking and profitability, and streamline processes through automation.
With a robust customer base, we are on a trajectory of continuous growth, constantly innovating to meet the evolving needs of our customers and the industries they influence.
Why Join Us :
- Work with a modern tech stack in a high-impact reliability role.
- Be a key part of our CloudOps and App Reliability strategy.
- Join a collaborative and supportive engineering culture.
Responsibilities :
- Ensure application uptime, performance, and scalability.
- Own incident management, including on-call rotations, root cause analysis, and incident reviews.
- Manage and monitor MS SQL Server clusters and high-availability configurations.
- Set up and improve monitoring, alerting, and observability using New Relic, Logz.io, CloudWatch, and other tools.
- Proactively identify system bottlenecks and improve system reliability and automation.
- Define and improve SLOs/SLAs across services.
- Drive disaster recovery testing and availability simulations.
- Collaborate with CloudOps and DevOps for infrastructure automation and enhancements.
- Work with Jira and JSM to manage operational tasks, incidents, and changes.
Qualifications & Experience :
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5-8 years of experience in Site Reliability Engineering, CloudOps, DevOps, or related roles.
Must Have Skills :
- Certifications in AWS, Microsoft, Windows, SQL Server, or SRE disciplines.
- Exposure to New Relic APM and IaC automation is a plus.
- Experience working in a 24x7 on-call rotation.
- Strong knowledge of the Windows OS ecosystem, IIS, and MS SQL Server administration, clustering, performance tuning, and failover.
- Deep experience with monitoring/logging tools such as New Relic, Logz.io, and AWS CloudWatch.
- Experience with AWS (EC2, ASG, CloudWatch, CloudTrail, VPC) and infrastructure management.
- Good understanding of networking, DNS, load balancing, and security principles.
- Proficient in scripting languages such as PowerShell and Python.
- Strong understanding of incident response, change management, and postmortem culture.
- Experience using Jira and Jira Service Management for operational workflows.
- Ability to work independently and drive technical initiatives.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1593997