hirist

Site Reliability Engineer - Payments Infrastructure

Wits Innovation Lab
4 - 11 Years
Hyderabad

Posted on: 10/04/2026

Job Description



We are looking for a highly skilled Site Reliability Engineer to manage and scale our on-premises payments infrastructure.


You will work in a hybrid environment spanning virtual machines and containerized workloads on bare metal, ensuring high availability, security, and performance for mission-critical systems.


Key Responsibilities:


- Operate and optimize virtualized environments (VMs) and containerized workloads (Docker on bare metal)


- Manage and scale middleware systems such as:


1. Nginx (traffic routing, reverse proxy, load balancing)


2. Redis (caching, HA setup)


3. Kafka (streaming, partitioning, fault tolerance)


- Build and maintain CI/CD pipelines using Jenkins


- Manage infrastructure and application configurations using Git-based version control


- Ensure high availability and resilience, and tune performance across systems


- Perform Linux system administration (RHEL/CentOS/Ubuntu)


- Implement and maintain automation frameworks using:


1. Ansible


2. Shell scripting


- Manage and troubleshoot networking components:


1. TCP/IP, DNS, and load balancing


2. Firewalls and WAF policies


3. Akamai


- Handle security and compliance requirements


- Maintain accurate inventory and asset management systems


- Participate in incident response, root cause analysis (RCA), and system reliability improvements


- Collaborate with application, security, and DevOps teams


Required Skills & Qualifications:


Core Infrastructure:


- Strong hands-on experience with Linux system administration


- Experience managing on-premises data center environments


- Solid understanding of:


1. Virtualization (VMware / KVM or similar)


2. Bare-metal provisioning


Containers & Middleware:


- Experience running Docker in production (non-Kubernetes setups preferred)


- Strong operational knowledge of:


1. Nginx


2. Redis


3. Kafka


4. RDBMS


5. Java


Observability, Alerting & Reliability:


- Design and manage observability platforms:


1. Elastic Stack (ELK)


2. Grafana / Prometheus stack


- Build and maintain:


1. Metrics, logs, and tracing pipelines


2. Dashboards for system health and business KPIs


- Develop intelligent alerting strategies to:


1. Reduce noise (alert fatigue)


2. Improve signal quality


- Build alert correlation and aggregation mechanisms to:


1. Reduce MTTD (Mean Time to Detect)


2. Reduce MTTR (Mean Time to Recover)


- Drive proactive monitoring and anomaly detection


- Lead incident response, debugging, and RCA with data-driven insights


CI/CD & Version Control:


- Hands-on experience with:


1. Git (branching strategies, code reviews, infrastructure-as-code workflows)


2. Jenkins (pipeline creation, build automation, deployment orchestration)


Networking & Security:


- Good understanding of:


1. Networking fundamentals (L3/L4 concepts)


2. Firewalls and WAF (rule tuning, debugging)


- Experience handling secure production environments


Automation:


- Hands-on experience with:


1. Ansible


2. Shell scripting (Bash)


Operations:


- Experience with monitoring, alerting, and logging systems

