Job Description

About the job :

We are seeking a highly skilled and proactive Server Management Lead to oversee end-to-end operations of data center infrastructure. This role involves ensuring high availability, optimal performance, cost efficiency, and adherence to security standards across our server ecosystem. The candidate will also lead a team of six Level 1 and Level 2 engineers to achieve near-zero downtime while maintaining industry-leading operational excellence.

Key Responsibilities :

Infrastructure Management :

- Design, deploy, and maintain golden images for consistent and secure server provisioning.

- Ensure standardized builds and configurations across all environments.

- Oversee hardware and OS lifecycle management, including patching and upgrades.

Security & Compliance :

- Conduct regular Vulnerability Assessment and Penetration Testing (VAPT).

- Remediate identified risks in line with security best practices and compliance requirements.

- Enforce access control, audit readiness, and adherence to organizational security policies.

Performance & Capacity Planning :

- Develop and maintain a capacity planning framework to anticipate and scale resources proactively.

- Monitor system performance, troubleshoot bottlenecks, and optimize resource allocation.

- Partner with architecture teams to align capacity with business growth.

Monitoring & Uptime :

- Implement and fine-tune end-to-end monitoring tools (infrastructure, application, and network layers).

- Establish escalation procedures and SLAs to maintain 99.99% uptime.

- Lead root cause analysis (RCA) for incidents and drive permanent corrective actions.

Cost Optimization :

- Analyze server utilization trends to identify cost-saving opportunities (rightsizing, consolidation, cloud/hybrid strategies).

- Implement automation for provisioning, scaling, and decommissioning resources to reduce waste.

- Provide periodic reporting to leadership on cost-performance balance.

Leadership & Team Management :

- Manage and mentor a team of 6 Level 1 & Level 2 engineers, fostering technical growth and operational discipline.

- Define KPIs for performance, ticket resolution, and uptime accountability.

- Promote a culture of continuous improvement, automation, and service excellence.

Qualifications :

- Bachelor's degree in Computer Science, Information Technology, or a related field.

- 8-12 years of experience in server management/data center operations, including at least 3 years in a leadership role.

- Strong expertise in virtualization, server operating systems (Linux/Windows), storage, and networking fundamentals.

- Hands-on experience with monitoring platforms (Site24x7, Patch Manager, etc.) and automation tools (Ansible, Puppet, or similar) is an added advantage.

- Proven track record of driving zero-downtime initiatives and cost optimization in enterprise environments.

Key Competencies :

- Technical Excellence - deep understanding of server operations and best practices.

- Leadership - ability to lead and inspire a team, with strong decision-making skills.

- Analytical Thinking - capacity planning, problem-solving, and cost analysis.

- Resilience & Accountability - ensuring uptime and compliance under pressure.

- Communication - ability to work cross-functionally and present technical insights to leadership.

Success Metrics :

- Consistent achievement of 99.99% uptime across server infrastructure.

- Successful closure of all VAPT findings within SLA.

- Demonstrated cost reduction in server operations through optimization initiatives.

- Improved incident resolution times and reduced recurring issues.

- High team engagement and skill growth within the engineering group.
