Posted on: 10/10/2025
About the job :
We are seeking a highly skilled and proactive Server Management Lead to oversee end-to-end operations of data center infrastructure. This role involves ensuring high availability, optimal performance, cost efficiency, and adherence to security standards across our server ecosystem. The candidate will also lead a team of six Level 1 and Level 2 engineers to achieve near-zero downtime while maintaining industry-leading operational excellence.
Key Responsibilities :
Infrastructure Management :
- Design, deploy, and maintain golden images for consistent and secure server provisioning.
- Ensure standardized builds and configurations across all environments.
- Oversee hardware and OS lifecycle management, including patching and upgrades.
Security & Compliance :
- Conduct regular Vulnerability Assessment and Penetration Testing (VAPT).
- Remediate identified risks in line with security best practices and compliance requirements.
- Enforce access control, audit readiness, and adherence to organizational security policies.
Performance & Capacity Planning :
- Develop and maintain a capacity planning framework to anticipate and scale resources proactively.
- Monitor system performance, troubleshoot bottlenecks, and optimize resource allocation.
- Partner with architecture teams to align capacity with business growth.
Monitoring & Uptime :
- Implement and fine-tune end-to-end monitoring tools (infrastructure, application, and network layers).
- Establish escalation procedures and SLAs to maintain 99.99% uptime.
- Lead root cause analysis (RCA) for incidents and drive permanent corrective actions.
Cost Optimization :
- Analyze server utilization trends to identify cost-saving opportunities (rightsizing, consolidation, cloud/hybrid strategies).
- Implement automation for provisioning, scaling, and decommissioning resources to reduce waste.
- Provide periodic reporting to leadership on cost-performance balance.
Leadership & Team Management :
- Manage and mentor a team of 6 Level 1 & Level 2 engineers, fostering technical growth and operational discipline.
- Define KPIs for performance, ticket resolution, and uptime accountability.
- Promote a culture of continuous improvement, automation, and service excellence.
Qualifications :
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 8-12 years of experience in server management/data center operations, including at least 3 years in a leadership role.
- Strong expertise in virtualization, server operating systems (Linux/Windows), storage, and networking fundamentals.
- Hands-on experience with monitoring platforms (Site24x7, Patch Manager, etc.) and automation tools (Ansible, Puppet, or similar) is an added advantage.
- Proven track record of driving zero-downtime initiatives and cost optimization in enterprise environments.
Key Competencies :
- Technical Excellence - deep understanding of server operations and best practices.
- Leadership - ability to lead and inspire a team, with strong decision-making skills.
- Analytical Thinking - capacity planning, problem-solving, and cost analysis.
- Resilience & Accountability - ensuring uptime and compliance under pressure.
- Communication - ability to work cross-functionally and present technical insights to leadership.
Success Metrics :
- Consistent achievement of 99.99% uptime across server infrastructure.
- Successful closure of all VAPT findings within SLA.
- Demonstrated cost reduction in server operations through optimization initiatives.
- Improved incident resolution times and reduced recurring issues.
- High team engagement and skill growth within the engineering group.
Posted in: DevOps / SRE
Functional Area: Systems Administration
Job Code: 1558120