Posted on: 28/08/2025
Key Responsibilities:
- Technical Leadership: Lead, mentor, and guide a team of Ceph Storage Engineers in designing and managing scalable Ceph storage environments.
- Architecture & Design: Plan and architect complex Ceph infrastructures spanning multiple data centers, ensuring high availability, scalability, and optimal performance.
- Deployment & Operations: Manage the installation, configuration, tuning, upgrades, and lifecycle management of Ceph clusters across diverse environments.
- Performance & Capacity Management: Benchmark and optimize Ceph storage performance; conduct capacity planning to support growing workloads effectively.
- Automation & Scripting: Develop automation tools and custom scripts using Python to streamline Ceph operations and improve system reliability.
- Upstream Contribution: Engage actively with the Ceph open-source community by contributing code, fixing bugs, and suggesting enhancements.
- Monitoring & Troubleshooting: Build comprehensive monitoring systems and quickly diagnose and resolve critical Ceph cluster issues.
- Backup, Disaster Recovery & Security: Design and implement resilient backup and disaster recovery strategies while ensuring compliance with security best practices.
- Cross-Team Collaboration: Collaborate with infrastructure, development, and network teams to integrate Ceph storage seamlessly into the broader infrastructure stack.
Qualifications & Skills:
- 8-9 years of experience in storage engineering with a strong focus on Ceph.
- Proven expertise in architecting, deploying, and managing large-scale Ceph storage clusters.
- Strong programming skills in Python for automation and custom tooling.
- Deep understanding of distributed storage systems, multi-datacenter architectures, and high-availability design principles.
- Hands-on experience with performance tuning, capacity planning, and lifecycle management of Ceph clusters.
- Familiarity with monitoring tools and troubleshooting methodologies specific to Ceph.
- Experience contributing to open-source projects, ideally Ceph.
- Solid knowledge of backup, disaster recovery, and security best practices related to storage systems.
- Excellent leadership, communication, and team collaboration skills.
Posted in: DevOps / SRE
Functional Area: IT Infrastructure Services
Job Code: 1536611