Posted on: 02/09/2025
Job Description :
Key Responsibilities :
Technical Leadership :
- Lead and mentor a team of Ceph Storage Engineers.
- Provide architectural guidance and best practices for scalable Ceph implementations.
- Review code, conduct technical sessions, and contribute to team skill development.
Architecture & Design :
- Design highly available and scalable Ceph storage clusters across multi-datacenter environments.
- Plan and document architecture blueprints aligned with organizational goals and growth.
Deployment & Operations :
- Manage end-to-end Ceph cluster lifecycle, including installation, configuration, upgrades, and maintenance.
- Ensure high availability, data integrity, and fault tolerance in production environments.
Performance & Capacity Management :
- Perform benchmarking, tuning, and optimization of Ceph storage for various workloads.
- Conduct capacity planning and forecasting based on business growth and storage demands.
Automation & Tooling :
- Develop Python scripts and automation tools to streamline storage operations.
- Build internal tools to improve cluster management, deployment speed, and reliability.
Upstream Contribution :
- Actively participate in the Ceph open-source community.
- Contribute patches, bug fixes, and enhancements to upstream projects.
Monitoring & Troubleshooting :
- Implement and manage robust monitoring and alerting systems (e.g., Prometheus, Grafana).
- Troubleshoot complex performance and infrastructure issues in live environments.
Backup, Disaster Recovery & Security :
- Design and implement effective backup and disaster recovery (DR) strategies.
- Ensure compliance with storage security best practices and internal data policies.
Cross-Team Collaboration :
- Collaborate with infrastructure, development, and network teams to ensure seamless integration of Ceph within broader system architectures.
- Provide storage insights and recommendations during system design or scaling discussions.
Required Qualifications :
- 8-9 years of experience in infrastructure engineering with a strong focus on Ceph storage.
- Proven expertise in designing, deploying, and operating Ceph clusters in production environments.
- Strong hands-on experience with Python scripting for automation and tooling.
- In-depth understanding of Linux systems, filesystems, and storage protocols.
- Experience in performance tuning, monitoring, and capacity planning.
- Excellent problem-solving skills and the ability to troubleshoot complex issues.
- Exposure to backup/DR planning and security compliance in storage environments.
Preferred Qualifications :
- Contributions to the Ceph open-source community (GitHub commits, bug reports, forum participation).
- Familiarity with containerized environments (e.g., Kubernetes, Rook) for Ceph.
- Experience with monitoring tools such as Prometheus, Grafana, and the ELK Stack.
- Familiarity with configuration management tools (e.g., Ansible, Terraform) for automation.
- Certifications in Linux, Red Hat, or relevant storage technologies are a plus.