Posted on: 22/12/2025
Description :
As an L3 specialist, you will lead complex troubleshooting efforts, architect high-availability clusters, and implement rigorous security and disaster recovery protocols. You will be instrumental in ensuring that the container platform remains resilient, scalable, and integrated seamlessly with modern CI/CD pipelines.
Responsibilities :
- Lead the end-to-end installation, configuration, and lifecycle management of production-grade OpenShift clusters across multi-node environments.
- Architect and manage persistent storage solutions using OpenShift Data Foundation (ODF) and Ceph, ensuring optimal performance for stateful applications.
- Implement and manage the Loki stack for centralized logging and Prometheus/Grafana for advanced cluster-wide observability and proactive alerting.
- Engineer high-availability (HA) architectures for the OpenShift control plane and worker nodes to eliminate single points of failure.
- Perform deep-dive L3 troubleshooting of complex platform issues, focusing on SDN (Software Defined Networking), OVN-Kubernetes, and container runtime errors.
- Execute cluster hardening and security remediation, including the implementation of RBAC, Network Policies, and secret management best practices.
- Automate routine administrative tasks, cluster provisioning, and configuration drift management using Ansible and advanced Shell scripting.
- Direct backup and disaster recovery operations (using tools like Velero) to ensure data integrity and rapid service restoration across Ceph storage volumes.
- Collaborate with development and DevOps teams to optimize application deployment strategies using GitOps and CI/CD pipeline integrations.
- Manage networking components within the cluster, including Ingress controllers, Egress IPs, and load balancer integrations.
- Conduct regular performance tuning and resource optimization to ensure the cluster effectively meets strict service level agreements (SLAs).
- Document specialized SOPs and architectural blueprints, providing technical mentorship to L1/L2 support tiers.
Technical Requirements :
- OpenShift Administration : 7+ years of dedicated experience in Red Hat OpenShift (OCP) administration, specifically with versions 4.x.
- Storage Expertise : Extensive hands-on experience with ODF (Ceph storage), including RBD, CephFS, and Object storage management.
- Networking & SDN : Strong understanding of container networking, OVN-Kubernetes, and SDN (Software Defined Networking) protocols.
- Automation : Proficiency in Ansible for configuration management and Shell/Python for infrastructure automation.
- Monitoring & Observability : Advanced experience with the Loki stack, Prometheus, and Grafana for monitoring distributed container systems.
- Linux Internals : Deep knowledge of Red Hat Enterprise Linux (RHEL) and Red Hat Enterprise Linux CoreOS (RHCOS) system administration and kernel tuning.
- CI/CD & Git : Experience with Git and CI/CD pipelines to support automated application lifecycles within the cluster.
- Security : Solid grasp of container security frameworks, including SELinux, Seccomp profiles, and Image Security Scanning.
Preferred Skills :
- Red Hat Certified Specialist in OpenShift Administration or Red Hat Certified Architect (RHCA) is highly desirable.
- Experience with Service Mesh (Istio) for managing microservices communication and security.
- Familiarity with Velero or similar tools for OpenShift backup and migration.
- Understanding of hybrid cloud deployments (OCP on AWS/Azure/GCP or on-premises bare metal).
- Knowledge of Tekton or ArgoCD for implementing modern GitOps workflows.
- Experience in managing large-scale, multi-tenant clusters with strict resource quotas and limit ranges.
- Strong communication skills to manage high-priority incident bridges and stakeholder technical sessions.
Posted in
DevOps / SRE
Functional Area
Systems Administration
Job Code
1594028