Site Reliability Engineer - Kubernetes

D2KSS
Any Location
5 - 8 Years

Posted on: 27/11/2025

Job Description

Key Responsibilities:


- Manage, deploy, and optimize applications using Kubernetes in production environments.


- Work extensively with AWS services including IAM, EC2, EKS, S3, CloudWatch, and related cloud infrastructure components.


- Develop automation scripts and tools using Shell or Python to improve reliability, reduce toil, and enable self-healing mechanisms.


- Troubleshoot complex issues related to applications, networking, system performance, and low-latency environments.


- Perform Linux system debugging, optimization, and performance tuning using advanced tools and techniques.


- Create robust monitoring and alerting frameworks for high-performance systems.


- Collaborate with cross-functional teams to ensure smooth deployment, scalability, and reliability of services.


- Implement and follow SRE principles including monitoring, alerting, incident management, error budgets, fault analysis, capacity planning, and automation.


- Participate in on-call rotations to ensure 24/7 availability and rapid response to incidents.


Required Skills & Qualifications:


- 5+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering roles.


- Strong experience managing large-scale infrastructure across medium-to-large networks.


- Deep understanding of Kubernetes, AWS cloud infrastructure, and Linux systems.


- Proficiency in scripting/programming using Shell or Python.


- Strong analytical, troubleshooting, and problem-solving abilities.


- Excellent collaboration and interpersonal skills.


- Ability to thrive in a fast-paced, evolving technology environment.


- Bachelor's degree in Computer Science, Engineering, or a related field.


- Willingness to upskill and stay current with emerging technologies.

