Posted on: 28/08/2025
Job Summary:
We are seeking a skilled and proactive Site Reliability Engineer (SRE) with a strong DevOps mindset and hands-on experience in application troubleshooting. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our applications and infrastructure. This role requires a blend of software engineering, system administration, and operational expertise, with a focus on automating processes and proactively resolving issues.
Key Responsibilities:
Site Reliability & Automation:
- Design and implement tools to automate infrastructure provisioning, application deployment, and operational tasks.
- Build and manage CI/CD pipelines using Jenkins to ensure seamless and efficient software delivery.
- Apply a strong understanding of Linux to maintain and troubleshoot server environments, including handling certificate renewals.
Monitoring & Troubleshooting:
- Implement and manage monitoring solutions using tools like Splunk or Dynatrace to create dashboards, set up alerting, and execute log queries for proactive issue detection.
- Perform application troubleshooting, debugging, and root cause analysis to resolve complex incidents promptly.
- Leverage SQL (DML & SELECT queries) to analyze application data for performance and troubleshooting insights.
Process & Collaboration:
- Apply ITIL/ITSM principles for effective incident, problem, and change management.
- Collaborate closely with development, quality assurance, and product teams to improve system reliability.
- Manage and track code changes using Git (e.g., repositories hosted on Bitbucket).
Required Skills:
Core Technical Skills:
- 5-8 years of experience in an SRE, DevOps, or similar role.
- Strong proficiency in at least one of the following: Shell or Groovy scripting, or YAML-based configuration.
- Expertise in monitoring tools like Splunk or Dynatrace for alerting, dashboarding, and log analysis.
- Hands-on experience with CI/CD tools, specifically Jenkins.
System & Infrastructure:
- Strong understanding of Linux system administration.
- Basic exposure to cloud environments (AWS preferred).
Process & Data:
- Basic knowledge of ITIL/ITSM concepts (Incident, Problem, Change Management).
- Proficiency in SQL (DML and SELECT queries).
Preferred Skills:
- Experience with configuration management tools like Ansible or Chef.
- Hands-on experience with Docker for containerization and Kubernetes for container orchestration.
- Knowledge of other monitoring tools such as Prometheus or Grafana.
- Relevant certifications in Linux or cloud platforms.
- Strong problem-solving and analytical skills, with a proactive attitude toward identifying and resolving issues.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1537162