Job Description

Job Summary :


We are seeking a skilled and proactive Site Reliability Engineer (SRE) with a strong DevOps mindset and hands-on experience in application troubleshooting. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our applications and infrastructure. This role requires a blend of software engineering, system administration, and operational expertise, with a focus on automating processes and proactively resolving issues.


Key Responsibilities :


Site Reliability & Automation :


- Design and implement tools to automate infrastructure provisioning, application deployment, and operational tasks.


- Build and manage CI/CD pipelines using Jenkins to ensure seamless and efficient software delivery.


- Maintain and troubleshoot Linux server environments, including certificate renewals.


Monitoring & Troubleshooting :


- Implement and manage monitoring solutions using tools like Splunk or Dynatrace to create dashboards, set up alerting, and execute log queries for proactive issue detection.


- Perform application troubleshooting, debugging, and root cause analysis to resolve complex incidents promptly.


- Leverage SQL (DML & SELECT queries) to analyze application data for performance and troubleshooting insights.


Process & Collaboration :


- Apply ITIL/ITSM principles for effective incident, problem, and change management.


- Collaborate closely with development, quality assurance, and product teams to improve system reliability.


- Manage and track code changes using Git, with repositories hosted on platforms such as Bitbucket.


Required Skills :


Core Technical Skills :


- 5-8 years of experience in an SRE, DevOps, or similar role.


- Strong proficiency in at least one of Shell, Groovy, or YAML for scripting and configuration.


- Expertise in monitoring tools like Splunk or Dynatrace for alerting, dashboarding, and log analysis.


- Hands-on experience with CI/CD tools, specifically Jenkins.


System & Infrastructure :


- Strong understanding of Linux system administration.


- Basic exposure to cloud environments, with AWS being preferred.


Process & Data :


- Basic knowledge of ITIL/ITSM concepts (Incident, Problem, Change Management).


- Proficiency in SQL (DML and SELECT queries).


Preferred Skills :


- Experience with configuration management tools like Ansible or Chef.


- Hands-on experience with Docker for containerization and Kubernetes for container orchestration.


- Knowledge of other monitoring tools such as Prometheus or Grafana.


- Relevant certifications in Linux or cloud platforms.


- Strong problem-solving and analytical skills, with a proactive attitude toward identifying and resolving issues.
