Posted on: 14/07/2025
Job Overview:
We are seeking a highly skilled and proactive Site Reliability Engineer (SRE) to join our team. You will play a critical role in ensuring the reliability, scalability, and performance of our production systems and applications. This position requires a strong blend of software engineering expertise, operational acumen, and a deep commitment to automation and continuous improvement. You will contribute to our mission of providing a highly available and efficient platform for our users.
Responsibilities:
- Collaborate with cross-functional teams to define and establish Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems.
- Monitor systems and applications proactively, identifying and resolving any performance bottlenecks or availability issues before they impact users.
- Develop and maintain robust monitoring tools, alerts, and dashboards to provide comprehensive visibility into system health and performance.
- Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents, fostering a culture of learning.
- Automate repetitive tasks and processes to improve operational efficiency, reduce manual intervention, and increase system reliability.
- Create and maintain clear, comprehensive documentation for system architecture, configurations, and troubleshooting procedures.
- Perform capacity planning and resource allocation to ensure optimal system performance and scalability for future growth.
- Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, performance, and operational standards.
- Stay up to date with industry best practices, new technologies, and emerging trends in Site Reliability Engineering (SRE).
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
Objectives of this Role:
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Primary Skills Required:
- Proficiency in scripting languages such as Python, Shell, or Perl.
- Experience with configuration management tools like Ansible, Puppet, or Chef.
- Familiarity with cloud platforms like AWS, Azure, or Google Cloud.
- Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
- Knowledge of containerization and orchestration technologies (Docker, Kubernetes).
- Expertise in monitoring and logging tools such as Prometheus, Grafana, the ELK stack, or Splunk (optional, but good to have).
- Experience with Citrix technologies such as XenApp, XenDesktop, and NetScaler.
- Ability to support the administration and engineering of the Citrix environment.
- Experience working with Citrix Provisioning Server, SQL Database, and Citrix License Server.
- Working knowledge of virtualization technologies such as VMware or Hyper-V.
- Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
- Basic Terraform syntax and GitLab CI/CD configuration (pipelines and jobs).
- Provisioning and configuring cloud resources through the CLI or API.
- Ability to run basic queries in logging tools to answer general questions.
- Linux operating system configuration, package management, startup, and troubleshooting.
- Block and object storage configuration.
- Networking: VPCs, proxies, and CDNs.
Secondary Skills Required:
- Proven experience as a Site Reliability Engineer or a similar role.
- Solid understanding of software development methodologies and DevOps principles.
- Experience with agile and iterative development processes.
- Certification in relevant technologies or frameworks is a plus (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
- Familiarity with continuous integration/continuous deployment (CI/CD) pipelines.
- Experience with source control systems such as Git or SVN.
- Knowledge of security best practices and experience implementing security measures in a production environment.
- Ability to work independently and handle multiple projects and priorities simultaneously.
- Strong analytical and problem-solving skills, with a focus on continuous improvement and automation.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Strong attention to detail and ability to work in a fast-paced, dynamic environment.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1512764