Posted on: 20/08/2025
Responsibilities :
- Ensure that our applications and environments are stable, scalable, secure, and performing as expected.
- Work proactively with cross-functional colleagues to understand their requirements, and contribute to and provide suitable supporting solutions.
- Develop and introduce systems that support rapid growth, including implementing deployment policies, designing and implementing new procedures, managing configuration, and planning patches and capacity upgrades.
- Observability : ensure suitable levels of monitoring and alerting are in place to keep engineers aware of issues.
- Establish runbooks and procedures to keep outages to a minimum. Jump in before users notice that things are off track, then automate the fix for the future.
- Automate everything so that nothing is ever done manually in production.
- Identify and mitigate reliability and security risks.
- Make sure we are prepared for peak times, DDoS attacks, and fat fingers.
- Troubleshoot issues across the whole stack - software, application, and network.
- Manage individual project priorities, deadlines, and deliverables as part of a self-organizing team.
- Learn and unlearn every day by exchanging knowledge and new insights, conducting constructive code reviews, and participating in retrospectives.
Requirements :
- 2+ years of extensive experience in Linux server administration, including patching, packaging (RPM), performance tuning, networking, user management, and security.
- 2+ years of experience implementing systems that are highly available, secure, scalable, and self-healing on the Azure cloud platform.
- Strong understanding of networking, especially in cloud environments, along with a good understanding of CI/CD.
- Prior experience implementing industry-standard security best practices, including those recommended by Azure.
- Proficiency with Bash and any high-level scripting language.
- Basic working knowledge of observability stacks like ELK, Prometheus, Grafana, SigNoz, etc.
- Proficiency with Infrastructure as Code and Infrastructure Testing, preferably using Pulumi/Terraform.
- Hands-on experience in building and administering VMs and Containers using tools such as Docker/Kubernetes.
- Excellent communication skills, spoken as well as written, with a demonstrated ability to articulate technical problems and projects to all stakeholders.
Extra credits for :
- Experience with these technologies : Pulumi with TypeScript or Golang, Node.js, Kubernetes, Serverless infrastructure, Azure cloud.
- Experience in governance processes and compliance validation, especially for financial services, such as ISO, SOC 2, PCI, etc.
- Experience working in product startups.
- Experience in administering and scaling PostgreSQL.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1532102