Job Description

Key Responsibilities :


- Design, build, and maintain multi-region infrastructure using Terraform and Atlantis.


- Continuously optimize system performance, scalability, and cost efficiency.


- Implement infrastructure automation and self-healing capabilities.


- Develop and maintain Datadog dashboards, SLOs, SLIs, and alerting mechanisms.


- Automate incident detection, recovery, and runbook execution.


- Implement monitoring for reliability, availability, and latency across distributed systems.


- Manage and enhance CI/CD pipelines with security and quality gates (GitLeaks, static code checks, GitOps).


- Ensure deployment consistency and eliminate manual infrastructure drift.


- Collaborate with development teams to improve deployment processes and accelerate release cycles.


- Enforce strong IAM (Identity and Access Management) practices and maintain compliance across systems.


- Automate checks and reporting for SOC 2, ISO 27001, and GDPR compliance.


- Implement policies and automation for least privilege access and secure network configurations.


- Coach engineers on infrastructure best practices, observability, and cloud reliability.


- Advocate for DevOps and reliability engineering culture across the organization.


- Partner with cross-functional teams to define infrastructure standards and a long-term roadmap.


Requirements & Qualifications :


- 4-6 years of hands-on experience in Infrastructure, DevOps, or Platform Engineering roles.


- Strong expertise in AWS (ECS/Fargate, EKS), GCP (GKE), and Terraform and Atlantis for Infrastructure as Code (IaC).


- Experience managing multi-region, multi-cloud environments.


- Proficiency with CI/CD pipelines, GitOps, and infrastructure security automation.


- Deep understanding of observability tools such as Datadog, Last9, and CloudWatch.


- Strong debugging, troubleshooting, and performance optimization skills.


- Demonstrated experience in cost management, monitoring automation, and incident management.


- Excellent communication and documentation skills, with the ability to explain complex technical topics clearly.


Preferred Skills :


- Experience with Cloudflare, Linear, or similar DevOps tools.


- Familiarity with container orchestration (Docker, Kubernetes) and service mesh technologies.


- Knowledge of security scanning, compliance automation, and infrastructure observability design.


- Understanding of SRE (Site Reliability Engineering) principles and error budgets.


- Experience in mentoring or leading small infrastructure teams.


Soft Skills :


- Proactive and detail-oriented problem solver.


- Strong leadership and mentoring capabilities.


- Collaborative team player with a "build and automate everything" mindset.


- Passion for innovation, reliability, and continuous improvement.


