hirist

DevOps Engineer - Site Reliability

Magnifire
Multiple Locations
3 - 8 Years

Posted on: 30/01/2026

Job Description

Role Overview :

The DevOps Engineer will own the infrastructure, CI/CD, reliability, and observability for multiple products and projects at the same time. This role suits someone who is accountable, takes end-to-end responsibility, and can maintain self-hosted tools in dev and prod infrastructure while working closely with backend, frontend, and AI teams.

You will design and implement CI/CD pipelines from scratch, build and maintain Infrastructure as Code (IaC), implement secure and observable API platforms (gateways, routers, load balancers, NGINX), and help systems move towards 99.99-99.999% uptime using SRE-style practices (SLOs, monitoring, incident response).

Key Responsibilities :


- Design, implement, and maintain CI/CD pipelines for multiple products and projects (backend, frontend, AI services), enabling fast, repeatable deployments across dev, staging, and prod.

- Create metrics and alerts for infrastructure services in line with the SLAs and SLOs, and recommend strategies to meet them using observability products such as SigNoz, New Relic, etc.

- Self-host and operate services and tools (e.g., n8n, Logto).

- Design and operate API gateways, routers, and load balancers (e.g., API Gateway, ALB/NLB, Application Gateway, NGINX/Ingress) to provide secure, scalable access to backend services.

- Define and enforce logging standards: structured logs, correlation IDs, log levels, log retention policies, and secure logging (no secrets/PII in logs).

- Implement secure API endpoints in collaboration with developers by:

  - Enforcing authentication/authorization at gateways and services (OAuth2/OIDC/JWT).

  - Applying least-privilege IAM, rate limiting, WAF rules, and TLS configurations on ingress points.

- Architect and maintain high-availability and resiliency patterns (multi-AZ, auto-scaling, health checks, blue/green or canary deployments, rollback strategies) to support 99.99-99.999% uptime targets.

- Identify gaps in current systems (reliability, security, performance, cost, observability), recommend improvements, and drive implementation end-to-end (plan - change - validate - document).

- Work with developers to containerize applications (Docker) and manage deployments on ECS/EKS/AKS/Kubernetes or serverless where appropriate.

- Implement security best practices in pipelines and infrastructure: secrets management (Secrets Manager, Key Vault), image/dependency scanning, hardened base images, secure configuration baselines.

- Contribute to internal DevOps best practices, standards, and documentation, and mentor developers on environment, logging, and deployment hygiene.

- Immediate joiners preferred

Must-Have Skills & Experience :

- 3-6 years of hands-on experience in DevOps / SRE / Cloud Engineering roles.

- Strong, practical experience with AWS (EC2, ECS/EKS, RDS, S3, VPC, ALB/NLB, API Gateway, IAM, CloudWatch).

- And/or solid experience with Azure (App Services, AKS, Azure SQL/Storage, VNets, Application Gateway/Front Door, Azure AD, Monitor).

- Proven ability to build CI/CD pipelines from scratch using tools such as GitHub Actions, GitLab CI, Azure DevOps, Jenkins, or AWS CodePipeline/CodeBuild.

- Experience with Docker and at least one container orchestration or platform (Kubernetes, EKS, AKS, ECS, or similar).

- Proven experience setting up monitoring, logging, and alerting in production environments (e.g., CloudWatch, Azure Monitor, Prometheus, Grafana, ELK/EFK).

- Demonstrated ability to manage multiple products/projects simultaneously, prioritize work, and take ownership from design through production.

- Understanding of secure DevOps practices: secrets management, IAM, network segmentation, and basic vulnerability scanning in CI/CD.

- Experience supporting stacks including Python/Flask backends, Node.js/Express, Java/Spring Boot, and React/Next.js frontends, and generative AI services.

- Experience with cost optimization and simple FinOps workflows on AWS/Azure.

- Interest in learning, contributing, working, and growing in a startup environment.

Nice-to-Have / Preferred :

- Experience with SAST/SCA/DAST integration in pipelines (e.g., SonarQube, Snyk, OWASP ZAP) and feeding findings into a central system (e.g., Security Hub).

- Exposure to environments with formal SRE practices: defining SLOs/SLIs, managing error budgets, and running regular incident reviews.

- Experience building and managing Infrastructure as Code (IaC) for AWS and Azure using tools like Terraform and/or CloudFormation/Bicep, including reusable modules and environment separation.
