Job Description

We are seeking a skilled Site Reliability Engineer (SRE) / DevOps Engineer to join our infrastructure team. In this role, you will design, build, and maintain scalable infrastructure, CI/CD pipelines, and observability systems to ensure high availability, reliability, and security of our services.


You will work cross-functionally with development, QA, and security teams to automate operations, reduce toil, and enforce best practices in cloud-native environments.


Responsibilities:


- Design, implement, and manage cloud infrastructure (GCP/AWS/Azure) using Infrastructure as Code (Terraform).

- Maintain and improve CI/CD pipelines using tools like CircleCI, GitLab CI, or ArgoCD.

- Ensure high availability and performance of services running on Kubernetes (GKE/EKS/AKS).

- Implement monitoring, logging, and alerting using Prometheus, Grafana, ELK, or similar tools.

- Collaborate with developers to optimize application performance and deployment processes.

- Manage and automate security controls such as IAM, RBAC, network policies, and vulnerability scanning.


Requirements:


- Strong knowledge of Linux.

- Experience with scripting languages such as Python, Bash, or Go.

- Experience with cloud platforms (GCP preferred, AWS or Azure acceptable).

- Proficiency in Kubernetes operations, including Helm, operators, and service meshes.

- Experience with Infrastructure as Code (Terraform).

- Solid experience with CI/CD pipelines (GitLab CI, CircleCI, ArgoCD, or similar).

- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, etc.).

- Knowledge of networking concepts (TCP/IP, DNS, load balancers, firewalls).

- Understanding of security best practices (secrets management, IAM, zero trust).


Preferred Qualifications:


- Experience with advanced networking solutions.

- Familiarity with SRE principles such as SLOs, SLIs, and error budgets.

- Exposure to multi-cluster or hybrid-cloud environments.

- Knowledge of service meshes such as Istio.

- Experience participating in incident management and postmortem processes.

