
Job Description

Job Requirement :

What you'll do :

- Bridging the gaps between the core infra, security, QA and development teams.

- Owning the end-to-end availability, performance and capacity of applications and their infrastructure, and creating and maintaining the corresponding observability with Prometheus/New Relic/ELK/Loki.

- Providing 24x7 infra and app support, building processes and documenting the related "tribal" knowledge.

- Mentoring and training L1 engineers and continually improving app and infra support processes.

- Managing application deployment and GKE platforms, and automating and improving development and release processes.

- Creating, managing and maintaining datastores & data platform infra using IaC.

- Owning and onboarding new applications with the production readiness review process.

- Managing SLOs, error budgets and alerts, and performing root cause analysis for production errors.

- Working with the Core Infra, Dev and Product teams to define SLOs, error budgets and alerts.

- Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.

- Identifying observability gaps in applications and infrastructure and working with stakeholders to fix them.

- Managing outages, carrying out detailed RCA with developers and identifying ways to prevent a recurrence.

- Automating toil and repetitive work.

What We're Looking For :

- 6+ years of experience in managing high-traffic, large-scale microservices and infrastructure, with excellent troubleshooting skills.

- Experience in troubleshooting, managing and deploying containerized environments using Docker/containerd and Kubernetes is a must.

- Must be proficient with Helm and have experience with a service mesh such as Istio or Linkerd.

- Must be very hands-on in managing and troubleshooting Kubernetes environments.

- Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).

- Extensive experience in DNS, TCP/IP, UDP, gRPC, routing and load balancing.

- Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack or Ansible.

- Expertise in Google Cloud (GCP) and/or other relevant Cloud Infrastructure solutions like AWS or Azure.

- Experience in building CI/CD pipelines with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.

- Experience with multiple datastores is a plus (Kafka/RabbitMQ, Redis, Elasticsearch).

- Must be proficient in at least one DevOps scripting language, such as Python or Go.

- A collaborative spirit with the ability to work across disciplines to influence, learn and deliver.
