Job Description

Key Responsibilities :


- Lead the design and development of highly scalable, reliable, and secure infrastructure on AWS/GCP.

- Own infrastructure architecture across microservices, streaming systems, storage, networking, and distributed systems.

- Build, improve, and scale CI/CD pipelines, GitOps workflows, and automated deployment frameworks.

- Drive the observability strategy (monitoring, logging, and alerting) using Prometheus, Grafana, ELK/EFK, CloudWatch, etc.

- Oversee infrastructure supporting data engineering, ML workloads, and real-time analytics.

- Implement industry best practices around reliability, performance tuning, and cost optimization.

- Review design documentation, contribute to code when required, debug live issues, and support incident resolution.

- Guide infra teams on Kubernetes deployments, network routing, cloud security, IaC, and container lifecycle management.

- Build automation and tooling using Python, Go, Java, or similar languages.

- Lead, mentor, and grow high-performing DevOps, SRE, Platform, or Infra teams.

- Foster a culture of ownership, learning, and engineering excellence.

- Collaborate cross-functionally with product engineering, data engineering, and ML teams.

- Drive hiring, onboarding, skill development, and performance management.

- Ensure uptime, SLAs, and platform reliability for a large-scale consumer platform.

- Manage incident response, root-cause analysis, and preventive actions.

- Own budgets, cloud cost optimization, and capacity planning.

- Advocate for automation-first and infrastructure-as-code practices.


Required Skills & Qualifications :


- 5-12 years of experience in infrastructure, DevOps, SRE, or platform engineering.

- 2+ years of leadership or engineering management experience.

- Deep expertise in AWS/GCP cloud services.

- Strong experience with Docker, Kubernetes, and orchestration systems.

- Expertise with Infrastructure-as-Code tools such as Terraform, Pulumi, or AWS CDK.

- Strong hands-on knowledge of streaming and messaging platforms such as Kafka, RabbitMQ, Spark Streaming, etc.

- Experience with distributed systems, high-scale architecture, and real-time systems.

- Proven experience building and scaling CI/CD pipelines.

- Strong foundation in monitoring, observability, and logging systems.

- Working knowledge of programming languages such as Python, Go, or Java, and of frameworks such as Django or Spring.

- Experience managing infra-heavy, data-focused, or platform engineering teams.


Preferred / Nice-to-Have :


- Experience in OTT, streaming, or consumer internet platforms.

- Exposure to ML infrastructure, feature stores, and data governance.

- Contributions to open-source infrastructure or data tooling.

- Strong engineering community presence (conferences, blogs, open-source, meetups).

