Responsibilities (What You'll Do) :

Infrastructure Management :

- Oversee and maintain the infrastructure that supports the ad exchange applications.

- This includes load balancers, data stores, CI/CD pipelines, and monitoring stacks.

- Continuously improve infrastructure resilience, scalability, and efficiency to meet the demands of massive request volume and stringent latency requirements.

- Develop policies and procedures that improve overall platform stability, and participate in the shared on-call schedule.

Collaboration with Developers :

- Work closely with developers to establish and uphold quality and performance benchmarks, ensuring that applications meet necessary criteria before they are deployed to production.

- Participate in design reviews and provide feedback on infrastructure-related aspects to improve system performance and reliability.

Building Tools for Infra Management :

- Develop tools to simplify and enhance infrastructure management, automate processes, and improve operational efficiency.

- These tools may address areas such as monitoring, alerting, deployment automation, and failure detection and recovery, which are critical in minimizing latency and maintaining uptime.

Performance Optimization :

- Focus on reducing latency and maximizing efficiency across all components, from request handling in load balancers to database optimization.

- Implement best practices and tools for performance monitoring, including real-time analysis and response mechanisms.

Who Should Apply :

- 6-10 years of experience leading a team of 4-5 members and managing services in large-scale distributed systems.

- Strong experience with Kubernetes and on-premises cloud data centers.

- Strong understanding of networking concepts (e.g., TCP/IP, routing, SDN) and modern software architectures.

- Proficiency in programming and scripting languages such as Python, Go, or Ruby, with a focus on automation.

- Experience with container orchestration tools like Kubernetes and with virtualization or cloud platforms (preferably GCP).

- Ability to independently own problem statements, manage priorities, and drive solutions.

Preferred Skills & Tools Expertise :

- Infrastructure as Code : Experience with Puppet, Ansible, or Terraform.

- Monitoring and Logging Tools : Expertise with Prometheus, Grafana, or ELK stack.

- CI/CD Pipelines : Hands-on experience with Jenkins or ArgoCD.

- Databases : Proficiency in MySQL (relational) or Redis (NoSQL).

- Web/Application Servers : Familiarity with Envoy or Nginx.

- Strong knowledge of operating systems and networking fundamentals.

- Experience with version control systems such as Git.