Posted on: 28/07/2025
About Us :
Paytm's Infra Tech Lab solves some of the most complex DevOps problems across the globe. Some of our DNA fingerprints are :
- Cloud-First: Paytm is a cloud-first company where we work on some of the largest cloud (AWS) workloads.
- Scalability: We tame scale. Our definition of scale is PBs of data, millions of requests per minute, and a few thousand microservices hosted on a few hundred thousand cores.
- Innovation: Innovation is in our blood, and automation is our key mantra. We work smartly: we have designed one of the world's largest CI platforms, our own AWS cost management platform, EBS autoscale (yes, we downscale EBS without even a millisecond of outage), and so on.
- Observability: One of the best in-house designed observability systems, processing a few tens of millions of events per second, and much more.
About the Role :
Key responsibilities :
- Develop a deep understanding of our complex architecture, automate infrastructure and deployment using code and ensure performance, reliability and uptime of every component of the system.
- Improve observability of the system, troubleshoot production incidents, identify root causes and implement corrective and preventive measures.
- Work with our Information security team to implement security fixes and make systems secure and compliant as per the guidelines.
- Document and implement best practices and strategies for running low-latency, high-throughput applications in the cloud.
- Manage and improve our NoSQL/big-data infrastructure (Cassandra, EMR, etc.).
- Participate in the weekly on-call rotation for production systems.
Expectations/Requirements :
- Cloud expertise : AWS (Preferred) / Azure / Google Cloud.
- Infrastructure as Code (IaC) : Hands-on experience with Terraform for cloud resource provisioning.
CI/CD & Deployment Pipelines :
- Expertise in Zero Downtime Deployment strategies.
- Strong knowledge of GitOps practices using ArgoCD.
Containerization & Orchestration :
Monitoring & Observability :
- Experience with Grafana & Prometheus for system monitoring.
Database Systems :
Messaging & Caching Systems :
- Hands-on expertise in Redis (production-level clusters).
Scripting & Automation :
AI & Automation Tools :
- Ability to integrate AI-driven solutions into DevOps workflows.
Cloud Cost Optimization :
- Strong understanding of cost-saving best practices for cloud infrastructure.
Superpowers/Skills That Will Help You Succeed in This Role :
- High level of drive, initiative, and self-motivation.
- Strong problem-solving skills with a growth mindset.
- Excellent communication and stakeholder management.
- Passion for automation and AI-driven efficiencies.
- Willingness to experiment, innovate, and continuously improve.
Why Join Us?
- Opportunities to work on large-scale, high-impact projects.
- A culture that values technical excellence, automation, and efficiency.
- Respect that is earned from peers and leadership based on contributions and impact.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1519962