At Coschool, we're building AI-powered learning solutions that create real-world impact. If you enjoy solving complex infrastructure problems and want to scale systems that support Generative AI, this is your opportunity.
Make an Impact: Join a fast-growing startup with a go-to-market product and help scale our platform from the ground up.
Work on Cutting-Edge Tech: Be at the forefront of LLMOps, MLOps, and AI-driven systems.
Grow with the Company: Thrive in a high-ownership environment that values learning, adaptability, and leadership.
Innovative Culture: Collaborate with a team that encourages experimentation and bold problem-solving.
What You'll Do:
- Own production stability, uptime, and reliability across applications and infrastructure.
- Lead incident management, on-call rotations, and post-incident reviews.
- Design and maintain CI/CD pipelines using Jenkins and GitHub Actions.
- Manage and optimize AWS infrastructure (EC2, EKS, ALB/NLB, Lambda, API Gateway, Cognito, SNS/SES, ElastiCache).
- Build and operate containerized platforms using Docker and Kubernetes (EKS).
- Define and monitor SLIs/SLOs aligned with business outcomes.
- Implement observability using Prometheus, Grafana, and ELK.
- Automate infrastructure using Terraform, Pulumi, and Ansible.