Job Description

Key Responsibilities :


- Infrastructure & Automation : Design, deploy, and maintain highly reliable and scalable systems and infrastructure.
Automate routine tasks and workflows with scripts in languages such as Python, PowerShell, and Go to improve operational efficiency.


- Monitoring & Incident Management : Build and manage monitoring systems, identify key metrics, and respond to incidents in a timely manner. Lead post-mortem analysis to prevent future incidents and improve system reliability.


- Performance Optimization : Analyze system performance and implement improvements for latency, throughput, and system resource usage.


- Collaboration & Support : Work closely with development teams to ensure that application architectures are robust, scalable, and easy to monitor. Provide guidance on best practices for code deployment and maintenance.


- Capacity Planning : Monitor and forecast infrastructure usage and capacity to ensure systems can handle future demand.
Recommend and implement changes to optimize resource allocation.


- Disaster Recovery & Business Continuity : Develop and implement disaster recovery and business continuity plans to ensure that critical services remain available in the event of failures.


- Security & Compliance : Collaborate with security teams to ensure infrastructure and applications meet security best practices and compliance requirements.


Skills and Qualifications :


- Experience : 3-6 years of experience in Site Reliability Engineering, DevOps, or a similar field, with a solid understanding of both software development and system administration.


Technical Expertise :


- Proficient with cloud platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes).


- Strong experience with monitoring and alerting tools (Prometheus, Grafana, Datadog, etc.).


- Proficiency with infrastructure-as-code and configuration management tools (Terraform, Ansible, Puppet, Chef).


- Experience with CI/CD pipeline management (Jenkins, GitLab, CircleCI).


- Strong knowledge of scripting languages (Python, PowerShell, Go, etc.) for automation tasks.


- Strong understanding of networking, security, and system architecture principles.


Problem-Solving Skills : Excellent analytical and troubleshooting skills, able to diagnose complex technical issues and identify solutions quickly.


Communication : Strong verbal and written communication skills. Ability to explain complex technical concepts to both technical and non-technical stakeholders.


Team Player : Ability to work collaboratively in a cross-functional team, mentoring junior team members and contributing to team success.


Preferred Qualifications :


- Cloud certifications (e.g., AWS Certified Solutions Architect, Google Professional Cloud Architect) are a plus.


- Experience with distributed systems and large-scale infrastructure is highly desirable.


- Experience with service meshes, load balancing, and fault-tolerant architectures.


- Understanding of the software development lifecycle and Agile methodologies.

