Posted on: 28/10/2025
Description :
- Implement AWS-native monitoring services (CloudWatch, CloudTrail, VPC Flow Logs) and integrate with Datadog to provide complete observability across Drupal websites, EKS, and RDS environments.
Performance & SLA Management :
- Configure log aggregation, health checks, and alerting rules to ensure application uptime (target 99.99%) and monitor cross-region failover effectiveness.
Dashboards & Analytics :
- Develop real-time dashboards for tracking traffic, latency, and error rates to support proactive issue detection and optimization.
Incident Management :
- Integrate monitoring insights into incident management workflows to enable faster detection, triage, and resolution of operational incidents.
Tooling & Automation :
- Oversee agent deployments and the integration of cloud workloads with third-party tools to enhance monitoring, logging, and security coverage.
Leadership & Collaboration :
- Lead a team of cloud engineers, ensuring best practices in DevOps, SRE, and Cloud Infrastructure Operations.
- Collaborate with cross-functional teams for system design, scaling, and reliability improvements.
Required Skills & Experience :
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1566105