- Design, deploy, and manage scalable, secure, and highly available systems on AWS.

- Optimize cloud costs, enforce resource tagging, and implement security best practices (IAM, VPC, GuardDuty, etc.).

- Automate infrastructure provisioning using Terraform or AWS CDK.

- Ensure backup, disaster recovery, and high availability (HA) strategies are in place.

2. Kubernetes (EKS preferred):

- Manage and scale Kubernetes clusters (preferably Amazon EKS).

- Implement CI/CD pipelines with GitOps (e.g., ArgoCD or Flux) or traditional tools (e.g., Jenkins, GitLab).

- Enforce RBAC policies, namespace isolation, and pod security controls (e.g., Pod Security Standards).

- Monitor cluster health and optimize pod scheduling, autoscaling, and resource requests/limits.

3. Monitoring and Observability (Datadog):

- Build and maintain Datadog dashboards for real-time visibility across systems and services.

- Set up alerting policies, SLOs, SLIs, and incident response workflows.

- Integrate Datadog with AWS, Kubernetes, and applications for full-stack observability.

- Conduct post-incident reviews using Datadog analytics to reduce MTTR.

4. Automation and DevOps:

- Automate manual processes (e.g., server setup, patching, scaling) using Python, Bash, or Ansible.

- Maintain and improve CI/CD pipelines (Jenkins) for faster and more reliable deployments.

- Drive Infrastructure-as-Code (IaC) practices using Terraform to manage cloud resources.

- Promote GitOps and version-controlled deployments.

5. Linux Systems Administration:

- Administer Linux servers (Ubuntu, RHEL, Amazon Linux) for stability and performance.

- Harden OS security, configure SELinux and firewalls, and ensure timely patching.

- Troubleshoot system-level issues: disk, memory, network, and processes.

- Optimize system performance using tools such as top, htop, iotop, and netstat.