AIOps Engineer

Digihelic Solutions Private Limited
Anywhere in India/Multiple Locations
5 - 8 Years
Rating: 4.6 (20+ Reviews)

Posted on: 20/11/2025

Job Description

Key Responsibilities :


- Build, integrate, and optimize AIOps platforms to improve incident detection, root cause analysis, and automated remediation.


- Develop and maintain automation scripts for repetitive operational tasks using Python, Shell, or PowerShell.


- Implement intelligent anomaly detection and predictive analytics using ML/AI models.


- Configure, manage, and enhance monitoring tools such as Datadog, New Relic, Dynatrace, Prometheus, Grafana, Splunk, ELK, or similar.


- Develop advanced dashboards and observability pipelines for logs, metrics, traces, and events.


- Ensure high availability and performance of monitoring and alerting systems.


- Lead triage during critical incidents and work with engineering teams to reduce MTTR.


- Perform root cause analysis using AI-driven insights and create preventive action plans.


- Build self-healing workflows and automated incident response systems.


- Work closely with DevOps teams to integrate AIOps in CI/CD pipelines.


- Manage and support cloud infrastructure on AWS / Azure / GCP.


- Implement and optimize cloud-native services for logging, monitoring, and autoscaling.


- Build and manage data pipelines for operational data across logs, metrics, traces, events, and alerts.


- Work with ML engineers to integrate models into operational workflows.


- Ensure data quality, integrity, and real-time processing for AIOps systems.


- Collaborate with SRE, DevOps, Infrastructure, and Security teams to enhance operational intelligence.


- Create clear documentation, runbooks, workflows, and automation playbooks.


- Train internal teams on AIOps processes and automation tools.


Required Skills & Experience :


Technical Expertise :


- 4+ years of experience in AIOps, DevOps, SRE, or Cloud Operations roles.


- Hands-on experience with at least one AIOps or observability platform, such as Dynatrace, Moogsoft, Datadog, BigPanda, Splunk ITSI, New Relic, or Elastic APM.


- Strong scripting/programming skills in Python, Shell, Go, or PowerShell.


- Experience with ML/AI models for anomaly detection, forecasting, or event correlation (not mandatory, but preferred).


- Strong knowledge of cloud platforms (AWS, Azure, GCP) and cloud-native monitoring services.


- Strong understanding of Kubernetes, Docker, and microservices monitoring.


- Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, Azure DevOps, etc.


- Strong understanding of Infrastructure as Code tools : Terraform, CloudFormation, Ansible.


- Solid knowledge of logs, metrics, traces, APM tools, synthetic monitoring, and alerting frameworks.


- Experience with log aggregation tools (Splunk, ELK, Loki, etc.).


Soft Skills :


- Strong analytical thinking and problem-solving abilities.


- Excellent communication skills to collaborate with cross-functional teams.


- Ability to work in a fast-paced environment and handle critical incidents calmly.
