AIOps Engineer

Digihelic Solutions Private Limited

5 - 8 Years

Anywhere in India/Multiple Locations

Artificial Intelligence, AIOps, Python, Machine Learning, Datadog, Monitoring Tools, Cloud, AWS, Azure, Kubernetes

Posted on: 21/11/2025

Job Description

Key Responsibilities :

- Build, integrate, and optimize AIOps platforms to improve incident detection, root cause analysis, and automated remediation.

- Develop and maintain automation scripts for repetitive operational tasks using Python, Shell, or PowerShell.

- Implement intelligent anomaly detection and predictive analytics using ML/AI models.

- Configure, manage, and enhance monitoring tools such as Datadog, New Relic, Dynatrace, Prometheus, Grafana, Splunk, ELK, or similar.

- Develop advanced dashboards and observability pipelines for logs, metrics, traces, and events.

- Ensure high availability and performance of monitoring and alerting systems.

- Lead triage during critical incidents and work with engineering teams to reduce MTTR.

- Perform root cause analysis using AI-driven insights and create preventive action plans.

- Build self-healing workflows and automated incident response systems.

- Work closely with DevOps teams to integrate AIOps in CI/CD pipelines.

- Manage and support cloud infrastructure on AWS / Azure / GCP.

- Implement and optimize cloud-native services for logging, monitoring, and autoscaling.

- Build and manage data pipelines for operational data across logs, metrics, traces, events, and alerts.

- Work with ML engineers to integrate models into operational workflows.

- Ensure data quality, integrity, and real-time processing for AIOps systems.

- Collaborate with SRE, DevOps, Infrastructure, and Security teams to enhance operational intelligence.

- Create clear documentation, runbooks, workflows, and automation playbooks.

- Train internal teams on AIOps processes and automation tools.

Required Skills & Experience :

Technical Expertise :

- 4+ years of experience in AIOps, DevOps, SRE, or Cloud Operations roles.

- Hands-on experience with at least one AIOps or observability platform, such as Dynatrace, Moogsoft, Datadog, BigPanda, Splunk ITSI, New Relic, or Elastic APM.

- Strong scripting/programming skills in Python, Shell, Go, or PowerShell.

- Experience with ML/AI models for anomaly detection, forecasting, or event correlation (preferred, not required).

- Strong knowledge of cloud platforms (AWS, Azure, GCP) and cloud-native monitoring services.

- Strong understanding of Kubernetes, Docker, and microservices monitoring.

- Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, Azure DevOps, etc.

- Strong understanding of Infrastructure as Code tools : Terraform, CloudFormation, Ansible.

- Solid knowledge of logs, metrics, traces, APM tools, synthetic monitoring, and alerting frameworks.

- Experience with log aggregation tools (Splunk, ELK, Loki, etc.).

Soft Skills :

- Strong analytical thinking and problem-solving abilities.

- Excellent communication skills to collaborate with cross-functional teams.

- Ability to work in a fast-paced environment and handle critical incidents calmly.

Posted by

Prashant balsaraf

Director at Digihelic Solutions Private Limited

Posted in

DevOps / SRE

Functional Area

ML / DL / AI Research

Job Code

1577901
