Posted on: 01/12/2025
Location : PAN India
Exp : 6 to 10 Yrs
Job Description : MLOps Engineer - AIOps & Observability
We are seeking an MLOps or Data Science Engineer with strong experience in machine learning operations and observability platforms to enable intelligent AIOps capabilities within our observability practice.
Key Responsibilities :
- AIOps Model Development : Design, develop, and deploy machine learning models for observability use cases including anomaly detection, predictive alerting, incident forecasting, and automated root cause analysis using time-series data from distributed infrastructure and application telemetry
- Time-Series Analytics & Forecasting : Build and optimize ML models using Python libraries (scikit-learn, Prophet, TensorFlow, PyTorch, statsmodels) to identify patterns, predict system failures, detect performance degradations, and enable proactive incident prevention across complex Azure environments
- Data Pipeline Engineering : Architect and implement scalable data pipelines that use Apache Kafka to stream observability data in real time and Apache Flink for stream processing and transformation, feeding ML models with high-quality, timely telemetry from multiple sources
- Azure Databricks Integration : Utilize Azure Databricks for large-scale data processing, feature engineering, model training, and batch analytics on observability datasets, establishing MLOps workflows for continuous model improvement and retraining based on evolving infrastructure patterns
- Observability Platform Expertise : Apply hands-on experience with observability platforms such as Datadog to understand the data structures of metrics, logs, and traces, translating observability domain knowledge into effective ML features and intelligent automation capabilities
- MLOps Lifecycle Management : Establish end-to-end MLOps practices, including model versioning, automated testing, deployment pipelines, performance monitoring, and feedback loops, so that AIOps models deliver reliable, actionable insights that reduce mean time to detection (MTTD) and mean time to resolution (MTTR) for incidents
Posted in : DevOps / SRE
Functional Area : ML / DL Engineering
Job Code : 1582559