Posted on: 26/10/2025
WHAT YOU'LL DO:
- Support, maintain, and enhance the reliability, scalability, and performance of our Azure-based Data Analytics Platform.
- Collaborate closely with Data Engineers, Developers, and Architects to operationalize solutions in Synapse, Fabric, Databricks, and related Azure services.
- Design and implement monitoring, alerting, and observability strategies to ensure end-to-end visibility of data services and pipelines.
- Drive automation for provisioning, deployment, scaling, and recovery of critical services using Infrastructure-as-Code (IaC).
- Implement CI/CD pipelines tailored for data workloads (e.g., notebook deployments, schema evolution, integration testing).
- Ensure system compliance with enterprise security, privacy, and data governance policies.
- Participate in incident response, troubleshooting, and root cause analysis to improve system resilience.
- Optimize cost, performance, and service availability through best-practice configurations and usage monitoring.
- Contribute to SRE playbooks and knowledge bases for operational excellence.
- Act as a technical mentor and advocate for reliability engineering within data and product teams.
WHAT YOU'LL NEED:
- Proven experience as an SRE or DevOps engineer supporting large-scale data platforms, preferably in an enterprise Azure environment.
- Strong hands-on expertise with Azure Data Services, especially:
1. Azure Synapse Analytics
2. Microsoft Fabric
3. Azure Databricks
4. Azure Data Lake Storage
5. Azure Data Factory / Synapse Pipelines
- Deep understanding of data architecture principles, data pipeline orchestration, and distributed data processing.
- Proficiency in Infrastructure-as-Code tools like Terraform, Bicep, or ARM templates.
- Solid scripting experience (e.g., PowerShell, Python, or Bash) for automation tasks.
- Familiarity with CI/CD tools (e.g., Azure DevOps, GitHub Actions) and containerization.
- Expertise in monitoring/logging solutions such as Azure Monitor, Log Analytics, Application Insights, and third-party tools like Prometheus/Grafana or Honeycomb.
- Knowledge of cloud security and data governance best practices.
- Strong analytical and problem-solving skills, with the ability to work collaboratively in a cross-functional team.
- Excellent communication skills to engage technical and non-technical stakeholders.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1564807