Posted on: 19/11/2025
Description :
Position : SME Application Support Engineer - Databricks 24/7 Operations
Work Mode : 1/6/5 rotational support across Morning, Afternoon, General, Weekend, and Night shifts, as needed
Position Count : 3
Education : B.E/B.Tech/MCA
Total IT Experience : 6-10 years
Location : RCP Navi Mumbai
Responsibilities :
- Serve as the first-level escalation point for 24/7 monitoring of Databricks clusters, jobs, workflows, repos, and data pipelines
- SME-level troubleshooting and analysis of issues related to:
a. Cluster failures or auto-scaling issues
b. Job failures (PySpark/Scala/Spark SQL/Delta Live Tables)
c. Workspace availability issues
- Work directly with application development owners to remediate pipeline failures
- Participate in the resolution of Sev1/Sev2 incidents
- Prepare root cause analysis (RCA) reports
- Implement workspace governance, user access control (RBAC), cluster policies, and data security best practices
- Ensure compliance with audit requirements
- Build custom dashboards and logging for job performance, failure analytics, and cluster utilization
- Maintain SOPs, runbooks, and architecture diagrams provided by the Data Engineering and Platform Engineering teams
- Identify recurring issues and report them to L3/Platform Engineering
- Support debugging of complex Spark issues, including driver/executor out-of-memory (OOM) errors and long GC cycles
Skills :
- 6 to 10 years of experience in Big Data / Cloud Data Platform Support
- SME on the Databricks platform (clusters, jobs, repos, MLflow, warehouses)
- Expertise in UNIX, SQL, and shell scripting
- Expertise in debugging jobs using the Spark UI
- Strong skills in CI/CD pipelines (Azure DevOps)
- Strong skills in Apache Spark and Azure Cloud