Posted on: 21/11/2025
Description :
- First-level escalation for 24x7 monitoring and support of Databricks clusters, jobs, workflows, repos, and data pipelines
- SME-level troubleshooting for cluster failures, auto-scaling issues, job failures (PySpark/Scala/Spark SQL/Delta Live Tables), and workspace availability issues
- Work with application development teams to remediate pipeline failures
- Participate in resolution of Sev1/Sev2 incidents and perform root cause analysis
- Implement workspace governance, RBAC, cluster policies, and data security best practices
- Build custom dashboards for job performance, analytics, and cluster utilization
- Maintain SOPs, runbooks, and architecture diagrams
- Escalate recurring issues to L3/platform engineering
- Debug complex Spark issues, including OOM errors and long GC cycles
Required Skills & Experience :
- 6-10 years of experience in Big Data/Cloud Data Platform Support
- SME-level knowledge of the Databricks platform, Spark clusters, jobs, repos, MLflow, and SQL warehouses
- Strong experience in UNIX, SQL, Shell Scripting
- Experience with Spark UI debugging
- Hands-on with CI/CD pipelines (Azure DevOps preferred)
- Strong expertise in Apache Spark and Azure Cloud
- Educational Qualification : B.E/B.Tech/MCA
Work Details :
- 24x7 rotational support across Morning, Afternoon, and Night shifts
- Location : Navi Mumbai (Ghansoli)
- Work Mode : Work from Office
- Immediate Joiners Only
- Interview Process : 2-3 rounds (including Client Round)