Posted on: 31/10/2025
Description :
SPN Globe is a premier firm providing comprehensive consultancy and staffing solutions across a wide range of IT domains. We have positioned ourselves as a trusted partner for organizations seeking top-tier talent with niche skills, with a focus on quality, integrity, and timely delivery. We have successfully navigated the challenges posed by a volatile IT market, consistently expanding our reach. By maintaining a client-first, win-win approach and leveraging innovative recruitment strategies, we have continued to grow steadily in the face of economic fluctuations.
Location : Pune
Joining : 0 to 30 days / Immediate Joiners
Objectives :
- Act as the Site Reliability Engineer for global operations, ensuring system stability, scalability, and efficiency through advanced automation, observability, and proactive infrastructure management.
- Provide expertise in Kubernetes, Linux, networking, and automation practices to support reliable deployments and resilient services.
- Maintain a strong sense of reliability, with clear awareness of the risks and impacts that infrastructure and application changes can have.
Role & Responsibilities :
- Has strong knowledge of Kubernetes (including Talos) for deployment, scaling, and maintaining containerized applications.
- Provides Linux administration expertise and ensures secure, efficient system operations.
- Implements and maintains GitOps workflows using Flux for consistent, automated deployments.
- Designs and manages infrastructure automation using Puppet and Terraform.
- Ensures reliable operation of databases such as MySQL/MariaDB, Yugabyte, and MongoDB, supporting data integrity and availability.
- Operates and integrates streaming platforms (Confluent, Strimzi) for event-driven and real-time processing.
- Develops automation scripts and tools using Python to improve operational efficiency.
- Oversees edge device management, ensuring secure connectivity and smooth lifecycle operations.
- Supports and integrates solutions with Azure and hybrid/multi-cloud environments.
- Builds and operates monitoring and observability systems (Datadog, Prometheus, Grafana) to ensure system health and transparency.
- Designs for scalability and high availability, including disaster recovery and failover strategies.
- Applies security best practices across infrastructure, applications, and data.
- Evaluates risks carefully before changes, ensuring reliable rollout strategies and minimizing downtime or service disruption.
- Monitors system reliability, identifies risks, and implements proactive improvements.
- Collaborates with global teams to share best practices and ensure consistency across environments.
- Defines and standardizes developer tooling (e.g., IDEs, code quality tools, CI/CD integrations) to ensure consistent development environments and maintain high software quality.
- Manages developer workstations and operating system standards (currently Ubuntu-based), ensuring performance, security, and compatibility across the engineering organization, with a focus on the Asia team.
- Promotes a documentation culture, ensuring clear processes, runbooks, and troubleshooting guides.
- Reports to the offshore Digital Manufacturing team based in Switzerland.
- Please also refer this opportunity to your friends right away, as we are closing all positions this week.
Posted in : DevOps / SRE
Functional Area : Site Reliability Engineering
Job Code : 1568340