Job Summary :
We are seeking a proactive and skilled Site Reliability Engineer (SRE) to join our team on a Contract-to-Hire (C2H) basis. The ideal candidate will have solid troubleshooting abilities, strong scripting experience, and hands-on exposure to modern DevOps and monitoring tools. You will be instrumental in ensuring the reliability, scalability, and performance of our production systems.
Key Details :
Role : Site Reliability Engineer (SRE)
Experience : 4 to 6 Years
Location : Pune
Employment Type : Contract-to-Hire (C2H)
Notice Period : Immediate Joiners Only (Candidates with a notice period of up to 15 days will be prioritized)
Core Responsibilities :
- System Reliability: Focus on automating toil, improving system performance, and ensuring the high availability and scalability of critical production services.
- Troubleshooting & Incident Management: Serve as a key resource in triaging, troubleshooting, and resolving complex incidents, leveraging your knowledge of the software development lifecycle and production environments.
- Monitoring & Alerting: Configure, maintain, and enhance monitoring solutions, including creating effective dashboards, defining critical alerts, and writing analytical queries within monitoring tools.
- Automation: Utilize scripting to automate repetitive tasks, implement proactive health checks, and streamline deployment and operational procedures.
- CI/CD Pipeline Management: Contribute to the maintenance and improvement of our Continuous Integration and Continuous Delivery (CI/CD) pipelines.
- Process Adherence: Apply basic principles of IT Service Management (ITSM) to manage incidents, problems, and changes effectively.
Required Technical Skills (Must-Have) :
Note: The rating in parentheses (e.g., 3/5) indicates the required proficiency level.
- ITSM/ITIL (Basic): Foundational understanding of core processes like Incident Management, Problem Management, and Change Management.
- Operating Systems: Good working knowledge of Linux administration, command-line tools, and system diagnostics. Experience managing and troubleshooting certificate renewals is required.
- Scripting (3/5): Strong proficiency in at least one of the following for automation: Shell (Bash/Ksh), Groovy, or YAML-based configuration.
- Monitoring Tools (3/5): Hands-on experience with commercial monitoring platforms such as Splunk or Dynatrace, including the ability to create alerts, build dashboards, and write complex queries within the chosen tool(s).
- Database (Basic): Ability to write standard SQL statements, including SELECT queries and basic DML (Data Manipulation Language) operations.
- CI/CD (3/5): Practical experience with Jenkins for building and deploying applications.
- Version Control (3/5): Good working knowledge of Git / Bitbucket for code management and collaboration.
Desirable Technical Skills (Nice-to-Have) :
- Configuration Management: Exposure to configuration management tools such as Ansible or Chef.
- Cloud Platforms: Experience with any major cloud provider, with AWS exposure being preferred.
Behavioral Skills :
- Communication (4/5): Excellent verbal and written communication skills; must demonstrate confidence and clarity during technical discussions.
- Problem-Solving: Strong aptitude for problem-solving and a proactive, ownership-driven mindset.