Posted on: 11/11/2025
Description:
Role Overview:
We are seeking an experienced L2 TechOps Engineer to manage and support large-scale production systems across Linux, Big Data, and containerized environments. The role requires a strong foundation in Linux administration, SQL optimization, Big Data tools (Hive, Spark), and OpenShift (OCP). The ideal candidate will be adept at monitoring, troubleshooting, automation, and incident response, ensuring the stability, scalability, and resilience of mission-critical systems.
Key Responsibilities
1. Production Environment Management:
- Manage, monitor, and maintain production systems to ensure optimal performance, uptime, and reliability.
- Perform system performance tuning, capacity planning, and load management to prevent bottlenecks.
- Ensure system compliance with security and governance policies across all environments.
2. Monitoring & Observability:
- Design and maintain monitoring dashboards in tools such as Grafana, Kibana, and Prometheus.
- Set up alerting frameworks to proactively detect performance degradations or failures.
- Perform root cause analysis (RCA) using metrics, logs, and distributed tracing tools.
3. Troubleshooting & Incident Management:
- Analyze and debug complex production issues using Airflow logs, Spark UI, Hive performance metrics, and system logs.
- Lead incident triage, drive restoration efforts, and document post-incident analysis reports.
- Collaborate with application, data, and infrastructure teams for quick resolution of issues.
4. Automation & Optimization:
- Develop automation scripts using Shell scripting, Python, or Ansible to reduce manual intervention and operational overhead.
- Automate repetitive tasks such as system checks, deployment verifications, and data validations.
- Contribute to CI/CD pipeline improvements to enhance deployment reliability.
5. Container & Cloud Platform Operations:
- Monitor, troubleshoot, and maintain OpenShift (OCP) clusters, including pods, nodes, and services.
- Collaborate with DevOps and platform teams to ensure smooth application deployments.
6. Data Platform Operations:
- Support and optimize Big Data workloads involving Hive, Spark, and related data frameworks.
- Write, tune, and debug SQL queries to analyze large datasets and identify data inconsistencies.
- Work with data engineering teams to maintain healthy data pipelines and job schedules.
7. Reliability & Process Excellence:
- Participate in on-call rotations, incident management, and 24/7 support coverage as required.
- Establish and document SOPs (Standard Operating Procedures) for key operational workflows.
- Contribute to continuous improvement initiatives in monitoring, automation, and fault tolerance.
Required Skills & Experience:
- 4+ years of experience in Production Support, DevOps, or IT Operations roles.
- Strong hands-on expertise in Linux server management and troubleshooting.
- Proficiency in SQL and experience with Hive, Spark, or other Big Data ecosystems.
- Experience managing OpenShift (OCP) or Kubernetes-based environments.
- Strong understanding of monitoring and logging tools (Grafana, Kibana, Prometheus, ELK Stack).
- Experience with Airflow for workflow orchestration and debugging.
- Scripting experience in Shell, Python, or equivalent automation frameworks.
- Solid understanding of incident, change, and problem management practices (ITIL or similar).
- Excellent analytical, communication, and collaboration skills.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1572115