
Job Description

Senior Hadoop Administrator

Location : Bangalore (Hybrid) | Experience : 8-10 Years

Role Summary :

The Senior Hadoop Administrator will join the PRE-Big Data team, taking full ownership of the performance, reliability, and architectural integrity of our large-scale distributed data platforms.


This role requires a blend of deep systems administration and modern DevOps engineering to manage diverse open-source ecosystems including Hadoop, Kafka, HBase, and Spark. You will be responsible for moving the platform beyond basic maintenance toward a highly automated, self-healing infrastructure that supports mission-critical data processing and real-time analytics for the organization.

Responsibilities :

- Administer and engineer multi-tenant Hadoop clusters, ensuring high availability and optimal resource allocation across HDFS, YARN, and MapReduce components.

- Design and implement automated deployment and configuration management workflows using Ansible, Shell, or Python to eliminate manual intervention in cluster scaling.

- Execute deep-dive root cause analysis (RCA) for complex production incidents, debugging issues across the full stack from hardware and OS to JVM and application layers.

- Architect and deploy high-availability (HA) solutions for core services to eliminate single points of failure (SPOF) and ensure continuous service delivery.

- Lead capacity planning and proactive hardware expansion initiatives to accommodate data growth and prevent performance degradation during peak processing windows.

- Perform large-scale cluster upgrades and security patching across Hadoop, Kafka, and Spark environments with minimal downtime and zero data loss.

- Develop custom observability frameworks and fine-tune alerting systems to proactively identify hardware failures, network bottlenecks, and resource contention.

- Implement robust cluster hardening techniques, focusing on Kerberos authentication, Ranger/Sentry authorization, and data-at-rest encryption.

- Collaborate with L3 engineering teams to review new use cases and ensure architectural alignment with existing platform standards and performance benchmarks.

- Create and maintain comprehensive Standard Operating Procedures (SOPs) and technical playbooks for incident, problem, and change management workflows.

- Engineer self-healing scripts and automated remediation tools to resolve common platform issues without manual SRE intervention (a minimal remediation sketch follows this list).

- Optimize Spark and Hive execution engines through parameter tuning, memory management, and query plan analysis to meet strict Service Level Agreements (SLAs).
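To make the self-healing expectation above concrete, here is a minimal sketch of the kind of remediation script this role involves: it polls the NameNode JMX endpoint for dead DataNodes and calls a placeholder Ansible playbook. The host names, port (the Hadoop 3.x web UI default, 9870), and playbook name are illustrative assumptions, not details taken from this posting.

```python
# Minimal self-healing sketch (illustrative only): poll the NameNode JMX
# endpoint for dead DataNodes and invoke a placeholder remediation hook.
# "namenode.example.com", port 9870, and the Ansible playbook name are
# assumptions, not taken from this posting.
import subprocess
import requests

JMX_URL = ("http://namenode.example.com:9870/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystemState")

def dead_datanode_count() -> int:
    """Return NumDeadDataNodes as reported by the NameNode JMX bean."""
    bean = requests.get(JMX_URL, timeout=10).json()["beans"][0]
    return int(bean.get("NumDeadDataNodes", 0))

def remediate(host: str) -> None:
    """Placeholder remediation: restart the DataNode service via Ansible."""
    # Hypothetical playbook and host names; real tooling would differ.
    subprocess.run(
        ["ansible-playbook", "-l", host, "restart_datanode.yml"],
        check=True,
    )

if __name__ == "__main__":
    dead = dead_datanode_count()
    if dead > 0:
        print(f"{dead} dead DataNode(s) detected; triggering remediation")
        # In a real pipeline the affected hosts would come from JMX or monitoring.
        remediate("datanode01.example.com")
```

In practice such a check would run from a scheduler or the monitoring stack itself, with the remediation step gated behind alerting thresholds and change-management controls.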

Technical Requirements :

- Expert-level proficiency in Hadoop administration (HDP, CDP, or Vanilla Apache) including HDFS, YARN, Hive, and ZooKeeper.

- Strong hands-on experience in managing distributed streaming and NoSQL platforms such as Apache Kafka and HBase in a production environment.

- Advanced automation skills with significant experience in Ansible for configuration management and Python/Shell for system-level scripting.

- Deep understanding of Linux internals, including kernel tuning, disk I/O optimization, and network troubleshooting for big data workloads.

- Proven ability to code in at least one high-level language, preferably Python, for developing internal tools and data reporting dashboards.

- Mastery of DevOps disciplines, including version control (Git), CI/CD pipelines, and structured change management processes.

- Experience in configuring and managing cluster security using Kerberos, TLS/SSL, and integration with LDAP/Active Directory.

- Competency in monitoring and observability stacks such as Prometheus, Grafana, ELK, or Nagios for distributed systems.
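As a rough example of the observability work described in the last item, the sketch below exposes one NameNode health metric as a Prometheus scrape target using the prometheus_client library. The JMX URL, metric name, and exporter port are illustrative assumptions under a Hadoop 3.x setup.

```python
# Illustrative custom exporter: publish the NameNode's under-replicated
# block count as a Prometheus gauge. URL, metric name, and port are assumed.
import time
import requests
from prometheus_client import Gauge, start_http_server

JMX_URL = ("http://namenode.example.com:9870/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystem")

under_replicated = Gauge(
    "hdfs_under_replicated_blocks",
    "Blocks currently below their target replication factor",
)

def scrape() -> None:
    bean = requests.get(JMX_URL, timeout=10).json()["beans"][0]
    under_replicated.set(bean.get("UnderReplicatedBlocks", 0))

if __name__ == "__main__":
    start_http_server(9109)   # arbitrary local port for Prometheus to scrape
    while True:
        scrape()
        time.sleep(60)
```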

Preferred Skills :

- Experience with containerization of Big Data components using Docker and Kubernetes (K8s) for hybrid cloud deployments.

- Familiarity with cloud-native Big Data services such as Amazon EMR, Google Cloud Dataproc, or Azure HDInsight for migration projects.

- Knowledge of advanced Spark optimization techniques, including Catalyst Optimizer analysis and custom partitioner implementation.

- Prior experience with Infrastructure as Code (IaC) tools like Terraform to provision underlying compute and storage resources.

- Certification in Hadoop Administration (e.g., Cloudera Certified Administrator) or Red Hat Certified Engineer (RHCE).

- Understanding of Data Governance and lineage tools like Apache Atlas to manage metadata and data compliance.

- Experience in implementing tiered storage strategies (Hot/Warm/Cold) within HDFS to optimize infrastructure costs.
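As an illustration of the tiered-storage item above, the following sketch applies the HDFS COLD storage policy to aging date partitions via the standard `hdfs storagepolicies` CLI. The warehouse path, partition layout, 90-day threshold, and the assumption that ARCHIVE-tagged volumes exist on the DataNodes are all hypothetical.

```python
# Illustrative tiering sketch: mark aging partitions COLD so HDFS can move
# their blocks to archival storage. Paths and thresholds are hypothetical.
import subprocess
from datetime import datetime, timedelta

WAREHOUSE_ROOT = "/warehouse/events"   # hypothetical dataset root
COLD_AFTER_DAYS = 90                   # hypothetical retention boundary

def set_policy(path: str, policy: str) -> None:
    """Apply an HDFS storage policy (HOT/WARM/COLD) to a directory."""
    subprocess.run(
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", path, "-policy", policy],
        check=True,
    )

def tier_old_partitions() -> None:
    cutoff = datetime.now() - timedelta(days=COLD_AFTER_DAYS)
    # Partitions are assumed to be laid out as /warehouse/events/dt=YYYY-MM-DD.
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", WAREHOUSE_ROOT],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for line in listing:
        path = line.split()[-1]
        if "dt=" not in path:
            continue
        dt = datetime.strptime(path.rsplit("dt=", 1)[1], "%Y-%m-%d")
        if dt < cutoff:
            set_policy(path, "COLD")

if __name__ == "__main__":
    tier_old_partitions()
    # The HDFS Mover ("hdfs mover -p <path>") would then relocate the blocks.
```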

