Description :

Job Title : Senior Principal Application Architect

Location : Bangalore / Hyderabad

Work Mode : Work From Office (5 Days a Week)

WFH : 2 Days per Month

Experience : 10-20 Years

Notice Period : Immediate to 15 Days

Interview Process :

- Round 1 : Online

- Round 2 : Face-to-Face

About the Role :

We are seeking a Senior Principal Application Architect to join our internal engineering team and take end-to-end ownership of the performance, reliability, scalability, and observability of our AWS-hosted SaaS platform.

This is a high-impact, hands-on technical leadership role, not an advisory or documentation-only position.

The successful candidate will actively read, write, debug, and optimize production code while also defining system-wide architecture, enforcing engineering standards, and driving deep operational visibility across distributed systems.

This role directly addresses critical technical challenges, including slow APIs, production instability, inefficient microservices and serverless workloads, and gaps in observability across logs, traces, and metrics.

What This Role Owns :

- Application performance and horizontal scalability

- Production reliability, availability, and incident reduction

- Observability, logging, and APM strategy

- Cloud-native and distributed system architecture

- Code-level quality, resilience, and engineering excellence

Key Responsibilities :

1. Hands-on Application & Code Ownership

a. Review, debug, and optimize production code across :

- Microservices-based applications

- REST and asynchronous APIs

- AWS Lambda functions

- Background workers and batch jobs

b. Diagnose and resolve :

- Memory leaks and thread exhaustion

- Slow or inefficient database queries

- High-latency and chatty API calls

- Blocking operations in asynchronous systems

c. Enforce :

- Performance-aware coding standards

- Observability-first development practices across teams

2. Performance Engineering & Optimization

a. Profile applications in production and staging environments

b. Identify and remediate :

- High-latency endpoints and services

- Resource-intensive API calls and workflows

- Inefficient execution paths and bottlenecks

c. Tune and optimize :

- Runtime configurations (JVM, Node.js, Python, .NET)

- Database and connection pooling strategies

- Caching layers and eviction policies

- Queue-based and batch-processing systems

3. Observability & APM Leadership (Including Splunk)

a. Design and implement end-to-end distributed tracing

b. Define and standardize :

- Structured logging frameworks

- Correlation IDs and trace propagation

- Error classification and severity models

c. Use, extend, and govern observability platforms :

- OpenTelemetry

- Splunk (SPL, dashboards, alerts)

- Datadog, New Relic, Dynatrace, Elastic, Prometheus, Grafana

d. Build and maintain :

- Splunk dashboards for service health, API latency, and error rates

e. Write advanced SPL queries to analyze :

- Failed transactions and timeouts

- Exception and error patterns

- Customer-impacting incidents

f. Enable seamless trace → log → SPL → code → root-cause analysis

4. Cloud-Native Architecture & Governance

a. Architect, design, and govern :

- Microservices and domain-driven architectures

- Serverless workflows and event-driven systems

- APIs and API gateways

b. Enforce architectural best practices :

- Resilience patterns (timeouts, retries, circuit breakers)

- API contracts, versioning, and backward compatibility

- Clear service ownership and operational boundaries

5. AI-Driven Diagnostics & Automation

a. Leverage AI to :

- Detect anomalies in logs, metrics, and traces

- Identify performance regressions proactively

- Correlate incidents across distributed services

b. Implement and evolve :

- AIOps platforms

- LLM-based log and trace analysis

- Automated anomaly detection and diagnostics pipelines

Required Technical Skills :

Application Development (Hands-On - Mandatory) :

1. Strong expertise in at least one backend technology stack :

- Java / Spring Boot

- Python / FastAPI

- Node.js

- .NET Core

2. Must demonstrate the ability to :

- Read, refactor, and optimize complex production codebases

- Debug and resolve live production issues

- Optimize algorithms, database queries, and API workflows

Cloud, Containers & DevOps :

- AWS services : ECS, EKS, Lambda, RDS, ElastiCache (Redis), API Gateway

- Docker and Kubernetes

- CI/CD pipelines and deployment automation

Observability & Logging :

- OpenTelemetry instrumentation and standards

- Splunk (SPL, dashboards, alerts)

- Datadog, New Relic, Dynatrace, Elastic, Prometheus, Grafana

- Distributed tracing, metrics, and log analytics

APIs & Data Management :

- RESTful and asynchronous APIs

- SQL and NoSQL database performance tuning

- Redis and advanced caching strategies

AI & Automation :

- AIOps platforms

- LLM-based troubleshooting and diagnostics

- Automated performance monitoring and anomaly detection