Job Description

Description :

Job Title : Principal Application Architect

Location : Bangalore / Hyderabad

Work Mode : Work From Office (5 Days a Week)

WFH : 2 Days per Month

Experience : 5-10 Years

Notice Period : Immediate to 15 Days

Interview Process :

Round 1 : Online

Round 2 : Face-to-Face

About the Role :

We are looking for a hands-on Principal Application Architect to join our internal engineering team and take full ownership of the performance, stability, scalability, and observability of our AWS-hosted SaaS platform.

This is not an advisory or documentation-only role. The ideal candidate will actively read, write, debug, and optimize production code while also shaping cloud-native architecture and driving deep operational visibility. The role directly impacts production reliability, customer experience, and system resilience.

Key Ownership Areas :

- Application performance and scalability

- Production reliability and uptime

- Observability, logging, and APM

- Cloud-native and microservices architecture

- Code-level quality, resilience, and optimization

Key Responsibilities :

1. Hands-on Application & Code Ownership

a. Review, debug, and optimize production code across :

- Microservices

- APIs

- AWS Lambda functions

- Background workers

b. Identify and resolve :

- Memory leaks

- Thread exhaustion

- Slow database queries

- Inefficient API calls

- Blocking code in asynchronous systems

c. Enforce :

- Performance-aware coding standards

- Observability-first development practices

2. Performance Engineering

a. Profile applications in both production and staging environments

b. Identify and address :

- High-latency endpoints

- Resource-intensive API calls

- Inefficient execution paths

c. Tune and optimize :

- Runtime configurations (JVM, Node.js, Python, .NET)

- Database and connection pools

- Caching strategies

- Queue-based and batch processing systems

3. Observability & APM (Including Splunk)

a. Implement end-to-end distributed tracing

b. Standardize :

- Structured logging

- Correlation IDs

- Error classification and severity models

c. Use and extend observability platforms :

- OpenTelemetry

- Splunk (SPL, dashboards, alerts)

- Datadog, New Relic, Dynatrace, Elastic, Prometheus, Grafana

d. Build and maintain :

- Splunk dashboards for API latency, error rates, and service health

e. Write custom SPL queries to investigate :

- Failed transactions

- Timeouts

- Exception trends

- Customer-impacting incidents

f. Enable seamless trace → log → SPL → code → root-cause analysis

4. Cloud-Native Architecture

a. Design and govern :

- Microservices architectures

- Serverless workflows

- Event-driven systems

- APIs and API gateways

b. Enforce architectural best practices :

- Resilience patterns (timeouts, retries, circuit breakers)

- API contracts and versioning

- Clear service ownership boundaries

5. AI-Driven Diagnostics & Automation

a. Leverage AI to :

- Detect anomalies in logs and metrics

- Identify performance regressions

- Correlate incidents across distributed systems

b. Implement :

- AIOps platforms

- LLM-based log and trace analysis

- Automated anomaly detection and diagnostics

Required Technical Skills :

Application Development (Hands-On - Mandatory) :

a. Strong expertise in at least one backend technology :

- Java / Spring Boot

- Python / FastAPI

- Node.js

- .NET Core

b. Must be able to :

- Read, refactor, and optimize production code

- Debug live production issues

- Optimize algorithms, database queries, and API flows

Cloud & Containers :

- AWS services : ECS, EKS, Lambda, RDS, Redis, API Gateway

- Docker and Kubernetes

- CI/CD pipelines and deployment automation

Observability & Logging :

- OpenTelemetry

- Splunk (SPL, dashboards, alerts)

- Datadog, New Relic, Dynatrace, Elastic, Prometheus, Grafana

- Distributed tracing, metrics, and log analytics

APIs & Data :

- RESTful and asynchronous APIs

- SQL and NoSQL performance tuning

- Redis and advanced caching strategies

AI & Automation :

- AIOps platforms

- LLM-based troubleshooting and diagnostics

- Automated performance and anomaly detection

