Posted on: 16/03/2026
Role : AI Azure Architect
Overview & Expectations :
Role Summary :
- Design, build, lead, and deliver production-grade AI solutions on Azure.
- Own execution excellence with measurable business value, technical depth, governance, and reliability.
Key Outcomes (6 to 12 months) :
- Ship production-grade AI/GenAI solutions with clear ROI, reliability (SLOs), and security.
- Establish engineering standards, CI/CD pipelines, observability, and repeatable delivery patterns.
- Build a reusable AI platform that enables AI applications across multiple domains (paved paths, templates, guardrails).
- Mentor engineers via reviews, playbooks, and hands-on guidance.
Responsibilities :
- Translate business problems into well-posed technical specifications and architectures.
- Lead design reviews, prototype quickly, and harden solutions for scale (high QPS / 1M+ users).
- Build automated pipelines (CI/CD) and model/data governance across environments (dev/test/prod).
- Define and track KPIs : accuracy, latency, cost, adoption, and compliance readiness.
- Partner with Product, Security, Compliance, and Ops to land safe-by-default systems.
GenAI + Agentic AI on Azure (must-have focus) :
- Implement Azure OpenAI solutions (prompting, evals, fine-tuning where applicable, safety filters).
- Build RAG architectures using Azure AI Search (vector) + curated data sources (SharePoint, SQL, Blob/ADLS, APIs).
- Design agentic workflows (tool use, multi-step orchestration, human-in-the-loop) using combinations of :
a. Azure Functions / Durable Functions, Logic Apps, Event Grid, Service Bus
b. Frameworks like Semantic Kernel / LangChain (as orchestration layer)
- Implement observability for agent workflows (traces, latency breakdown, failure modes, cost per run).
Technical Skills (Azure-focused) :
Platform & Runtime :
- Azure Kubernetes Service (AKS), Docker, Helm; Azure Container Registry (ACR)
- API Management, ingress patterns, autoscaling, secure networking (VNet, Private Link)
MLOps :
- Azure Machine Learning (pipelines, registries, endpoints), MLflow (tracking/registry)
- CI/CD with Azure DevOps or GitHub Actions, environment promotion, canary/champion-challenger patterns
Serving :
- Azure ML managed online endpoints and/or AKS-based inference
- FastAPI/gRPC-based services; performance tuning for low-latency inference
Data & Feature :
- ADLS Gen2, Azure Data Factory, Synapse/Databricks (as applicable)
- Feature store approach (Feast/managed equivalents), batch vs streaming (Event Hubs/Stream Analytics)
Monitoring & Observability :
- Azure Monitor, Application Insights, Log Analytics; Prometheus/Grafana where needed
- Model/data drift monitoring and alerting (Azure ML monitoring patterns)
Security & Compliance :
- Microsoft Entra ID (formerly Azure AD), RBAC, Managed Identities, Key Vault
- Encryption at rest/in transit, network isolation, audit logging, policy controls
Hands-on programming :
- Strong applied coding in Python (plus scripting/automation).
Architecture & Tooling Stack :
- Git, branching standards, PR reviews, trunk-based delivery
- IaC : Bicep / Terraform (preferred), policy-as-code, reusable modules
- Registries/lineage/versioning and staged promotions for data/models
- Must have designed and built at least 3 Agentic AI solutions on Azure (end-to-end, production-grade).
Performance, Reliability & Cost :
- Define SLAs/SLOs for accuracy, tail latency, throughput, and availability
- Capacity planning, autoscaling, load tests, caching, graceful degradation
- Cost controls : instance sizing, reserved/spot strategies, storage tiering
Qualifications :
- Bachelor's/Master's degree or equivalent practical experience
- Proven track record of shipping and operating systems in production
- Must have strong platform engineering experience
Posted in: DevOps / SRE
Functional Area: ML / DL / AI Research
Job Code: 1621007