Posted on: 15/12/2025
Description :
Role : Lead DevOps Engineer
Company/Operating Company : AI Center of Excellence in Engineering
About the AI Center of Excellence in Engineering :
Are you ready for the next step in your career as a Senior Software Developer? You'll work with an elite group of technologists and create real business impact. With room for initiative, the latest technologies, and an AI-first work environment, you will actively contribute to the development of products across the full portfolio.
About the Role :
We're looking for an experienced DevOps Engineer to help build, automate, and maintain both our SaaS cloud infrastructure and on-premises client installations. You'll work closely with development teams to implement robust CI/CD pipelines, manage Kubernetes deployments, and ensure security across our microservices architecture in multiple environments, with a focus on search, AI, and vector database technologies.
Key Responsibilities :
- Design and evolve AI-augmented CI/CD pipelines that serve as reusable blueprints across rewrite projects, supporting multi-tenant SaaS deployments, agentic automation, and environment creation through code.
- Collaborate with the AI methodology team to refine automation patterns and integrate AI-driven pipeline generation, test orchestration, and telemetry collection into the rewrite process.
- Develop automated installation and update frameworks for hybrid and customer-managed environments, emphasizing repeatability and low-touch deployment.
- Manage Azure-based SaaS infrastructure, ensuring reliability, elasticity, and security across Kubernetes and containerized services.
- Deploy, scale, and optimize Elasticsearch and vector database clusters supporting GenAI workloads.
- Implement, monitor, and tune LLM and AI service deployments on Azure (OpenAI Service, Cognitive Search, model hosting).
- Design and maintain federated authentication and identity integration across microservices (Okta, OAuth2, and SSO patterns).
- Oversee PostgreSQL/MS SQL and data infrastructure, ensuring resilience, automated backup, and performance tuning for high-throughput workloads.
- Establish observability standards (metrics, traces, and logs) for AI and non-AI services; use insights to improve future rewrites.
- Embed security automation into every deployment model, enforcing Zero Trust and continuous vulnerability assessment.
- Partner with development and AI teams to industrialize deployment methodology, transforming learnings from each rewrite into platform-level automation improvements.
Required Experience :
- AI-Augmented CI/CD : Proven experience building and maintaining automated pipelines (GitHub Actions or Azure DevOps) that integrate AI-assisted code generation, testing, and deployment workflows.
- Version Control & Collaboration : Deep experience with Git-based systems (GitHub, Bitbucket), including managing multi-repo architectures and enforcing branching and governance standards.
- Kubernetes & Cloud Infrastructure : Advanced proficiency with Kubernetes and Helm; experienced in operating containerized microservices across multiple environments in Azure.
- Infrastructure as Code (IaC) : Strong knowledge of Terraform or Bicep for creating repeatable, parameterized deployment templates used across multiple rewrite projects.
- Authentication & IAM : Hands-on experience implementing federated identity (Okta, Azure AD, OAuth2/OIDC) across microservices and SaaS environments.
- PostgreSQL & Data Layer Operations : Skilled in tuning, scaling, and backing up PostgreSQL; familiarity with managing schema migrations in automated CI/CD contexts.
- Vector & Search Systems : Operational experience with Elasticsearch and vector databases (e.g., Milvus, Pinecone, or Azure AI Search) to support AI-driven use cases.
- Azure AI & LLM Deployments : Experience provisioning and managing Azure OpenAI, Cognitive Search, and other AI workloads, including model deployment and scaling.
- Observability & Telemetry : Strong command of Prometheus, Grafana, and distributed tracing; ability to design observability frameworks that feed back into AI-driven optimization loops.
- Security by Design : Practical application of DevSecOps, vulnerability scanning, and Zero Trust principles; automation of compliance and secret management (Vault or Azure Key Vault).
Nice to Have :
- Experience with AI workflow orchestration and agent monitoring within build or deployment pipelines.
- Background in deployment automation or customer-managed installers for hybrid environments.
- Multi-cloud fluency (AWS, GCP, Azure).
- Containerization expertise with Docker and image lifecycle management.
- Familiarity with ingress controllers, API gateways, and service mesh solutions such as Istio or Linkerd.
- Strong scripting and automation skills (Bash, Python, PowerShell).
- Experience creating and managing Helm charts, Kustomize overlays, and GitOps-style repositories.
- Skilled in defining operational and quality metrics that inform continuous improvement cycles.
- Experience integrating code quality and security scanning tools (SonarQube, Trivy, Snyk) into CI/CD pipelines.
What We're Looking For :
- 10+ years of DevOps/SRE experience in both cloud and on-premises environments
- Strong background in microservices architecture
- Experience with Elasticsearch and modern AI infrastructure components
- Familiarity with vector databases (e.g., Pinecone, Milvus, Weaviate)
- Hands-on experience deploying LLMs on Azure AI or similar platforms
- Experience automating complex installation processes
- Strong problem-solving and communication skills
- Relevant certifications (e.g., CKA, AWS/Azure certifications) are a plus
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1590649