Posted on: 15/04/2026
Description :
We are looking for a Principal AI/ML Engineer to lead the design and development of our next-generation enterprise AI platform.
In this role, you will drive key technical decisions and take ownership of critical systems that leverage advanced agentic frameworks and large language models (LLMs) to enable intelligent, multi-step reasoning workflows.
Role & responsibilities :
- Architect Core Agentic Systems : Lead the design of modular, high-throughput components for a large-scale AI platform as it moves from simple API wrappers to stateful, autonomous systems.
- Scale Cloud-Native Infrastructure : Build and optimize high-concurrency backend services and REST/gRPC APIs within a distributed, containerized environment (e.g., Kubernetes, Docker) to support sub-second model orchestration.
- Engineer Advanced Reasoning : Design and implement sophisticated agent workflows, specifically focusing on multi-agent collaboration, dynamic tool orchestration, and ReAct/Chain-of-Thought reasoning loops.
- Productionize ML & RAG : Integrate LLMs into production via high-performance pipelines, utilizing Vector Databases (like Pinecone, Milvus, or Weaviate) and advanced RAG techniques such as hybrid search and query expansion.
- Establish LLM Excellence : Define organizational best practices for prompt engineering, model fine-tuning, and performance benchmarking, ensuring a balance between cost-efficiency and output quality.
- Optimize System Reliability : Partner with DevOps and Infrastructure teams to manage cloud architecture, focusing on LLM observability, automated scaling, and 99.9% system uptime.
- Develop AI Monitoring & Debugging : Implement advanced tracing and logging strategies for non-deterministic systems, using tools like LangSmith or Arize Phoenix to debug reasoning failures and "hallucination" triggers.
- Secure Agentic Operations : Ensure robust system behavior by implementing security guardrails, PII masking, and RBAC (Role-Based Access Control) to mitigate prompt injection and data leakage.
- Lead & Mentor : Serve as a technical bridge between product stakeholders and the engineering team, providing deep-dive code reviews and mentoring junior engineers in the rapidly shifting AI landscape.
Preferred candidate profile :
- Python Expertise : 8+ years of software engineering experience with strong proficiency in Python, including async programming, testing, and packaging
- LLM / Agentic Systems Development : Experience building LLM-powered applications using frameworks such as LangChain, LangGraph, or equivalent abstractions
- RAG & Retrieval Systems : Experience with retrieval-augmented generation, vector databases (e.g., PGVector), and embedding pipelines
- LLM Integration : Hands-on experience integrating and operating systems built on Azure OpenAI or OpenAI APIs
- Backend & API Design : Experience designing scalable REST APIs and microservices architectures
- Cloud & Infrastructure (AWS) : Strong experience designing and deploying systems using AWS services (Lambda, API Gateway, S3, ECR, Secrets Manager)
- Data & Persistence : Experience with PostgreSQL
- Containerization : Proficiency with Docker and container-based deployment workflows
Nice to Have Skills :
- Applied ML : Experience with applied ML beyond LLMs (e.g., forecasting, classification)
- Agent Protocols : Experience with Model Context Protocol (MCP) or similar agent communication standards
- Security & Code Quality : Familiarity with tools such as SonarQube, Snyk, or similar security and code analysis platforms