Posted on: 11/04/2026
Role Overview:
We are looking for a Senior Applied AI Engineer (Agentic Systems) to join our core team and drive the development of datasets, evaluation frameworks, and training approaches for agentic AI systems in coding environments, with a strong focus on safety, security, and real-world reliability.
This role sits at the intersection of AI evaluation, software engineering, and safety, and will be central to shaping how AI coding agents are trained and evaluated to operate safely in real-world development workflows.
We work closely with leading AI labs to design high-quality, curated datasets and evaluation systems that improve the robustness, correctness, and safety of frontier AI models.
Key Responsibilities:
- Own end-to-end development of datasets and evaluation frameworks for agentic coding systems
- Design real-world coding tasks and workflows that reflect how developers interact with AI systems (e.g., debugging, refactoring, multi-step problem solving)
- Identify and structure failure modes in agentic systems, including incorrect reasoning, incomplete execution, and unsafe outputs
- Develop safety-focused data and evaluation scenarios, including:
a. Misuse cases in coding agents
b. Security vulnerabilities and unsafe code generation
c. Adversarial or edge-case behaviors
- Build scalable and repeatable pipelines for dataset creation, validation, and iteration
- Collaborate with partner AI labs to align on model training and evaluation needs
- Translate real-world engineering and safety challenges into high-signal training and evaluation data
Required Experience:
- 5+ years of experience in:
  a. Applied ML / AI Engineering, OR
  b. Software / Systems Engineering with exposure to ML systems
- Hands-on experience with:
  a. LLMs, coding agents, or AI-assisted software development tools
  b. Designing or working with evaluation datasets or benchmarks for AI systems
- Strong software engineering skills
- Experience working in or with AI data ops companies or contributing to LLM training/evaluation pipelines
- Strong understanding of how AI systems fail in coding and multi-step tasks
Nice to Have:
- Experience with agentic workflows or autonomous systems
- Background in AI safety, security, or adversarial testing
- Familiarity with secure coding practices and common vulnerability patterns
- Experience designing complex, multi-step evaluation tasks
- Exposure to RLHF, SFT, or large-scale data pipelines
What Success Looks Like:
- High-quality datasets that meaningfully improve coding agent performance and safety
- Clear identification and structuring of agent failure modes and risks
- Evaluation frameworks that reflect real-world developer workflows
Role Type:
- Full-time; Remote
- High ownership, high impact
- Long-term role working across evolving AI systems and safety challenges