Posted on: 11/04/2026
Role Overview:
We are looking for a Senior Applied AI Engineer (Agentic Systems) to join our core team and drive the development of datasets, evaluation frameworks, and training approaches for agentic AI systems in coding environments, with a strong focus on safety, security, and real-world reliability.
This role sits at the intersection of AI evaluation, software engineering, and safety, and will be central to shaping how AI coding agents are trained and evaluated to operate safely in real-world development workflows.
We work closely with leading AI labs to design high-quality, curated datasets and evaluation systems that improve the robustness, correctness, and safety of frontier AI models.
Key Responsibilities:
- Own end-to-end development of datasets and evaluation frameworks for agentic coding systems
- Design real-world coding tasks and workflows that reflect how developers interact with AI systems (e.g., debugging, refactoring, multi-step problem solving)
- Identify and structure failure modes in agentic systems, including incorrect reasoning, incomplete execution, and unsafe outputs
- Develop safety-focused data and evaluation scenarios, including:
a. Misuse cases in coding agents
b. Security vulnerabilities and unsafe code generation
c. Adversarial or edge-case behaviors
- Build scalable and repeatable pipelines for dataset creation, validation, and iteration
- Collaborate with partner AI labs to align on model training and evaluation needs
- Translate real-world engineering and safety challenges into high-signal training and evaluation data
Required Experience:
- 5+ years of experience in:
  a. Applied ML / AI Engineering, OR
  b. Software / Systems Engineering with exposure to ML systems
- Hands-on experience with:
  a. LLMs, coding agents, or AI-assisted software development tools
  b. Designing or working with evaluation datasets or benchmarks for AI systems
- Strong software engineering skills
- Experience working in or with AI data ops companies or contributing to LLM training/evaluation pipelines
- Strong understanding of how AI systems fail in coding and multi-step tasks
Nice to Have:
- Experience with agentic workflows or autonomous systems
- Background in AI safety, security, or adversarial testing
- Familiarity with secure coding practices and common vulnerability patterns
- Experience designing complex, multi-step evaluation tasks
- Exposure to RLHF, SFT, or large-scale data pipelines
What Success Looks Like:
- High-quality datasets that meaningfully improve coding agent performance and safety
- Clear identification and structuring of agent failure modes and risks
- Evaluation frameworks that reflect real-world developer workflows
Role Type:
- Full-time; Remote
- High ownership, high impact
- Long-term role working across evolving AI systems and safety challenges