Posted on: 26/11/2025
Description :
Job Title : ML Engineer (RL Environments)
About the Role :
We are looking for a highly autonomous Machine Learning Engineer who can design and implement SWE-Bench-style RL environments and generate a continuous stream of evaluation tasks for LLMs and agentic systems.
This role is heavily engineering-focused and requires strong experience building custom environments, workflows, and code-based tasks.
You will work directly with a senior researcher but execute independently.
Responsibilities :
- Build custom RL environments inspired by SWE-Bench, code-debugging tasks, unit-test-driven workflows, and agent evaluation tasks.
- Create large volumes of structured tasks, including :
  - Code reasoning tasks
  - Multi-step workflows
  - Debugging challenges
  - Reward-driven evaluation episodes
- Define state/action/reward formats for each environment.
- Implement task infrastructure in Python.
- Produce JSON schemas, templates, and reproducible task scripts (see the task-spec sketch after this list).
- Build testing harnesses to validate correctness of tasks.
- Work closely with a researcher to align on quality, difficulty, and output structure.
- Stay current with LLM evaluation and agentic frameworks.
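To give candidates a concrete sense of the work, here is a minimal sketch of a task spec and a validation check in Python. The field names, schema, and helper shown here are hypothetical assumptions chosen for illustration, not an existing internal format.

# Illustrative sketch only: the TaskSpec fields and validate_task checks
# are assumptions, not an agreed schema.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class TaskSpec:
    task_id: str
    repo: str                      # repository the task is drawn from
    prompt: str                    # instruction shown to the LLM/agent
    entry_point: str               # file or function the agent must modify
    unit_tests: list[str] = field(default_factory=list)  # tests that define success
    max_steps: int = 20            # episode length cap
    reward_on_pass: float = 1.0    # sparse reward when all tests pass

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

def validate_task(spec: TaskSpec) -> list[str]:
    # Cheap structural checks before a task enters the evaluation pool.
    errors = []
    if not spec.unit_tests:
        errors.append("task has no unit tests, so success is undefined")
    if spec.max_steps <= 0:
        errors.append("max_steps must be positive")
    return errors

if __name__ == "__main__":
    spec = TaskSpec(
        task_id="demo-001",
        repo="example/repo",
        prompt="Fix the off-by-one error in pagination.",
        entry_point="src/pagination.py",
        unit_tests=["tests/test_pagination.py::test_last_page"],
    )
    assert not validate_task(spec)
    print(spec.to_json())

In practice, specs along these lines would be serialized to JSON and validated by a testing harness before entering the evaluation pool.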
Required Skills :
- 4 to 5+ years of ML engineering experience, with a prior title of ML Engineer, RL Engineer, or ML Research Engineer.
- Strong Python engineering background building production-ready code and modular libraries.
- Experience with RL environment creation (Gym, Gymnasium, custom RL tasks); a minimal environment sketch follows this list.
- Experience with SWE-Bench, code evaluation, repo-based tasks, or similar systems is a major advantage.
- Strong understanding of reward shaping, episode design, and environment logic.
- Hands-on ML experience (PyTorch, TensorFlow, Hugging Face).
- Ability to generate new tasks independently, with minimal supervision.
- Strong familiarity with LLMs and evaluation frameworks.
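For reference, below is a minimal Gymnasium-style environment sketch for a unit-test-driven debugging task, illustrating sparse reward and episode design. The run_tests helper and the text-in/text-out observation and action layout are placeholder assumptions, not a reference implementation.

# Sketch of a custom Gymnasium environment for a debugging task; run_tests
# is a stub standing in for a real sandboxed test harness.
import gymnasium as gym
from gymnasium import spaces

def run_tests(task_spec: dict, patch: str) -> tuple[bool, str]:
    # Stub: a real harness would apply the patch in a sandbox and run the
    # task's unit tests; here we only check that a non-empty patch was sent.
    return bool(patch.strip()), "stub test log"

class DebuggingEnv(gym.Env):
    # One episode: the agent submits candidate patches until the test suite
    # passes or the step budget runs out. Reward is sparse (1.0 on success).

    def __init__(self, task_spec: dict, max_steps: int = 20):
        super().__init__()
        self.task_spec = task_spec
        self.max_steps = max_steps
        # Text in (prompt / failing-test output), text out (candidate patch).
        self.observation_space = spaces.Text(max_length=20_000)
        self.action_space = spaces.Text(max_length=20_000)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        return self.task_spec["prompt"], {}

    def step(self, action: str):
        self.steps += 1
        passed, log = run_tests(self.task_spec, patch=action)
        reward = 1.0 if passed else 0.0
        terminated = passed
        truncated = self.steps >= self.max_steps
        return log, reward, terminated, truncated, {}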
Nice to Have :
- Prior work with LLM agent frameworks.
- Experience building debugging/patching tasks.
- Research engineering experience.
What Success Looks Like :
- You can independently produce new RL tasks daily.
- You write clean, reusable environment code.
- You understand how LLMs fail and design tasks to measure that.
- You need minimal oversight.