Generalizable End-to-End Tool-Use RL with Synthetic CodeGym

Weihua Du, Hailei Gong, Zhan Ling, Kang Liu, Lingfeng Shen, Xuesong Yao, Yufei Xu, Dingyuan Shi, Yiming Yang, Jiecao Chen
"Generalizable End-to-End Tool-Use RL with Synthetic CodeGym" (2025)

CodeGym is a synthetic environment generation framework for LLM agent reinforcement learning on multi-turn tool-use tasks. It automatically converts static code problems into interactive CodeGym environments where agents can learn to use tools to solve complex tasks in various configurations.

Key Components

We are open-sourcing the following key parts of the project:

CodeGym environment synthesis pipeline: refer to gym/README.md for details.
Server for launching CodeGym environments aimed at large-scale reinforcement learning: refer to online_server/README.md for details.

A community reproduction of the synthetic dataset is available on Hugging Face.

Overview

CodeGym transforms traditional code problems into interactive environments where LLM agents can learn to:

Use tools and actions to solve problems step-by-step
Learn generalizable tool-use behaviors

Environment Synthesis Process

We designed a two-stage process for CodeGym environment synthesis and verification:

Gym Synthesis:

Extract reusable code logic and functions from programming solutions
Convert them into a library of documented tools and utilities
Generate OpenAI Gym format environments with state, actions, transitions, and rewards
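
To make the state, action, transition, and reward structure concrete, here is a minimal sketch of what a Gym-style tool-use environment could look like. It is an illustration only, not code from the CodeGym pipeline: the class name `MaxElementEnv`, the tool names `length`, `get`, and `submit`, and the action dictionary layout are all hypothetical, and the sketch deliberately avoids depending on any Gym library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MaxElementEnv:
    """Hypothetical toy environment: find the largest number in a hidden, non-empty list."""

    numbers: list[int]            # hidden state, only reachable through tools
    max_steps: int = 50           # episode is cut off after this many tool calls
    steps: int = field(default=0, init=False)
    finished: bool = field(default=False, init=False)

    def reset(self) -> str:
        """Start a new episode and return the task description."""
        self.steps = 0
        self.finished = False
        return "Find the maximum element. Tools: length, get(index), submit(answer)."

    def step(self, action: dict[str, Any]) -> tuple[str, float, bool]:
        """Apply one tool call and return (observation, reward, done)."""
        if self.finished:
            raise RuntimeError("episode is over; call reset()")
        self.steps += 1
        name = action.get("name")
        args = action.get("args", {})
        if name == "submit":
            # Terminal action: reward 1.0 only for the correct answer.
            self.finished = True
            correct = args.get("answer") == max(self.numbers)
            return ("correct" if correct else "wrong"), (1.0 if correct else 0.0), True
        if name == "length":
            obs = str(len(self.numbers))
        elif name == "get":
            index = args.get("index")
            if isinstance(index, int) and 0 <= index < len(self.numbers):
                obs = str(self.numbers[index])
            else:
                obs = "error: index out of range"
        else:
            obs = f"error: unknown tool {name!r}"
        # Non-terminal tool calls give no reward; stop once the step budget is spent.
        self.finished = self.steps >= self.max_steps
        return obs, 0.0, self.finished


# A scripted "agent" that solves the task through tool calls only.
env = MaxElementEnv([3, 9, 4])
env.reset()
n = int(env.step({"name": "length"})[0])
best = max(int(env.step({"name": "get", "args": {"index": i}})[0]) for i in range(n))
obs, reward, done = env.step({"name": "submit", "args": {"answer": best}})
print(obs, reward, done)
```

The point of the sketch is that the agent never sees the hidden list directly; it has to discover the answer through a sequence of tool calls, which is what makes the task multi-turn.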

Gym Verification:

Generate comprehensive unit tests spanning multiple difficulty levels
Validate environment correctness (no compilation errors, timeouts, or memory issues)
Verify solvability by generating solution functions that successfully use the provided tools
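
The verification steps above can be sketched as a small harness that runs a reference solver against several test cases and records whether each case passes, fails, crashes, or exceeds a time budget. This is a simplified illustration under stated assumptions, not the project's actual verifier: `SumEnv`, `reference_solver`, and `verify` are hypothetical names, and the time limit here is checked after the fact rather than enforced by interrupting the solver.

```python
import time
from typing import Callable


class SumEnv:
    """Hypothetical toy environment: report the sum of a hidden list."""

    def __init__(self, numbers: list[int]):
        self.numbers = numbers

    def step(self, name: str, **args):
        """Return (observation, reward, done) for one tool call."""
        if name == "length":
            return len(self.numbers), 0.0, False
        if name == "get":
            return self.numbers[args["index"]], 0.0, False
        if name == "submit":
            return None, float(args["answer"] == sum(self.numbers)), True
        raise ValueError(f"unknown tool: {name}")


def reference_solver(env: SumEnv) -> float:
    """Solve the task using only the environment's tools; return the final reward."""
    n, _, _ = env.step("length")
    total = sum(env.step("get", index=i)[0] for i in range(n))
    _, reward, _ = env.step("submit", answer=total)
    return reward


def verify(cases: list[list[int]], solver: Callable[[SumEnv], float],
           time_limit_s: float = 1.0) -> dict[str, int]:
    """Count passed, failed, crashed, and too-slow cases for a solver."""
    report = {"passed": 0, "failed": 0, "error": 0, "timeout": 0}
    for numbers in cases:
        start = time.perf_counter()
        try:
            reward = solver(SumEnv(numbers))
        except Exception:
            report["error"] += 1      # the environment or solver crashed
            continue
        if time.perf_counter() - start > time_limit_s:
            report["timeout"] += 1    # finished, but over the time budget
        elif reward == 1.0:
            report["passed"] += 1
        else:
            report["failed"] += 1
    return report


cases = [[], [5], list(range(1000))]  # edge case, trivial case, larger case
report = verify(cases, reference_solver)
print(report)
```

An environment would be kept only if a tool-using solver passes every case, which is one way to read "verify solvability" in the list above.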

Examples

The example/ folder contains sample CodeGym environments to help you get started:

example/example_envs contains sample CodeGym environments
example/training_instance.jsonl contains sample instances for RL training
example/raw_problems.jsonl contains raw coding problems for demonstrating the generation pipeline
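
Since the record schema of the two .jsonl files is not described here, a reasonable first step is to inspect which top-level fields they contain. The helper below is a generic sketch using only the Python standard library; `summarize_jsonl` is a hypothetical name, and it makes no assumption about field names in the CodeGym files beyond each line being one JSON value.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str, limit: int = 100) -> Counter:
    """Count top-level keys across the first `limit` records of a JSON Lines file."""
    keys: Counter = Counter()
    seen = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue              # skip blank lines
            if seen >= limit:
                break
            record = json.loads(line)
            if isinstance(record, dict):
                keys.update(record.keys())
            seen += 1
    return keys


# Only runs when executed from the repository root, where the example files live.
for name in ("example/training_instance.jsonl", "example/raw_problems.jsonl"):
    if Path(name).exists():
        print(name, dict(summarize_jsonl(name)))
```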

Key Result

LLMs trained in CodeGym show stronger generalization on out-of-distribution (OOD) tool-use and multi-turn benchmarks.

CodeGym Synthesis Pipeline

We release the pipeline for environment synthesis and verification. Please refer to gym/README.md for details.

Server for CodeGym Environments

We release a highly concurrent server for launching CodeGym environments aimed at large-scale reinforcement learning. Please refer to online_server/README.md for details.

License

This project and dataset are released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Citation

If you find this work useful, please cite our paper:

@article{du2025generalizable,
  title={Generalizable End-to-End Tool-Use RL with Synthetic CodeGym},
  author={Du, Weihua and Gong, Hailei and Ling, Zhan and Liu, Kang and Shen, Lingfeng and Yao, Xuesong and Xu, Yufei and Shi, Dingyuan and Yang, Yiming and Chen, Jiecao},
  journal={arXiv preprint arXiv:2509.17325},
  year={2025}
}
