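# Example Overview

> From the dataset perspective

This document lists the examples by dataset, so users can quickly see which datasets are already covered and supported by the examples.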

| Dataset | Algorithm | Use case | References |
|---------|-----------|----------|------------|
| [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) | GRPO | Regular RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html) |
| | GRPO | Asynchronous training | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/async_gsm8k), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_async_mode.html) |
| | Multi-Step GRPO | AgentScope ReAct agent training | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_react), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_react.html) |
| | AsymRE | Regular RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/asymre_gsm8k) |
| | CISPO | Regular RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/cispo_gsm8k) |
| | GRPO | Training with prioritized tasks | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_task_pipeline), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html#example-data-processor-for-task-pipeline) |
| | GRPO | Training with reward reshaping on experiences | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_experience_pipeline), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html#example-data-processor-for-experience-pipeline) |
| | GRPO | Training with RULER (Relative Universal LLM-Elicited Rewards) | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler) |
| | GRPO | Training the policy model as its own reward model | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler) |
| | GRPO | Training with LoRA | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_lora_gsm8k) |
| | OPMD | Off-policy RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/opmd_gsm8k), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_advanced.html) |
| | REC | Training with group-relative REINFORCE variants | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) |
| | sPPO | Training with the sPPO algorithm | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/sppo_gsm8k) |
| | TOPR | Tapered off-policy RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/topr_gsm8k) |
| Math-type tasks | GRPO | Training with rewards from RM-Gallery | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_math) |
| | AsymRE | Regular RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/asymre_math) |
| | MIX | Training with "expert" data generated by a more advanced LLM | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_math), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) |
| [ALFWorld](https://github.com/alfworld/alfworld) | GRPO | Concatenated multi-turn RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_alfworld), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html) |
| | Multi-Step GRPO | General multi-turn RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_alfworld_general_multi_step), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_step_wise.html) |
| [SciWorld](https://github.com/allenai/ScienceWorld) | GRPO | Concatenated multi-turn RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_sciworld) |
| [WebShop](https://github.com/princeton-nlp/WebShop) | GRPO | Concatenated multi-turn RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_webshop), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html) |
| [callanwu/WebWalkerQA](https://huggingface.co/datasets/callanwu/WebWalkerQA) | Multi-Step GRPO | Multi-turn web search agent training | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| [corbt/enron-emails](https://huggingface.co/datasets/corbt/enron-emails) | Multi-Step GRPO | Multi-turn email search agent training | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_email_search), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_search_email.html) |
| [open-r1/DAPO-Math-17k-Processed](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | GRPO | Regular RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/dapo_math) |
| [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k) | GRPO | Training with Bayesian online task selection | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) |
| [Frozen Lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) | GRPO | Concatenated multi-turn RFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_frozen_lake) |
| [anisha2102/RaR-Medicine](https://huggingface.co/datasets/anisha2102/RaR-Medicine) | GRPO | Training on non-verifiable medical QA tasks with rewards from an LLM judge and rubrics | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
| [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) | GRPO | Regular RFT for tool calling | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_toolcall) |
| [hiyouga/geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k) | GRPO | Regular RFT for vision-language models | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm) |
| | MIX | Training with "expert" data generated by a more advanced LLM | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_vlm) |
| [datajuicer/RealMedConv](https://huggingface.co/datasets/datajuicer/RealMedConv) | GRPO | Regular RFT for learning to ask proactively | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) |
| [datajuicer/Trinity-ToolAce-RL-split](https://huggingface.co/datasets/datajuicer/Trinity-ToolAce-RL-split) | CHORD | Dynamic joint SFT and RL training | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord) |
| [datajuicer/Trinity-ToolAce-SFT-split](https://huggingface.co/datasets/datajuicer/Trinity-ToolAce-SFT-split) | CHORD | Dynamic joint SFT and RL training | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord) |
| [Jiayi-Pan/Countdown-Tasks-3to4](https://huggingface.co/datasets/Jiayi-Pan/Countdown-Tasks-3to4) | PPO | Training with a critic model | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown) |
| | PPO | Using Megatron-LM as the training backend | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_megatron) |
| | PPO | Training with experience replay | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay) |
| [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts) | SFT | Regular SFT | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/sft_mot), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_dpo.html#configuration-for-sft) |
| [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) | DPO | Training on preset human preferences | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/dpo_humanlike), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_dpo.html) |
| Example data | DPO | Training on human preferences annotated in real time within the training loop | [Example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/dpo_human_in_the_loop), [Documentation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html#example-human-in-the-loop) |
