trinity.algorithm.advantage_fn.clipv_advantage module
GRPO advantage computation with Clip_V token filtering.
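The GRPO part can be illustrated with the minimal sketch below. It computes group-relative advantages for the responses sampled from a single prompt by centering each reward on the group mean and dividing by the group standard deviation. It assumes that `epsilon` plays its usual role, a small constant added to the standard deviation to avoid division by zero. It also uses the population standard deviation, which is an assumption because implementations differ. The helper name `grpo_group_advantages` is hypothetical and is not part of this module.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def grpo_group_advantages(rewards: Sequence[float], epsilon: float = 1e-6) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of responses.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation plus epsilon (assumed role of `epsilon`).
    """
    if len(rewards) == 0:
        raise ValueError("rewards must be non-empty")
    group_mean = mean(rewards)
    group_std = pstdev(rewards)
    return [(r - group_mean) / (group_std + epsilon) for r in rewards]


# Two correct (reward 1.0) and two incorrect (reward 0.0) responses to one prompt.
adv = grpo_group_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 4) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

If every response in a group gets the same reward, the numerators are all zero and every advantage is 0.0. `epsilon` only matters here by keeping the division defined.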
- class trinity.algorithm.advantage_fn.clipv_advantage.ClipVAdvantageFn(epsilon: float = 1e-06, mu: float = 2.0, max_frac: float = 0.0001)
Bases: AdvantageFn
Clip_V advantage: applies one-sided clipping to negative-advantage tokens only, and caps the global ratio of clipped tokens.
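This page does not spell out the Clip_V selection criterion. The sketch below is therefore only a hypothetical illustration of what "one-sided clipping with a global cap" could look like, and it is not this class's implementation. It rests on these assumptions:

- Each token carries some per-token score (`token_score`, hypothetical).
- A token becomes a clipping candidate only if its advantage is negative and its score exceeds `mu`.
- At most `floor(max_frac * num_tokens)` candidates are clipped across the whole flattened batch, highest score first.
- "Clipping" means zeroing that token's advantage.

The function name `clipv_filter` is also hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def clipv_filter(
    token_adv: Sequence[float],
    token_score: Sequence[float],
    mu: float = 2.0,
    max_frac: float = 1e-4,
) -> Tuple[List[float], List[int]]:
    """Hypothetical one-sided token filter in the spirit of Clip_V.

    Assumptions (not taken from the library):
    - token_adv / token_score are flattened over the whole batch, so the cap
      applies globally.
    - A token is a candidate only if its advantage is negative AND its
      per-token score exceeds mu; positive-advantage tokens are never touched.
    - At most floor(max_frac * num_tokens) candidates are clipped, highest
      score first; "clipping" means zeroing the token's advantage.
    """
    if len(token_adv) != len(token_score):
        raise ValueError("token_adv and token_score must have the same length")
    n = len(token_adv)
    # One-sided: only negative-advantage tokens above the threshold qualify.
    candidates = [i for i in range(n) if token_adv[i] < 0 and token_score[i] > mu]
    candidates.sort(key=lambda i: token_score[i], reverse=True)
    budget = math.floor(max_frac * n)  # global cap on the number of clipped tokens
    clipped = sorted(candidates[:budget])
    clipped_set = set(clipped)
    new_adv = [0.0 if i in clipped_set else a for i, a in enumerate(token_adv)]
    return new_adv, clipped


adv = [1.0, -1.0, -1.0, -1.0, 1.0, -1.0, -1.0, 1.0]
score = [5.0, 3.0, 0.5, 4.0, 9.0, 2.5, 1.0, 0.0]
new_adv, clipped = clipv_filter(adv, score, mu=2.0, max_frac=0.25)
print(clipped)  # [1, 3]
print(new_adv)  # [1.0, 0.0, -1.0, 0.0, 1.0, -1.0, -1.0, 1.0]
```

In this toy batch, three tokens qualify (indices 1, 3 and 5), but the cap of `floor(0.25 * 8) = 2` keeps only the two highest-scoring ones. With the default `max_frac = 1e-4`, a batch this small would clip nothing. The cap becomes meaningful only for batches with many thousands of tokens, and it bounds how much negative-advantage signal the filter can discard.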