trinity.algorithm.advantage_fn.clipv_advantage module


GRPO advantage computation with Clip_V token filtering.
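For context, GRPO normally derives each response's advantage by normalizing its scalar reward against the other responses sampled for the same prompt, with a small epsilon added to the standard deviation for numerical stability. The sketch below shows that group normalization in plain Python. It assumes that the epsilon argument documented below plays this role, and it uses the population standard deviation; some implementations use the sample standard deviation instead. The function name grpo_group_advantages is hypothetical and is not part of the trinity API.

```python
from statistics import mean, pstdev
from typing import Dict, List


def grpo_group_advantages(
    rewards_by_group: Dict[str, List[float]], epsilon: float = 1e-6
) -> Dict[str, List[float]]:
    """Hypothetical helper: group-normalize rewards GRPO-style.

    Each key identifies a prompt group; each value holds the scalar rewards
    of the responses sampled for that prompt.
    """
    advantages: Dict[str, List[float]] = {}
    for group_id, rewards in rewards_by_group.items():
        mean_r = mean(rewards)
        # Population std; it is 0.0 for a single-response or all-equal group.
        std_r = pstdev(rewards)
        # epsilon keeps the division finite when every reward is identical.
        advantages[group_id] = [(r - mean_r) / (std_r + epsilon) for r in rewards]
    return advantages
```

A group whose rewards are all equal gets an advantage of 0.0 for every response, so it contributes no learning signal. A two-response group with rewards 1.0 and 0.0 gets advantages of roughly +1 and -1.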

class trinity.algorithm.advantage_fn.clipv_advantage.ClipVAdvantageFn(epsilon: float = 1e-06, mu: float = 2.0, max_frac: float = 0.0001)

Bases: AdvantageFn

Clip_V advantage: applies one-sided clipping to negative-advantage tokens only, and caps the global ratio of clipped tokens.
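The docstring does not specify which per-token statistic triggers clipping, so the following is only a minimal sketch of the "one-sided clip with a global cap" idea under stated assumptions. It assumes that each token carries a hypothetical score, that a negative-advantage token becomes a clipping candidate when its score exceeds mu, and that "clipping" means zeroing the token's advantage. It also assumes that at most floor(max_frac * number_of_tokens) candidates are clipped, with the highest-scoring candidates chosen first. The names clip_v_filter and token_scores are illustrative and do not come from the library.

```python
import math
from typing import List, Tuple


def clip_v_filter(
    advantages: List[float],
    token_scores: List[float],  # hypothetical per-token statistic
    mu: float = 2.0,
    max_frac: float = 1e-4,
) -> Tuple[List[float], List[bool]]:
    """Hypothetical sketch: zero out selected negative-advantage tokens."""
    if len(advantages) != len(token_scores):
        raise ValueError("advantages and token_scores must have the same length")
    # One-sided: only tokens with negative advantage can be clipped.
    candidates = [
        i
        for i, (adv, score) in enumerate(zip(advantages, token_scores))
        if adv < 0 and score > mu
    ]
    # Global cap on the fraction of clipped tokens.
    budget = math.floor(max_frac * len(advantages))
    # Spend the budget on the most extreme candidates first.
    candidates.sort(key=lambda i: token_scores[i], reverse=True)
    clipped = set(candidates[:budget])
    new_advantages = [0.0 if i in clipped else adv for i, adv in enumerate(advantages)]
    clip_mask = [i in clipped for i in range(len(advantages))]
    return new_advantages, clip_mask
```

Under these assumptions, the default max_frac of 0.0001 means that nothing is clipped until a batch holds at least 10,000 tokens, so the cap keeps the filter very sparse. Positive-advantage tokens are never touched, which is what makes the clip one-sided.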

__init__(epsilon: float = 1e-06, mu: float = 2.0, max_frac: float = 0.0001) -> None
classmethod default_args() -> Dict
Returns:

The default init arguments for the advantage function.

Return type:

Dict
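A common way to use default_args is to merge its result with user overrides before calling the constructor. The sketch below uses a stand-in class, ClipVAdvantageFnSketch (hypothetical, not the real ClipVAdvantageFn), so that it runs without trinity installed. It assumes that the returned dictionary mirrors the constructor defaults shown in the signature above. As one design choice, the sketch reads those defaults from the __init__ signature with inspect.signature so the two cannot drift apart. The actual library may simply hard-code the dictionary.

```python
import inspect
from typing import Any, Dict


class ClipVAdvantageFnSketch:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, epsilon: float = 1e-6, mu: float = 2.0, max_frac: float = 1e-4) -> None:
        self.epsilon = epsilon
        self.mu = mu
        self.max_frac = max_frac

    @classmethod
    def default_args(cls) -> Dict[str, Any]:
        # Collect every __init__ parameter that has a default (this skips `self`).
        params = inspect.signature(cls.__init__).parameters
        return {
            name: p.default
            for name, p in params.items()
            if p.default is not inspect.Parameter.empty
        }


# Usage: start from the defaults and override a single field.
config = {**ClipVAdvantageFnSketch.default_args(), "mu": 3.0}
fn = ClipVAdvantageFnSketch(**config)
```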