trinity.algorithm.advantage_fn.clipb_advantage module

Advantage computation for Clip_B. Ref: https://arxiv.org/pdf/2602.03392

class trinity.algorithm.advantage_fn.clipb_advantage.ClipBAdvantageFn(epsilon: float = 1e-06, mu: float = 2.5)

Bases: AdvantageFn

Clip_B advantage: keeps all positive-advantage tokens and applies a one-sided clip to negative-advantage tokens, driven by an entropy signal.
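The description above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `clip_negative_by_entropy`, the threshold rule (mean plus `mu` standard deviations of the entropies of negative-advantage tokens), and the reading of `epsilon` as a small stability term are all assumptions made for illustration. The referenced paper and the module source define the actual rule.

```python
from statistics import fmean, pstdev
from typing import List


def clip_negative_by_entropy(
    advantages: List[float],
    entropies: List[float],
    epsilon: float = 1e-6,
    mu: float = 2.5,
) -> List[float]:
    """Hypothetical one-sided clip; NOT the actual ClipBAdvantageFn logic."""
    # Only negative-advantage tokens are candidates for clipping.
    neg_entropies = [h for a, h in zip(advantages, entropies) if a < 0]
    if not neg_entropies:
        return list(advantages)

    # Assumed threshold: mean + mu * (std + epsilon) over the candidates.
    threshold = fmean(neg_entropies) + mu * (pstdev(neg_entropies) + epsilon)

    # Positive-advantage tokens pass through untouched; negative-advantage
    # tokens whose entropy exceeds the threshold have their advantage zeroed.
    return [
        0.0 if (a < 0 and h > threshold) else a
        for a, h in zip(advantages, entropies)
    ]


advantages = [1.0, -1.0, -1.0, -1.0, 0.5]
entropies = [5.0, 0.1, 0.2, 3.0, 0.1]
clipped = clip_negative_by_entropy(advantages, entropies, mu=1.0)
```

In this sketch the first token keeps its advantage despite its high entropy because the clip is one-sided: only tokens with a negative advantage are ever modified.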

__init__(epsilon: float = 1e-06, mu: float = 2.5) → None

classmethod default_args() → Dict
Returns:

The default init arguments for the advantage function.

Return type:

Dict
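A common way to use a `default_args()` classmethod is to fetch the defaults, override selected entries, and pass the result to the constructor. The sketch below uses a hypothetical stand-in class, `ClipBAdvantageFnSketch`, so that it runs without the library installed. It assumes the returned dictionary mirrors the `__init__` defaults shown in the signature, which this page does not confirm.

```python
from typing import Dict


class ClipBAdvantageFnSketch:
    """Hypothetical stand-in that mirrors only the documented signature."""

    def __init__(self, epsilon: float = 1e-6, mu: float = 2.5) -> None:
        self.epsilon = epsilon
        self.mu = mu

    @classmethod
    def default_args(cls) -> Dict:
        # Assumption: the defaults match the __init__ keyword defaults.
        return {"epsilon": 1e-6, "mu": 2.5}


# Start from the defaults, override one value, then construct the object.
args = ClipBAdvantageFnSketch.default_args()
args["mu"] = 3.0
fn = ClipBAdvantageFnSketch(**args)
```

Building the arguments this way keeps unspecified parameters at their defaults while making each override explicit.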