trinity.algorithm.advantage_fn.clipb_advantage module
Advantage computation for Clip_B.
Ref: https://arxiv.org/pdf/2602.03392
- class trinity.algorithm.advantage_fn.clipb_advantage.ClipBAdvantageFn(epsilon: float = 1e-06, mu: float = 2.5)
Bases: AdvantageFn
Clip_B advantage: keeps all positive-advantage tokens and applies a one-sided clip to negative-advantage tokens based on an entropy signal.
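The description above can be illustrated with a rough, self-contained sketch. It is not the Trinity implementation: the function name `clipb_mask_sketch`, the use of a z-score over the entropies of negative-advantage tokens, the choice to clip the low-entropy side, and the roles given to `epsilon` (numerical stabilizer) and `mu` (threshold width) are all assumptions made for illustration. See the referenced paper and the module source for the actual rule.

```python
from statistics import mean, pstdev


def clipb_mask_sketch(advantages, entropies, epsilon=1e-6, mu=2.5):
    """Hypothetical sketch of a one-sided, entropy-based token mask.

    Returns a list of 0.0/1.0 weights, one per token:
    - tokens with advantage >= 0 are always kept (weight 1.0);
    - tokens with advantage < 0 are dropped (weight 0.0) only when their
      entropy lies more than `mu` standard deviations BELOW the mean
      entropy of the negative-advantage tokens (assumed direction).
    """
    # Collect entropies of negative-advantage tokens only.
    neg_entropies = [h for a, h in zip(advantages, entropies) if a < 0]
    if not neg_entropies:
        # Nothing to clip: every token has non-negative advantage.
        return [1.0] * len(advantages)

    center = mean(neg_entropies)
    spread = pstdev(neg_entropies)

    mask = []
    for a, h in zip(advantages, entropies):
        if a >= 0:
            mask.append(1.0)  # positive-advantage tokens are never clipped
            continue
        # epsilon avoids division by zero when all entropies are equal
        z = (h - center) / (spread + epsilon)
        mask.append(1.0 if z >= -mu else 0.0)  # one-sided: lower tail only
    return mask


# Usage: one positive token plus ten negative tokens, one of which is a
# low-entropy outlier.
adv = [2.0] + [-1.0] * 10
ent = [0.0] + [1.0] * 9 + [0.0]
mask = clipb_mask_sketch(adv, ent)
```

In this sketch the mask would multiply the per-token advantages (or the loss) downstream; whether the real class masks, rescales, or clamps the advantages is not stated in this page.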