trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage module
REINFORCE++ advantage computation
Ref: volcengine/verl
- class trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage.REINFORCEPLUSPLUSAdvantageFn(gamma: float = 1.0)
Bases: AdvantageFn
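The page does not show how the advantage is computed. The sketch below illustrates the REINFORCE++ recipe as it is commonly described for the referenced verl implementation: discounted token-level returns accumulated from the last token backwards, then whitened across all valid tokens in the batch. It is an assumption-laden illustration, not the code of this class: the function name `reinforce_pp_advantages`, the list-of-lists inputs, the `eps` parameter, and the use of population variance are all hypothetical choices made here. Only `gamma` (default `1.0`) comes from the signature above.

```python
from typing import List


def reinforce_pp_advantages(
    rewards: List[List[float]],
    mask: List[List[int]],
    gamma: float = 1.0,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Hypothetical sketch of a REINFORCE++-style advantage.

    rewards[i][t] is the token-level reward of sample i at position t;
    mask[i][t] is 1 for response tokens and 0 for padding.
    """
    # Step 1: discounted return-to-go, accumulated right to left.
    returns = [[0.0] * len(row) for row in rewards]
    for i, row in enumerate(rewards):
        running = 0.0
        for t in reversed(range(len(row))):
            running = row[t] + gamma * running
            returns[i][t] = running
            # Assumption: padded positions reset the running return.
            running *= mask[i][t]

    # Step 2: whiten over every valid token in the batch
    # (population variance is a simplification chosen for this sketch).
    valid = [
        returns[i][t]
        for i in range(len(returns))
        for t in range(len(returns[i]))
        if mask[i][t]
    ]
    if not valid:
        raise ValueError("mask selects no tokens")
    mean = sum(valid) / len(valid)
    var = sum((v - mean) ** 2 for v in valid) / len(valid)
    scale = (var + eps) ** -0.5

    # Step 3: zero out padded positions in the result.
    return [
        [(returns[i][t] - mean) * scale * mask[i][t] for t in range(len(returns[i]))]
        for i in range(len(returns))
    ]


# Usage: two responses of three tokens, outcome reward on the last token.
example = reinforce_pp_advantages(
    rewards=[[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
    mask=[[1, 1, 1], [1, 1, 1]],
    gamma=1.0,
)
```

With `gamma = 1.0` and a single terminal reward, every token of a response receives the same return, so the whitening step alone separates rewarded from unrewarded responses. A smaller `gamma` would instead weight tokens closer to the reward more heavily.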