trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage module

REINFORCE++ advantage computation

Ref: volcengine/verl
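
The following is a minimal pure-Python sketch of the idea behind REINFORCE++ advantages as commonly described: compute a discounted return-to-go for every response token, then normalize (whiten) those returns over all valid tokens in the batch. It is not the Trinity or verl implementation. The function names, the list-based inputs, the population-variance normalization, and the `eps` value are assumptions made for illustration; only `gamma` (default 1.0) comes from the class signature documented here.

```python
import math


def discounted_returns(rewards, mask, gamma=1.0):
    """Per-token discounted return-to-go for a single response."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
        running *= mask[t]  # drop anything accumulated on padded positions
    return returns


def reinforce_pp_advantages(batch_rewards, batch_mask, gamma=1.0, eps=1e-8):
    """Whiten discounted returns over every valid token in the batch.

    Assumption: population variance is used here; a real implementation
    may use a different estimator or epsilon.
    """
    batch_returns = [
        discounted_returns(r, m, gamma) for r, m in zip(batch_rewards, batch_mask)
    ]
    valid = [
        v
        for rs, ms in zip(batch_returns, batch_mask)
        for v, m in zip(rs, ms)
        if m
    ]
    mean = sum(valid) / len(valid)
    var = sum((v - mean) ** 2 for v in valid) / len(valid)
    scale = 1.0 / math.sqrt(var + eps)
    advantages = [
        [(v - mean) * scale * m for v, m in zip(rs, ms)]
        for rs, ms in zip(batch_returns, batch_mask)
    ]
    return advantages, batch_returns


# Two responses with a terminal reward each; the second is one token shorter.
rewards = [[0.0, 0.0, 1.0], [0.0, -1.0, 0.0]]
mask = [[1, 1, 1], [1, 1, 0]]
advantages, returns = reinforce_pp_advantages(rewards, mask, gamma=1.0)
```

With `gamma=1.0` every valid token of a response receives the same return, so the whitening step is what separates good responses from bad ones. A `gamma` below 1.0 additionally down-weights tokens that are far from the reward.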

class trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage.REINFORCEPLUSPLUSAdvantageFn(gamma: float = 1.0)

Bases: AdvantageFn

__init__(gamma: float = 1.0) -> None
classmethod default_args() -> Dict
Returns: The default init arguments for the advantage function.

Return type: Dict
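
As a rough illustration of how a `default_args()` classmethod like this is typically used, the sketch below merges the defaults with user overrides before constructing the object. `SketchAdvantageFn` is a hypothetical stand-in, not the Trinity class, and the returned dictionary `{"gamma": 1.0}` is an assumption inferred from the `__init__` signature rather than a documented value.

```python
from typing import Dict


class SketchAdvantageFn:
    """Hypothetical stand-in that mirrors the documented signatures."""

    def __init__(self, gamma: float = 1.0) -> None:
        self.gamma = gamma

    @classmethod
    def default_args(cls) -> Dict:
        # Assumed to mirror the __init__ defaults.
        return {"gamma": 1.0}


# Start from the defaults, then apply any user-supplied overrides.
overrides = {"gamma": 0.99}
fn = SketchAdvantageFn(**{**SketchAdvantageFn.default_args(), **overrides})
```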