trinity.algorithm.advantage_fn.rec_advantage module
REC advantage computation
- class trinity.algorithm.advantage_fn.rec_advantage.RECGroupedAdvantage(epsilon: float = 1e-06, std_normalize: bool | None = False, drop: str | None = None)
Bases: GroupAdvantage
An advantage class that calculates REC advantages.
- __init__(epsilon: float = 1e-06, std_normalize: bool | None = False, drop: str | None = None) → None
Initialize the REC advantage function.
- Parameters:
epsilon (float) – A small value to avoid division by zero.
std_normalize (Optional[bool]) – If True, normalize the advantage by the group-level reward standard deviation. Defaults to False.
drop (Optional[str]) – Strategy to drop experiences. Options are “balance” or None.
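The interaction between epsilon and std_normalize can be illustrated with a minimal sketch. It is not the library's implementation: the helper name sketch_group_advantages is hypothetical, and both the mean-centered baseline and the use of the population standard deviation are assumptions made for illustration. The drop strategy is not modeled here.

```python
from statistics import mean, pstdev
from typing import List


def sketch_group_advantages(
    rewards: List[float],
    epsilon: float = 1e-6,
    std_normalize: bool = False,
) -> List[float]:
    """Hypothetical helper: center rewards on the group mean, optionally scale by std."""
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not std_normalize:
        return centered
    # epsilon keeps the division finite when every reward in the group is equal
    scale = pstdev(rewards) + epsilon
    return [c / scale for c in centered]


rewards = [1.0, 0.0, 0.0, 1.0]
plain = sketch_group_advantages(rewards)
scaled = sketch_group_advantages(rewards, std_normalize=True)
constant = sketch_group_advantages([1.0, 1.0], std_normalize=True)
```

In the last call the group standard deviation is zero, so without epsilon the division would fail; with it, the advantages simply come out as zero.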
- group_experiences(exps)
Group experiences by a certain criterion.
- Parameters:
exps (List[Experience]) – List of experiences to be grouped.
- Returns:
A dictionary where keys are group identifiers and values are lists of experiences.
- Return type:
Dict[str, List[Experience]]
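The shape of the returned mapping can be sketched as follows. The grouping criterion is not stated above, so the example assumes a hypothetical group_key field on a stand-in ToyExperience class; neither name comes from the library.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyExperience:
    # Hypothetical stand-in for Experience; field names are illustrative only.
    group_key: str
    reward: float


def sketch_group_experiences(
    exps: List[ToyExperience],
) -> Dict[str, List[ToyExperience]]:
    """Collect experiences that share the same (assumed) group key."""
    groups: Dict[str, List[ToyExperience]] = defaultdict(list)
    for exp in exps:
        groups[exp.group_key].append(exp)
    # Return a plain dict so missing keys raise KeyError instead of creating groups
    return dict(groups)


batch = [
    ToyExperience("task-0", 1.0),
    ToyExperience("task-1", 0.0),
    ToyExperience("task-0", 0.0),
]
grouped = sketch_group_experiences(batch)
```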
- calculate_group_advantage(group_id: str, exps: List[Experience]) → Tuple[List[Experience], Dict]
Calculate advantages for a group of experiences.
- Parameters:
group_id (str) – The identifier for the group of experiences.
exps (List[Experience]) – List of experiences in the group.
- Returns:
A tuple containing the modified list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]
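A rough sketch of the return contract, a list of updated experiences paired with a metrics dictionary, is given below. The ToyExperience class, the advantage attribute, and the metric keys reward_mean and reward_std are all hypothetical, and the mean-baseline formula is an assumption rather than the actual REC computation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyExperience:
    # Hypothetical stand-in for Experience
    reward: float
    advantage: Optional[float] = None


def sketch_calculate_group_advantage(
    group_id: str,
    exps: List[ToyExperience],
    epsilon: float = 1e-6,
    std_normalize: bool = False,
) -> Tuple[List[ToyExperience], Dict[str, float]]:
    """Attach an (assumed) mean-baseline advantage to each experience in the group."""
    rewards = [exp.reward for exp in exps]
    group_mean = mean(rewards)
    group_std = pstdev(rewards)
    # Only divide by the standard deviation when normalization is requested
    scale = group_std + epsilon if std_normalize else 1.0
    for exp in exps:
        exp.advantage = (exp.reward - group_mean) / scale
    # Metric names are illustrative; the real method may report different keys
    metrics = {"reward_mean": group_mean, "reward_std": group_std}
    return exps, metrics


updated, metrics = sketch_calculate_group_advantage(
    "task-0", [ToyExperience(1.0), ToyExperience(0.0)]
)
```

Callers unpack the result as a pair, which is why the return type is a tuple rather than a bare list.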