trinity.buffer.operators.filters.reward_filter module

class trinity.buffer.operators.filters.reward_filter.RewardFilter(threshold: float = 0.0)[source]

Bases: ExperienceOperator

Filter experiences based on the reward value.

Note: This filter assumes that the reward is already calculated and stored in the Experience object.

__init__(threshold: float = 0.0)[source]
process(exps: List[Experience]) → Tuple[List[Experience], dict][source]

Filter experiences based on reward value.
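
The page does not state how the threshold is compared, so the following is only a minimal sketch of a reward-threshold filter, not the library's implementation. It assumes that experiences whose reward is at least `threshold` are kept. The `Experience` dataclass, the `filter_by_reward` function, and the `filtered_count` metric name are hypothetical stand-ins introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in for the real Experience; only `reward` is modeled."""
    reward: float


def filter_by_reward(
    exps: List[Experience], threshold: float = 0.0
) -> Tuple[List[Experience], dict]:
    """Keep experiences whose reward is >= threshold (assumed comparison)."""
    kept = [exp for exp in exps if exp.reward >= threshold]
    # Hypothetical metric name: how many experiences were dropped.
    metrics = {"filtered_count": len(exps) - len(kept)}
    return kept, metrics
```

The return shape mirrors the documented `process` signature: a tuple of the surviving experiences and a dictionary of metrics.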

class trinity.buffer.operators.filters.reward_filter.RewardSTDFilter(threshold: float = 0.0)[source]

Bases: ExperienceOperator

Filter experiences based on the standard deviation of rewards within each group.

Note: This filter assumes that the reward is already calculated and stored in the Experience object.

__init__(threshold: float = 0.0)[source]
process(exps: List[Experience]) → Tuple[List[Experience], dict][source]

Filter experiences based on reward std.
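
How groups are identified and how the standard deviation is compared with the threshold are not specified on this page. The sketch below is one plausible reading, under clearly labeled assumptions: each experience carries a hypothetical `group_id` field, the population standard deviation is used, and a group is kept only when its reward standard deviation is strictly greater than `threshold`. The names `filter_by_reward_std` and `dropped_groups` are illustrative, not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in; `group_id` is an assumed grouping key."""
    group_id: str
    reward: float


def filter_by_reward_std(
    exps: List[Experience], threshold: float = 0.0
) -> Tuple[List[Experience], dict]:
    """Keep only groups whose reward std exceeds threshold (assumed rule)."""
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.group_id].append(exp)

    kept: List[Experience] = []
    dropped_groups = 0
    for members in groups.values():
        # Population standard deviation of the rewards inside one group.
        std = pstdev([m.reward for m in members])
        if std > threshold:
            kept.extend(members)
        else:
            dropped_groups += 1
    # Hypothetical metric name: number of groups removed.
    return kept, {"dropped_groups": dropped_groups}
```

With the default threshold of 0.0, this reading removes groups in which every experience received the same reward, since such groups carry no relative signal between their members.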

class trinity.buffer.operators.filters.reward_filter.InvalidRewardFilter[source]

Bases: ExperienceOperator

Filters out experiences with invalid reward values.

Note: This operator assumes that rewards are already computed and stored in the Experience object. Any experience with a missing (None) or invalid (NaN) reward is removed to prevent low-quality data from entering the training pipeline.

process(exps: List[Experience]) → Tuple[List[Experience], dict][source]

Process a list of experiences and return a transformed list.

Parameters:

exps (List[Experience]) -- List of experiences to process, which contains all experiences generated by the Explorer in one explore step.

Returns:

A tuple containing the processed list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]
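
The invalid-reward rule described for InvalidRewardFilter (drop experiences whose reward is None or NaN) can be sketched as follows. This is an illustrative approximation rather than the library's code: the `Experience` dataclass, the `drop_invalid_rewards` function, and the `invalid_count` metric name are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in for the real Experience; only `reward` is modeled."""
    reward: Optional[float] = None


def drop_invalid_rewards(
    exps: List[Experience],
) -> Tuple[List[Experience], dict]:
    """Remove experiences whose reward is missing (None) or NaN."""
    kept = [
        exp
        for exp in exps
        if exp.reward is not None and not math.isnan(exp.reward)
    ]
    # Hypothetical metric name: number of experiences removed as invalid.
    return kept, {"invalid_count": len(exps) - len(kept)}
```

The None check runs before `math.isnan` because `math.isnan(None)` raises a `TypeError`.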