trinity.common.experience module

Experience Class.

class trinity.common.experience.EID(batch: int | str = '', task: int | str = '', run: int = 0, step: int = 0, suffix: str = <factory>)[source]

Bases: object

Experience ID class to uniquely identify an experience.

To enable the full functionality of experience grouping, users should manually set the run and step fields in custom workflows.

batch: int | str = ''
task: int | str = ''
run: int = 0
step: int = 0
suffix: str
property uid: str

A unique identifier for the experience.

property sid: str

Step ID of the experience.

For example, experiences generated by all runs of the same task at the same step will have the same sid.

property rid: str

Run ID of the experience.

For example, experiences generated by one run of a task across all steps will have the same rid.

property tid: str

Task ID for the experience.

For example, experiences generated by all runs of the same task in GRPO-like algorithms will have the same tid.

to_dict() → dict[source]

Convert the EID to a dictionary.

__init__(batch: int | str = '', task: int | str = '', run: int = 0, step: int = 0, suffix: str = <factory>) → None
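To illustrate how the four IDs relate, the sketch below uses a hypothetical stand-in class, `ToyEID`, rather than the real `EID`. The string formats of its `tid`, `rid`, `sid` and `uid` properties are assumptions made for illustration; only the grouping relationships (same task, same run, same step) are taken from the descriptions above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ToyEID:
    """Hypothetical stand-in for EID; the ID string formats are assumed."""

    batch: Union[int, str] = ""
    task: Union[int, str] = ""
    run: int = 0
    step: int = 0
    # Random suffix generated by a default factory (assumed behavior).
    suffix: str = field(default_factory=lambda: uuid.uuid4().hex[:6])

    @property
    def tid(self) -> str:
        # Shared by every run and step of one task.
        return f"{self.batch}/{self.task}"

    @property
    def rid(self) -> str:
        # Shared by every step of one run.
        return f"{self.tid}/run={self.run}"

    @property
    def sid(self) -> str:
        # Shared by every run at one step.
        return f"{self.tid}/step={self.step}"

    @property
    def uid(self) -> str:
        # Distinct for every experience.
        return f"{self.tid}/{self.run}/{self.step}/{self.suffix}"


# A custom workflow sets run and step explicitly: 2 runs x 3 steps of one task.
eids = [ToyEID(batch=0, task=7, run=r, step=s) for r in range(2) for s in range(3)]

print(len({e.tid for e in eids}))  # 1
print(len({e.rid for e in eids}))  # 2
print(len({e.sid for e in eids}))  # 3
print(len({e.uid for e in eids}))  # 6
```

If a workflow leaves run and step at their defaults of 0, every experience of a task collapses into a single run group and a single step group, which is why the class description asks custom workflows to set them.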
class trinity.common.experience.CustomField(source_field: str, destination_field: str, data_type: dtype)[source]

Bases: object

Custom field for Experiences.

This is used to store additional information in the Experience class.

source_field: str
destination_field: str
data_type: dtype
__init__(source_field: str, destination_field: str, data_type: dtype) → None
class trinity.common.experience.Experience(*, eid=None, tokens, logprobs=None, reward=None, token_level_reward=None, advantages=None, returns=None, truncate_status=None, info=None, metrics=None, prompt_length=1, response_text=None, prompt_text=None, action_mask=None, messages=None, tools=None, chosen=None, rejected=None, chosen_messages=None, rejected_messages=None, multi_modal_inputs=None, teacher_logprobs=None, custom_fields=None)[source]

Bases: object

__init__(*, eid=None, tokens, logprobs=None, reward=None, token_level_reward=None, advantages=None, returns=None, truncate_status=None, info=None, metrics=None, prompt_length=1, response_text=None, prompt_text=None, action_mask=None, messages=None, tools=None, chosen=None, rejected=None, chosen_messages=None, rejected_messages=None, multi_modal_inputs=None, teacher_logprobs=None, custom_fields=None)[source]
eid: EID
reward: float | None = None
token_level_reward: Tensor | None = None
advantages: Tensor | None = None
returns: Tensor | None = None
info: dict
metrics: dict[str, float]
truncate_status: str | None = None
prompt_length: int = 1
response_text: str | None = None
prompt_text: str | None = None
messages: List[dict] | None = None
tools: List[dict] | None = None
chosen_messages: List[dict] | None = None
rejected_messages: List[dict] | None = None
multi_modal_inputs: Dict[str, Tensor] | None = None
tokens: Tensor | None = None
logprobs: Tensor | None = None
action_mask: Tensor | None = None
chosen: Tensor | None = None
rejected: Tensor | None = None
teacher_logprobs: Tensor | None = None
custom_fields: List[CustomField]
serialize() → bytes[source]

Serialize the experience to bytes.

classmethod deserialize(data: bytes) → Experience[source]
classmethod serialize_many(experiences: List[Experience]) → bytes[source]

Serialize a list of experiences into a compact bytes payload.

Tensor fields are packed with safetensors while non-tensor fields are packed as metadata via pickle.
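A minimal sketch of this split, assuming a simplified record type in which tensor-like fields are plain lists of numbers. `pack_many`, `unpack_many` and `TENSOR_FIELDS` are hypothetical names, not part of the library. To stay dependency-free, the sketch swaps safetensors for Python's `array` module: tensor-like fields go into one contiguous blob addressed by offsets, and everything else is pickled into a header. The real payload layout is not documented here and will differ.

```python
import pickle
import struct
from array import array
from typing import List

# Hypothetical record type: tensor-like fields are lists of numbers, the rest
# is arbitrary picklable metadata. Field name -> array typecode (assumed).
TENSOR_FIELDS = {"tokens": "q", "logprobs": "d"}


def pack_many(records: List[dict]) -> bytes:
    blob = bytearray()
    header = []
    for rec in records:
        # Non-tensor fields (and tensor fields that are None) go to metadata.
        meta = {k: v for k, v in rec.items() if k not in TENSOR_FIELDS or v is None}
        slots = {}
        for name, typecode in TENSOR_FIELDS.items():
            if rec.get(name) is None:
                continue
            raw = array(typecode, rec[name]).tobytes()
            slots[name] = (len(blob), len(raw))  # (offset, nbytes) in blob
            blob += raw
        header.append({"meta": meta, "slots": slots})
    head = pickle.dumps(header)
    # Layout: 8-byte little-endian header length, pickled header, tensor blob.
    return struct.pack("<Q", len(head)) + head + bytes(blob)


def unpack_many(data: bytes) -> List[dict]:
    (head_len,) = struct.unpack_from("<Q", data, 0)
    header = pickle.loads(data[8 : 8 + head_len])
    blob = data[8 + head_len :]
    records = []
    for entry in header:
        rec = dict(entry["meta"])
        for name, (offset, nbytes) in entry["slots"].items():
            arr = array(TENSOR_FIELDS[name])
            arr.frombytes(blob[offset : offset + nbytes])
            rec[name] = arr.tolist()
        records.append(rec)
    return records


batch = [
    {"tokens": [1, 2, 3], "logprobs": [-0.5, -1.25], "reward": 1.0, "info": {"id": "a"}},
    {"tokens": [4, 5], "logprobs": None, "reward": 0.0, "info": {}},
]
print(unpack_many(pack_many(batch)) == batch)  # True
```

Keeping tensor bytes out of the pickle means they can be copied as raw buffers, which is the usual motivation for pairing a tensor format like safetensors with a small pickled header.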

classmethod deserialize_many(data: bytes) → List[Experience][source]

Deserialize bytes into a list of experiences.

Supports both new batched payloads and legacy single-experience pickle payloads.
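One way to accept both payload kinds is to tag the batched format with a marker and fall back to a plain pickle otherwise. The sketch below assumes a hypothetical magic prefix `EXPB1` and helper names `encode_batch` and `decode_any`; the detection logic actually used by `deserialize_many` is not documented here.

```python
import pickle
from typing import Any, List

MAGIC = b"EXPB1"  # hypothetical marker for the batched format


def encode_batch(items: List[Any]) -> bytes:
    # New format: magic prefix followed by a pickled list.
    return MAGIC + pickle.dumps(items)


def decode_any(data: bytes) -> List[Any]:
    """Decode a batched payload or a legacy single-object pickle.

    Only call this on trusted data: pickle.loads can execute arbitrary code.
    """
    if data.startswith(MAGIC):
        return pickle.loads(data[len(MAGIC):])
    # Legacy payload: one pickled object, wrapped so callers always get a list.
    return [pickle.loads(data)]


legacy = pickle.dumps({"reward": 1.0})
batched = encode_batch([{"reward": 0.5}, {"reward": 0.0}])
print(decode_any(legacy))   # [{'reward': 1.0}]
print(decode_any(batched))  # [{'reward': 0.5}, {'reward': 0.0}]
```

Returning a list in both branches lets callers treat old and new payloads uniformly.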

to_dict() → dict[source]

Convert the experience to a dictionary.

trinity.common.experience.split_dpo_experience_to_single_turn(experiences: List[Experience]) → List[Experience][source]
trinity.common.experience.gather_token_ids(experiences, max_prompt_length: int, max_response_length: int, pad_token_id: int) → Tensor[source]
trinity.common.experience.gather_action_masks(experiences, max_response_length: int) → Tensor[source]
trinity.common.experience.gather_attention_masks(experiences, max_prompt_length: int, max_response_length: int) → Tensor[source]
trinity.common.experience.gather_response_attrs(experiences, attr_name: str, max_response_length: int, pad_value: int = 0) → Tensor[source]
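The `max_prompt_length`, `max_response_length`, `pad_token_id` and `pad_value` parameters suggest that these helpers pad every experience to a fixed prompt width and response width before stacking them into a batch. The sketch below shows one common layout, assumed rather than confirmed: prompts left-padded, responses right-padded, with `prompt_length` splitting `tokens` into the two parts. It uses plain lists instead of tensors, and `pad_prompt_and_response` and `pad_response_attr` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def pad_prompt_and_response(
    tokens: Sequence[int],
    prompt_length: int,
    max_prompt_length: int,
    max_response_length: int,
    pad_token_id: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical layout: [left pad | prompt | response | right pad]."""
    prompt = list(tokens[:prompt_length])
    # Keep the last prompt tokens and the first response tokens if too long.
    prompt = prompt[max(0, len(prompt) - max_prompt_length):]
    response = list(tokens[prompt_length:])[:max_response_length]
    left = max_prompt_length - len(prompt)
    right = max_response_length - len(response)
    ids = [pad_token_id] * left + prompt + response + [pad_token_id] * right
    # Attention mask: 1 for real tokens, 0 for padding.
    mask = [0] * left + [1] * (len(prompt) + len(response)) + [0] * right
    return ids, mask


def pad_response_attr(
    values: Sequence[float], max_response_length: int, pad_value: float = 0
) -> List[float]:
    """Right-pad a per-response-token attribute such as logprobs."""
    values = list(values)[:max_response_length]
    return values + [pad_value] * (max_response_length - len(values))


ids, mask = pad_prompt_and_response(
    [11, 12, 13, 21, 22], prompt_length=3,
    max_prompt_length=4, max_response_length=4, pad_token_id=0,
)
print(ids)   # [0, 11, 12, 13, 21, 22, 0, 0]
print(mask)  # [0, 1, 1, 1, 1, 1, 0, 0]
print(pad_response_attr([-0.5, -0.25], 4))  # [-0.5, -0.25, 0, 0]
```

Left-padding prompts keeps the boundary between prompt and response at the same column for every row, so response-only attributes line up across the batch.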
trinity.common.experience.gather_multi_modal_inputs(experiences) → Dict[str, Tensor][source]
trinity.common.experience.group_by(experiences: List[Experience], id_type: Literal['task', 'run', 'step']) → Dict[str, List[Experience]][source]

Group experiences by ID.
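In GRPO-like algorithms, grouping by task lets each run be compared with the other runs of the same task. The sketch below mimics that pattern with a hypothetical `ToyExp` record and `toy_group_by` helper; its key strings are assumptions, and the real `group_by` presumably keys on the `tid`, `rid` and `sid` properties of `EID` instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Literal


@dataclass
class ToyExp:
    # Hypothetical stand-in carrying only the ID parts used for grouping.
    batch: int
    task: int
    run: int
    step: int
    reward: float


def _key(exp: ToyExp, id_type: str) -> str:
    tid = f"{exp.batch}/{exp.task}"
    if id_type == "task":
        return tid
    if id_type == "run":
        return f"{tid}/run={exp.run}"
    if id_type == "step":
        return f"{tid}/step={exp.step}"
    raise ValueError(f"unknown id_type: {id_type}")


def toy_group_by(
    exps: List[ToyExp], id_type: Literal["task", "run", "step"]
) -> Dict[str, List[ToyExp]]:
    groups: Dict[str, List[ToyExp]] = defaultdict(list)
    for exp in exps:
        groups[_key(exp, id_type)].append(exp)
    return dict(groups)


# Four runs of one task at the same step, with rewards 0, 1, 2, 3.
exps = [ToyExp(0, 1, run=r, step=0, reward=float(r)) for r in range(4)]
by_task = toy_group_by(exps, "task")
# GRPO-style use: center each reward on its task group's mean.
for key, group in by_task.items():
    mean = sum(e.reward for e in group) / len(group)
    print(key, [e.reward - mean for e in group])  # 0/1 [-1.5, -0.5, 0.5, 1.5]
```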

trinity.common.experience.to_hf_datasets(experiences: list[Experience]) → Dataset[source]

Convert a list of Experience objects to a HuggingFace Dataset, preserving all fields.

trinity.common.experience.from_hf_datasets(dataset: Dataset) → List[Experience][source]

Convert a HuggingFace Dataset back to a list of Experience objects.
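The conversion is essentially a row-to-column transpose and back. The sketch below shows a lossless round trip for a hypothetical `ToyExperience` subset with tensors stored as plain lists; `to_columns` and `from_columns` are illustrative helpers, not the library's implementation. With the `datasets` library installed, a dict shaped like the output of `to_columns` can be passed to `datasets.Dataset.from_dict`, and `Dataset.to_dict()` returns the same shape.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class ToyExperience:
    # Hypothetical subset of Experience fields; tensors stored as plain lists.
    tokens: List[int]
    prompt_length: int = 1
    reward: Optional[float] = None
    response_text: Optional[str] = None


def to_columns(exps: List[ToyExperience]) -> Dict[str, list]:
    # Rows -> one list per field, the shape Dataset.from_dict accepts.
    return {f.name: [getattr(e, f.name) for e in exps] for f in fields(ToyExperience)}


def from_columns(columns: Dict[str, list]) -> List[ToyExperience]:
    # Columns -> rows; assumes every column has the same length.
    n = len(next(iter(columns.values()), []))
    return [ToyExperience(**{name: col[i] for name, col in columns.items()}) for i in range(n)]


exps = [
    ToyExperience(tokens=[1, 2, 3], prompt_length=2, reward=1.0, response_text="ok"),
    ToyExperience(tokens=[4, 5]),
]
cols = to_columns(exps)
print(cols["reward"])              # [1.0, None]
print(from_columns(cols) == exps)  # True
```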