trinity.trainer.verl.utils module#

Utils for compatibility issues with verl.

trinity.trainer.verl.utils.to_data_proto(experiences: List[Experience], pad_token_id: int, model: PreTrainedModel, logger: Logger) → DataProto[source]#

Convert List[Experience] to verl DataProto.
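The conversion requires padding variable-length token sequences into a fixed-shape batch. A minimal sketch of that padding step, using plain Python lists in place of tensors (the right-padding convention and the helper name `pad_sequences` are illustrative assumptions, not the actual implementation):

```python
def pad_sequences(sequences, pad_token_id):
    """Right-pad variable-length token sequences to a common length.

    Illustrative stand-in for the tensor padding done when building a
    DataProto batch; the real code operates on torch tensors.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, attention_mask = [], []
    for seq in sequences:
        pad = max_len - len(seq)
        padded.append(list(seq) + [pad_token_id] * pad)      # pad token ids
        attention_mask.append([1] * len(seq) + [0] * pad)    # mask out padding
    return padded, attention_mask

tokens, mask = pad_sequences([[5, 6, 7], [8]], pad_token_id=0)
```

The attention mask records which positions are real tokens, which is what downstream token-count logic (e.g. `rearrange_micro_batches` below) sums over.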

trinity.trainer.verl.utils.compute_data_metrics(batch: DataProto) → dict[source]#

Computes various metrics from a batch of data for PPO training. Modified from verl.trainer.ppo.metric_utils.compute_data_metrics.

This function calculates metrics related to scores, rewards, advantages, returns, values, and sequence lengths from a batch of data. It provides statistical information (mean, max, min) for each metric category.

Parameters:

batch -- A DataProto object containing batch data with token-level scores, rewards, advantages, etc.

Returns:

A dictionary of metrics including:

  • critic/score/mean, max, min: Statistics about sequence scores

  • critic/rewards/mean, max, min: Statistics about sequence rewards

  • critic/advantages/mean, max, min: Statistics about advantages

  • critic/returns/mean, max, min: Statistics about returns

  • critic/values/mean, max, min: Statistics about critic values

  • critic/vf_explained_var: Explained variance of the value function

  • response_length/mean, max, min, clip_ratio: Statistics about response lengths

  • prompt_length/mean, max, min, clip_ratio: Statistics about prompt lengths

Return type:

dict
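The mean/max/min pattern repeated across the categories above can be sketched with a small helper (the helper name `summarize` is hypothetical; only the `critic/<name>/...` key naming is taken from the list above):

```python
def summarize(name, values):
    """Return mean/max/min statistics under 'critic/<name>/...' keys,
    mirroring the metric-key naming used by compute_data_metrics."""
    return {
        f"critic/{name}/mean": sum(values) / len(values),
        f"critic/{name}/max": max(values),
        f"critic/{name}/min": min(values),
    }

metrics = summarize("score", [1.0, 2.0, 3.0])
```

In the real function these statistics are computed with tensor reductions over masked token-level values rather than Python lists.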

trinity.trainer.verl.utils.get_latest_hf_checkpoint_path(config: Config)[source]#

Get the latest HuggingFace checkpoint path.
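A minimal sketch of a "latest checkpoint" lookup, assuming checkpoints live in subdirectories named `global_step_<N>` (both the directory-naming convention and the helper name `latest_checkpoint` are assumptions for illustration):

```python
import os
import re

def latest_checkpoint(root):
    """Pick the checkpoint directory with the highest global step.

    Assumes subdirectories named like 'global_step_42'; returns None
    when no matching directory is found.
    """
    pattern = re.compile(r"global_step_(\d+)$")
    best_step, best_path = -1, None
    for name in os.listdir(root):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            best_path = os.path.join(root, name)
    return best_path
```

Comparing the parsed integers (rather than sorting names lexicographically) keeps `global_step_10` from sorting before `global_step_2`.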

trinity.trainer.verl.utils.apply_fsdp2(model, fsdp_kwargs, config)[source]#

Apply FSDP2 wrapping to the given model.

Parameters:

model -- AutoModelForCausalLM

trinity.trainer.verl.utils.rearrange_micro_batches(batch, max_token_len, dp_group=None, num_batches_divided_by=None, same_micro_num_in_dp=True, min_num_micro_batch=None, use_dynamic_bsz_balance=True)[source]#

Split a batch into micro-batches by total token count, with optional DP sync and padding.

Parameters:
  • batch (TensorDict) -- must include "attention_mask" (B*S); other fields are sliced similarly.

  • max_token_len (int) -- max sum of attention_mask per micro-batch.

  • dp_group (optional) -- torch.distributed group for data-parallel sync.

  • num_batches_divided_by (optional) -- virtual pipeline parallel size, for megatron.

  • same_micro_num_in_dp (bool) -- if True and dp_group set, pad all ranks to the same count.

  • min_num_micro_batch (int, optional) -- force at least this many splits (pads empty ones).

  • use_dynamic_bsz_balance (bool, optional) -- if True, balance the computational workload across micro-batches.

Returns:

List[TensorDict]: the micro-batches. List[List[int]]: index lists mapping each micro-batch back to its original positions in the batch.

Return type:

List[TensorDict]
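The core splitting idea, grouping sequences so that the total token count (sum of attention_mask) per micro-batch stays within max_token_len, can be sketched with a simple greedy pass. DP synchronization, padding to a common micro-batch count, and the dynamic workload balancing are omitted, and the greedy strategy here is an illustrative simplification:

```python
def split_by_token_count(seq_lens, max_token_len):
    """Greedily partition sequence indices into micro-batches whose
    total token count stays within max_token_len.

    seq_lens: per-sequence token counts (attention_mask sums).
    Returns index lists mapping each micro-batch back to the batch,
    analogous to the second return value of rearrange_micro_batches.
    """
    micro_batches, current, current_tokens = [], [], 0
    for idx, n in enumerate(seq_lens):
        # Start a new micro-batch when adding this sequence would
        # exceed the token budget (a lone oversized sequence still
        # gets its own micro-batch).
        if current and current_tokens + n > max_token_len:
            micro_batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        micro_batches.append(current)
    return micro_batches
```

The index lists are what let the caller scatter per-micro-batch results (e.g. log-probs) back into the original batch order.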

trinity.trainer.verl.utils.patch_rope_theta_in_hf_config(hf_config)[source]#