trinity.trainer.verl_legacy.utils module
Utilities for handling compatibility issues with verl.
- trinity.trainer.verl_legacy.utils.rank0_iterator(per_tensor_param: Iterable[tuple[str, Tensor]]) → Iterator[tuple[str, Tensor]]
Advance a distributed tensor iterator on every rank, yielding only on rank 0.
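The pattern described above can be sketched in plain Python. This is a minimal illustration rather than the module's implementation: the helper names `rank0_only` and `make_source` are hypothetical, and the rank is passed in as an argument instead of being read from a process group.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def rank0_only(items: Iterable[Tuple[str, T]], rank: int) -> Iterator[Tuple[str, T]]:
    """Consume `items` fully on every rank, but yield entries only when rank == 0."""
    for name, value in items:
        # Every rank reaches this point, so any collective work performed while
        # producing `value` (for example, gathering shards) stays in lockstep.
        if rank == 0:
            yield name, value


def make_source(log: List[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical stand-in for a distributed iterator; records what it produces."""
    for i, name in enumerate(["w1", "w2", "w3"]):
        log.append(name)
        yield name, i


log0: List[str] = []
log1: List[str] = []
on_rank0 = list(rank0_only(make_source(log0), rank=0))
on_rank1 = list(rank0_only(make_source(log1), rank=1))
```

In this sketch both ranks drain the source completely (`log0` and `log1` hold the same names), while only the rank 0 call returns any items.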
- trinity.trainer.verl_legacy.utils.iter_fsdp_per_tensor_param(*, model: Module, named_modules: Iterable[tuple[str, FullyShardedDataParallel]], strategy: str) → Iterator[tuple[str, Tensor]]
Yield full FSDP/FSDP2 parameters in the same order used for cached metadata.
- trinity.trainer.verl_legacy.utils.save_rank0_safetensors(*, per_tensor_param: Iterable[tuple[str, Tensor]], filepath: str, state_dict_meta) → None
Save a distributed per-tensor iterator to safetensors on rank 0.
- trinity.trainer.verl_legacy.utils.to_data_proto(experiences: List[Experience], pad_token_id: int, model: PreTrainedModel, logger: Logger) → DataProto
Convert List[Experience] to verl DataProto.
- trinity.trainer.verl_legacy.utils.compute_data_metrics(batch: DataProto) → dict
Compute metrics from a batch of data for PPO training. Modified from verl.trainer.ppo.metric_utils.compute_data_metrics.
This function calculates metrics related to scores, rewards, advantages, returns, values, and sequence lengths from a batch of data. It reports the mean, max, and min for each metric category.
- Parameters:
batch – A DataProto object containing batch data with token-level scores, rewards, advantages, etc.
- Returns:
A dictionary of metrics including:
critic/score/mean, max, min: Statistics about sequence scores
critic/rewards/mean, max, min: Statistics about sequence rewards
critic/advantages/mean, max, min: Statistics about advantages
critic/returns/mean, max, min: Statistics about returns
critic/values/mean, max, min: Statistics about critic values
critic/vf_explained_var: Explained variance of the value function
response_length/mean, max, min, clip_ratio: Statistics about response lengths
prompt_length/mean, max, min, clip_ratio: Statistics about prompt lengths
- Return type:
dict
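To make the metric names concrete, the following sketch builds a mean/max/min summary and an explained-variance value from plain Python lists. It is an illustration under assumptions, not the function's actual code: `summarize` and `explained_variance` are hypothetical helpers, population variance is used, and the small `eps` term in the denominator is an assumed stabilizer.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def summarize(prefix: str, values: Sequence[float]) -> Dict[str, float]:
    """Return mean/max/min entries keyed as '<prefix>/mean', '<prefix>/max', '<prefix>/min'."""
    return {
        f"{prefix}/mean": fmean(values),
        f"{prefix}/max": max(values),
        f"{prefix}/min": min(values),
    }


def explained_variance(returns: Sequence[float], values: Sequence[float], eps: float = 1e-5) -> float:
    """1 - Var(returns - values) / Var(returns); eps (assumed) avoids division by zero."""
    residual = [r - v for r, v in zip(returns, values)]
    return 1.0 - pvariance(residual) / (pvariance(returns) + eps)


# Toy data: the value estimates match the returns exactly.
returns = [1.0, 2.0, 3.0, 4.0]
values = [1.0, 2.0, 3.0, 4.0]

metrics = summarize("critic/returns", returns)
metrics["critic/vf_explained_var"] = explained_variance(returns, values)
```

An explained variance close to 1 indicates that the value estimates account for almost all of the variation in the returns; values near or below 0 indicate a value function that predicts no better than a constant.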
- trinity.trainer.verl_legacy.utils.apply_fsdp2(model, fsdp_kwargs, config)
model: AutoModelForCausalLM
- trinity.trainer.verl_legacy.utils.rearrange_micro_batches(batch, max_token_len, dp_group=None, num_batches_divided_by=None, same_micro_num_in_dp=True, min_num_micro_batch=None, use_dynamic_bsz_balance=True)
Split a batch into micro-batches by total token count, with optional DP sync and padding.
- Parameters:
batch (TensorDict) – must include "attention_mask" of shape B×S (batch size by sequence length); other fields are sliced the same way.
max_token_len (int) – max sum of attention_mask per micro-batch.
dp_group (optional) – torch.distributed group for data-parallel sync.
num_batches_divided_by (optional) – virtual pipeline parallel size, for megatron.
same_micro_num_in_dp (bool) – if True and dp_group set, pad all ranks to the same count.
min_num_micro_batch (int, optional) – force at least this many splits (pads empty ones).
use_dynamic_bsz_balance (bool, optional) – if True, balance the computational workload across micro-batches.
- Returns:
List[TensorDict]: the micro-batches.
List[List[int]]: index lists mapping each micro-batch back to original positions.
- Return type:
Tuple[List[TensorDict], List[List[int]]]
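The idea of splitting by token count, and of returning index lists alongside the micro-batches, can be sketched as follows. This is a simplified greedy grouping written for illustration; it is not the balancing strategy the function uses, it ignores the DP-sync and padding options, and `split_by_token_budget` is a hypothetical name. Plain nested lists stand in for a TensorDict.

```python
from typing import List


def split_by_token_budget(seq_lens: List[int], max_token_len: int) -> List[List[int]]:
    """Greedily group sample indices so each group's token total stays within budget."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for idx, n in enumerate(seq_lens):
        if n > max_token_len:
            raise ValueError(f"sample {idx} has {n} tokens, over the budget {max_token_len}")
        # Close the current group when adding this sample would exceed the budget.
        if current and current_tokens + n > max_token_len:
            groups.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        groups.append(current)
    return groups


# A 4 x 4 attention mask; each row sum is that sample's token count.
attention_mask = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
seq_lens = [sum(row) for row in attention_mask]

index_groups = split_by_token_budget(seq_lens, max_token_len=5)
# Slice the batch with the index lists to build the micro-batches.
micro_batches = [[attention_mask[i] for i in group] for group in index_groups]
```

The index lists make it possible to scatter per-micro-batch outputs back into the original batch order after processing.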