trinity.trainer.verl.monkey_patch module

trinity.trainer.verl.monkey_patch.patch_fused_kernels(fused_kernels_backend: str) [source]

Fix a VLM sequence-parallelism bug triggered by the optimized fused-kernels backend in veRL.

trinity.trainer.verl.monkey_patch.patch_forward_with_backends(model: PreTrainedModel, use_fused_kernels: bool = False, fused_kernels_backend: str | None = None) → None [source]

Monkey-patch the model's forward method with optimized backend implementations.

Parameters:
  • model -- The model to patch.

  • use_fused_kernels -- Whether to enable fused kernels.

  • fused_kernels_backend -- The backend to use ('triton' or 'torch').

trinity.trainer.verl.monkey_patch.apply_monkey_patch(model: PreTrainedModel, ulysses_sp_size: int = 1, use_remove_padding: bool = True, use_fused_kernels: bool = False, fused_kernels_backend: str = None, use_prefix_grouper: bool = False, use_tiled_mlp: bool = False, tiled_mlp_shards: int = 4) [source]

Apply monkey patches to the model for Ulysses sequence parallelism, fused kernels, prefix grouper, and tiled MLP.

The model's forward method is patched for fused kernels at the end of this function. If a model does not support fused kernels, return after applying the other patches, before this final step.

Parameters:
  • model -- The model to apply the monkey patch.

  • ulysses_sp_size -- The size of ulysses sequence parallel.

  • use_remove_padding -- Whether to use remove padding.

  • use_fused_kernels -- Whether to use fused kernels.

  • fused_kernels_backend -- The backend to use for fused kernels.

  • use_prefix_grouper -- Whether to use the prefix grouper.

  • use_tiled_mlp -- Whether to use TiledMLP for memory-efficient MLP computation.

  • tiled_mlp_shards -- Number of shards for TiledMLP (higher = lower memory, slightly slower).
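The forward-patching pattern these functions rely on can be illustrated with a minimal, self-contained sketch. `DummyModel`, `fused_forward`, and `patch_forward` are hypothetical stand-ins for a PreTrainedModel and the backend implementations; none of them belong to trinity or veRL.

```python
import types


class DummyModel:
    """Hypothetical stand-in for a PreTrainedModel."""

    def forward(self, x):
        return x + 1


def fused_forward(self, x):
    # Hypothetical "optimized backend" forward implementation.
    return (x + 1) * 2


def patch_forward(model, use_fused_kernels=False):
    """Replace model.forward in place, mirroring the monkey-patch approach."""
    if use_fused_kernels:
        # Bind the replacement as a method on this specific instance.
        model.forward = types.MethodType(fused_forward, model)
    return model


model = patch_forward(DummyModel(), use_fused_kernels=True)
print(model.forward(3))  # → 8
```

Because the patch rebinds `forward` on the instance rather than the class, other model instances are unaffected, which mirrors why `apply_monkey_patch` takes the model object as its first argument.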