trinity.common.config_validator module

class trinity.common.config_validator.ConfigValidator

Bases: ABC

Abstract base class for configuration validators.

Each validator is responsible for checking and potentially modifying specific aspects of the global configuration to ensure validity, set defaults, or handle deprecated settings.

__init__()
abstractmethod validate(config: Config) → None

Validate and potentially modify the given configuration.

Parameters:

config – The global configuration object to validate and modify.
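The contract above can be illustrated with a minimal sketch of the pattern. Everything except the name ConfigValidator and the validate(config) signature is hypothetical: the two-field Config stand-in, the ModeValidator subclass, its assumed mode set, and the run_validators helper. The sketch shows that validators share one mutable config object and run in sequence.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class Config:
    """Hypothetical stand-in for the global Config (only two fields)."""

    mode: str = "both"
    notes: List[str] = field(default_factory=list)


class ConfigValidator(ABC):
    """Each subclass checks, and may modify, one aspect of the config."""

    @abstractmethod
    def validate(self, config: Config) -> None:
        """Validate and potentially modify the given configuration."""


class ModeValidator(ConfigValidator):
    """Hypothetical validator used only for this illustration."""

    def validate(self, config: Config) -> None:
        if config.mode not in {"both", "train", "explore"}:  # assumed mode set
            raise ValueError(f"Invalid mode: {config.mode!r}")
        config.notes.append("mode ok")


def run_validators(config: Config, validators: List[ConfigValidator]) -> Config:
    # Validators run in order and mutate the same object, so each one
    # sees the changes (defaults, derived values) made by earlier ones.
    for validator in validators:
        validator.validate(config)
    return config


cfg = run_validators(Config(mode="train"), [ModeValidator()])
print(cfg.notes)  # ['mode ok']
```

Because ConfigValidator is abstract, instantiating it directly raises TypeError. Only concrete subclasses that implement validate can be used.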

class trinity.common.config_validator.DeprecatedConfigValidator

Bases: ConfigValidator

Validator for handling deprecated configuration options.

Issues warnings when deprecated configuration parameters are used and suggests their replacements.

validate(config: Config) → None

Check for deprecated configuration options and issue warnings.

Specifically checks for the deprecated explorer.runner_num parameter and recommends using explorer.runner_per_model instead.

Parameters:

config – The global configuration object to validate.
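A deprecation check of this kind can be sketched with Python's standard warnings module. Two points are assumptions: that warnings are emitted through that module (the real validator may log instead), and the ExplorerConfig stand-in class. The sketch only warns. It does not try to translate runner_num into runner_per_model, because this page does not say how the two relate.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExplorerConfig:
    """Hypothetical stand-in carrying only the two fields named above."""

    runner_num: Optional[int] = None
    runner_per_model: int = 8


def check_deprecated(explorer: ExplorerConfig) -> None:
    # Warn only when the deprecated field was actually set by the user.
    if explorer.runner_num is not None:
        warnings.warn(
            "`explorer.runner_num` is deprecated; use `explorer.runner_per_model` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```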

class trinity.common.config_validator.GlobalConfigValidator

Bases: ConfigValidator

Validator for global configuration settings.

Handles validation of the main operating mode, sets up checkpoint directories, and configures logging paths. Manages experiment naming conflicts by appending timestamps to avoid overwriting existing experiments.

validate(config: Config) → None

Validate global configuration settings and set up directory structure.

  • Validates that the mode is one of the supported values

  • Creates absolute checkpoint paths and handles experiment naming conflicts

  • Sets up the log directory path

Parameters:

config – The global configuration object to validate.

Raises:

ValueError – If an invalid mode is specified.
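The directory handling described above can be sketched as follows. Several details are assumptions, not taken from this page: the set of supported modes, the root/project/name layout, the timestamp format, and the "log" subdirectory name. The sketch makes the checkpoint root absolute, appends a timestamp when the experiment directory already exists, and derives the log path from the result.

```python
import os
from datetime import datetime
from typing import Tuple

SUPPORTED_MODES = {"both", "train", "explore", "bench"}  # assumption


def resolve_job_dirs(root: str, project: str, name: str, mode: str) -> Tuple[str, str]:
    """Return (checkpoint_job_dir, log_dir) under a hypothetical layout."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"Invalid mode {mode!r}; expected one of {sorted(SUPPORTED_MODES)}")
    job_dir = os.path.join(os.path.abspath(root), project, name)
    if os.path.exists(job_dir):
        # Append a timestamp rather than overwrite an existing experiment.
        job_dir = f"{job_dir}_{datetime.now().strftime('%Y%m%d%H%M%S')}"
    log_dir = os.path.join(job_dir, "log")
    return job_dir, log_dir
```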

class trinity.common.config_validator.RayClusterConfigValidator

Bases: ConfigValidator

Validator for Ray cluster configuration.

Handles Ray cluster setup including namespace configuration, automatic detection of cluster resources (node count and GPUs per node), and GPU allocation validation based on the current operating mode and model requirements.

validate(config: Config) → None

Validate and configure Ray cluster settings.

  • Sets the Ray namespace if not provided

  • Skips validation if Tinker is enabled

  • Automatically detects cluster information if not provided

  • Validates GPU allocation based on mode and model requirements

Parameters:

config – The global configuration object to validate.

Raises:

  • RuntimeError – If no alive nodes are found in the Ray cluster.

  • ValueError – If GPU allocation requirements cannot be satisfied.
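A GPU-allocation check like the one described above can be sketched as a pure function over a node list. In a live cluster, that list could come from ray.nodes(), whose entries carry "Alive" and "Resources" keys. The following are hypothetical: the split rule (rollout engines take engine_num × tensor_parallel_size GPUs and the trainer takes the remainder), the homogeneous-cluster assumption, and the parameter names.

```python
from typing import Any, Dict, List


def check_gpu_allocation(
    nodes: List[Dict[str, Any]], mode: str, engine_num: int, tensor_parallel_size: int
) -> Dict[str, int]:
    alive = [n for n in nodes if n.get("Alive")]
    if not alive:
        raise RuntimeError("No alive nodes found in the Ray cluster.")
    node_num = len(alive)
    # Assumes a homogeneous cluster: every node has as many GPUs as the first one.
    gpu_per_node = int(alive[0].get("Resources", {}).get("GPU", 0))
    total_gpus = node_num * gpu_per_node

    # Hypothetical split: rollout engines first, the trainer gets the rest.
    rollout_gpus = 0 if mode == "train" else engine_num * tensor_parallel_size
    trainer_gpus = 0 if mode == "explore" else total_gpus - rollout_gpus
    if rollout_gpus > total_gpus:
        raise ValueError(f"Rollout needs {rollout_gpus} GPUs but the cluster has {total_gpus}.")
    if mode in ("both", "train") and trainer_gpus < 1:
        raise ValueError("No GPUs left for the trainer.")
    return {
        "node_num": node_num,
        "gpu_per_node": gpu_per_node,
        "rollout_gpus": rollout_gpus,
        "trainer_gpus": trainer_gpus,
    }
```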

class trinity.common.config_validator.AlgorithmConfigValidator

Bases: ConfigValidator

Validator for algorithm-specific configuration.

Handles algorithm type validation, sets default configuration parameters, validates function registry entries, and manages deprecated optimizer settings.

validate(config: Config) → None

Validate and configure algorithm-specific settings.

  • Validates the algorithm type and runs algorithm-specific validation

  • Sets default configuration values for various algorithm components

  • Validates and configures function registry entries (loss functions, etc.)

  • Handles deprecated optimizer configuration parameters

Parameters:

config – The global configuration object to validate.

Raises:

ValueError – If invalid algorithm types or function names are specified.
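Registry validation of the kind listed above typically fills in a default name and then looks it up, raising ValueError for unknown entries. The sketch below follows that pattern. The registry, the decorator, the "mse" entry, and the default name are hypothetical illustrations, not Trinity's actual registry.

```python
from typing import Callable, Dict, Optional

LOSS_FN_REGISTRY: Dict[str, Callable[..., float]] = {}  # hypothetical registry


def register_loss_fn(name: str):
    def decorator(fn: Callable[..., float]) -> Callable[..., float]:
        LOSS_FN_REGISTRY[name] = fn
        return fn
    return decorator


@register_loss_fn("mse")
def mse_loss(pred: float, target: float) -> float:
    return (pred - target) ** 2


def resolve_loss_fn(name: Optional[str], default: str = "mse") -> Callable[..., float]:
    # Fill in a default when the user left the entry unset, then look it up.
    key = default if name is None else name
    if key not in LOSS_FN_REGISTRY:
        raise ValueError(f"Unknown loss function {key!r}; available: {sorted(LOSS_FN_REGISTRY)}")
    return LOSS_FN_REGISTRY[key]
```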

class trinity.common.config_validator.ModelConfigValidator

Bases: ConfigValidator

Validator for model configuration settings.

Handles model path validation, chat template loading, Tinker-specific validation, and model length parameter validation including prompt/response token limits.

validate(config: Config) → None

Validate and configure model-specific settings.

  • Sets critic model path to actor model path if not specified

  • Loads chat templates from file if path is provided

  • Validates Tinker-specific configuration if enabled

  • Validates and sets model length parameters (max_model_len, max_prompt_tokens, etc.)

Parameters:

config – The global configuration object to validate.

Raises:

ValueError – If chat template file cannot be read, model length constraints are violated, or Tinker configuration is invalid.
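The length-parameter handling can be sketched under one assumed constraint: max_prompt_tokens + max_response_tokens must not exceed max_model_len. The field name max_response_tokens and the exact derivation rules are assumptions. This page names only max_model_len and max_prompt_tokens explicitly. Any single missing value is derived from the other two before the constraint is checked.

```python
from typing import Optional, Tuple


def resolve_lengths(
    max_model_len: Optional[int],
    max_prompt_tokens: Optional[int],
    max_response_tokens: Optional[int],
) -> Tuple[int, int, int]:
    """Fill in one missing length, then check the assumed constraint
    max_prompt_tokens + max_response_tokens <= max_model_len."""
    if max_model_len is None:
        if max_prompt_tokens is None or max_response_tokens is None:
            raise ValueError("Set max_model_len, or both max_prompt_tokens and max_response_tokens.")
        max_model_len = max_prompt_tokens + max_response_tokens
    elif max_prompt_tokens is None and max_response_tokens is not None:
        max_prompt_tokens = max_model_len - max_response_tokens
    elif max_response_tokens is None and max_prompt_tokens is not None:
        max_response_tokens = max_model_len - max_prompt_tokens
    if max_prompt_tokens is None or max_response_tokens is None:
        raise ValueError("At least one of max_prompt_tokens / max_response_tokens is required.")
    if min(max_prompt_tokens, max_response_tokens) <= 0:
        raise ValueError("Prompt and response limits must be positive.")
    if max_prompt_tokens + max_response_tokens > max_model_len:
        raise ValueError(
            f"max_prompt_tokens + max_response_tokens ({max_prompt_tokens + max_response_tokens}) "
            f"exceeds max_model_len ({max_model_len})."
        )
    return max_model_len, max_prompt_tokens, max_response_tokens
```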

class trinity.common.config_validator.ExplorerConfigValidator

Bases: ConfigValidator

Validator for explorer configuration settings.

Handles rollout model configuration inheritance, auxiliary model validation, over-rollout ratio validation, and LoRA configuration processing.

validate(config: Config) → None

Validate and configure explorer-specific settings.

  • Makes rollout models inherit unset settings from the global model config

  • Validates auxiliary model configurations

  • Validates over-rollout ratio settings and compatibility with sync style

  • Processes LoRA configurations including dummy LoRA creation

Parameters:

config – The global configuration object to validate.

Raises:

ValueError – If auxiliary models lack model paths, over-rollout ratio is invalid, or multiple LoRA adapters are configured.
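Config inheritance of this kind can be sketched with dataclasses.fields: a rollout field is copied from the global model config only when both classes define it and the rollout config left it unset. The sketch also includes the auxiliary-model path check. Both dataclasses and their field sets are hypothetical stand-ins. So is the rule that None means "unset".

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ModelConfig:  # hypothetical global model config
    model_path: Optional[str] = None
    max_model_len: Optional[int] = None
    max_prompt_tokens: Optional[int] = None


@dataclass
class RolloutModelConfig:  # hypothetical rollout / auxiliary model config
    model_path: Optional[str] = None
    max_model_len: Optional[int] = None
    max_prompt_tokens: Optional[int] = None
    engine_num: int = 1


def inherit_from_model(rollout: RolloutModelConfig, model: ModelConfig) -> RolloutModelConfig:
    # Copy a field only if both configs define it and the rollout left it unset.
    shared = {f.name for f in fields(model)}
    for f in fields(rollout):
        if f.name in shared and getattr(rollout, f.name) is None:
            setattr(rollout, f.name, getattr(model, f.name))
    return rollout


def check_auxiliary_models(aux_models: List[RolloutModelConfig]) -> None:
    # Auxiliary models are not inherited into here, so each must name its own path.
    for i, aux in enumerate(aux_models):
        if not aux.model_path:
            raise ValueError(f"Auxiliary model {i} must set model_path.")
```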

class trinity.common.config_validator.SynchronizerConfigValidator

Bases: ConfigValidator

Validator for synchronizer configuration settings.

Handles synchronizer namespace configuration and validates NCCL synchronization compatibility with different modes and features.

validate(config: Config) → None

Validate and configure synchronizer settings.

  • Sets the Ray namespace for the synchronizer

  • Sets the explorer world size based on rollout GPU count

  • Disables NCCL synchronization for incompatible modes and features

Parameters:

config – The global configuration object to validate.
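The two synchronizer steps above can be sketched as small functions. The following are all assumptions, not taken from this page:
- the explorer world size is one rank per rollout GPU;
- NCCL sync is compatible only with mode "both";
- the fallback method is named "checkpoint".

The set of incompatible features is left as a parameter, since this page does not list them.

```python
from typing import Optional, Set, Tuple

NCCL_COMPATIBLE_MODES = {"both"}  # assumption


def explorer_world_size(engine_num: int, tensor_parallel_size: int) -> int:
    # Assumed: one rank per rollout GPU.
    return engine_num * tensor_parallel_size


def resolve_sync_method(
    requested: str, mode: str, enabled: Set[str], incompatible: Set[str]
) -> Tuple[str, Optional[str]]:
    """Return (sync_method, reason_for_fallback); the reason is None if unchanged."""
    if requested != "nccl":
        return requested, None
    if mode not in NCCL_COMPATIBLE_MODES:
        return "checkpoint", f"NCCL sync is not supported in mode {mode!r}"
    blocked = enabled & incompatible
    if blocked:
        return "checkpoint", f"NCCL sync is incompatible with {sorted(blocked)}"
    return "nccl", None
```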

class trinity.common.config_validator.IntervalConfigValidator

Bases: ConfigValidator

Validator for interval configuration settings.

Validates synchronization and evaluation intervals, ensuring that evaluation intervals are multiples of synchronization intervals when applicable.

validate(config: Config) → None

Validate interval configuration settings.

  • Ensures synchronization interval is positive

  • Adjusts evaluation interval to be a multiple of sync interval when needed

Parameters:

config – The global configuration object to validate.

Raises:

AssertionError – If synchronization interval is not positive.
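Aligning the evaluation interval to the synchronization interval is a ceiling-to-multiple operation. In the sketch below, rounding up (rather than down) is an assumption, and so is treating a non-positive eval interval as "evaluation disabled". The positivity check uses assert to match the documented AssertionError.

```python
def align_eval_interval(eval_interval: int, sync_interval: int) -> int:
    assert sync_interval > 0, "sync_interval must be positive"
    if eval_interval <= 0 or eval_interval % sync_interval == 0:
        return eval_interval  # disabled (assumed) or already aligned
    # Round up to the next multiple: ceil(a / b) * b using integer arithmetic.
    return -(-eval_interval // sync_interval) * sync_interval
```

For example, with sync_interval = 4 an eval_interval of 10 becomes 12, while 8 is left unchanged.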

class trinity.common.config_validator.MonitorConfigValidator

Bases: ConfigValidator

Validator for monitor configuration settings.

Validates monitor type, sets default arguments, and configures monitor cache directory.

validate(config: Config) → None

Validate and configure monitor settings.

  • Validates that the monitor type is supported

  • Sets default monitor arguments if not provided

  • Creates the monitor cache directory

Parameters:

config – The global configuration object to validate.

Raises:

ValueError – If an invalid monitor type is specified.
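The three monitor steps above can be sketched as one function. The following are hypothetical: the supported-monitor set, the default-argument table, and the cache directory location. User-supplied arguments override the defaults, and os.makedirs(..., exist_ok=True) makes directory creation idempotent.

```python
import os
from typing import Any, Dict, Optional, Tuple

SUPPORTED_MONITORS = {"tensorboard", "wandb"}  # assumption
DEFAULT_MONITOR_ARGS: Dict[str, Dict[str, Any]] = {  # hypothetical defaults
    "tensorboard": {"flush_secs": 30},
    "wandb": {},
}


def validate_monitor(
    monitor_type: str, monitor_args: Optional[Dict[str, Any]], job_dir: str
) -> Tuple[Dict[str, Any], str]:
    if monitor_type not in SUPPORTED_MONITORS:
        raise ValueError(
            f"Invalid monitor type {monitor_type!r}; expected one of {sorted(SUPPORTED_MONITORS)}"
        )
    # User-supplied arguments override the defaults.
    args = {**DEFAULT_MONITOR_ARGS.get(monitor_type, {}), **(monitor_args or {})}
    cache_dir = os.path.join(job_dir, "monitor")  # hypothetical location
    os.makedirs(cache_dir, exist_ok=True)
    return args, cache_dir
```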

class trinity.common.config_validator.BufferConfigValidator

Bases: ConfigValidator

Validator for buffer configuration settings.

Handles train batch size validation, buffer directory setup, tokenizer configuration, and comprehensive validation of explorer/trainer input configurations including tasksets, experience buffers, and data pipelines.

validate(config: Config) → None

Validate and configure buffer settings.

  • Sets train batch size based on mode and algorithm configuration

  • Creates buffer cache directory

  • Configures pad token ID using tokenizer

  • Validates explorer input configurations (tasksets, selectors)

  • Validates trainer input configurations (experience buffers, auxiliary buffers)

  • Validates data processor pipeline configurations

Parameters:

config – The global configuration object to validate.

Raises:

  • ValueError – If required buffer configurations are missing or invalid.

  • RuntimeError – If buffer directory creation fails.
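Two of the buffer steps can be sketched in isolation. The batch-size rule is an assumption: when the explorer generates data, each task yields repeat_times experiences, so one training step consumes batch_size × repeat_times, while pure "train" mode keeps batch_size as is. The pad-token fallback to EOS is a common convention for tokenizers that expose pad_token_id and eos_token_id (as Hugging Face tokenizers do). Whether this validator uses that fallback is not stated on this page.

```python
from types import SimpleNamespace
from typing import Any


def resolve_train_batch_size(mode: str, batch_size: int, repeat_times: int) -> int:
    # Assumption: explorer-generated data has repeat_times experiences per task.
    if mode == "train":
        return batch_size
    return batch_size * repeat_times


def resolve_pad_token_id(tokenizer: Any) -> int:
    # Prefer the tokenizer's pad token; fall back to EOS, a common convention.
    pad_id = getattr(tokenizer, "pad_token_id", None)
    if pad_id is None:
        pad_id = getattr(tokenizer, "eos_token_id", None)
    if pad_id is None:
        raise ValueError("Tokenizer defines neither pad_token_id nor eos_token_id.")
    return pad_id


tok = SimpleNamespace(pad_token_id=None, eos_token_id=2)  # stand-in tokenizer
print(resolve_pad_token_id(tok))  # 2
```

The check uses `is None` rather than truthiness because 0 is a valid token ID.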

class trinity.common.config_validator.TrainerConfigValidator

Bases: ConfigValidator

Validator for trainer configuration settings.

Handles trainer type validation, configuration merging, and parameter validation for different trainer implementations (veRL, Tinker, etc.).

validate(config: Config) → None

Validate and configure trainer settings.

  • Validates trainer type and handles configuration for different trainer types

  • Merges trainer configuration with schema defaults

  • Validates save checkpoint strategy options

  • Synchronizes trainer configuration with global config

Parameters:

config – The global configuration object to validate.

Raises:

ValueError – If trainer type is invalid, deprecated config path is used, or save checkpoint strategy is invalid.
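Merging a user trainer config with schema defaults is usually a recursive dictionary merge: nested mappings merge key by key, and any other user value replaces the default. The sketch below works on plain dicts and is an illustration, not the validator's implementation. Libraries such as OmegaConf provide OmegaConf.merge for the same job on structured configs.

```python
import copy
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    # Work on a copy so the defaults object is never mutated.
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalar or new key: override wins
    return merged
```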

class trinity.common.config_validator.GPUMemoryValidator

Bases: ConfigValidator

Validator for GPU memory settings.

Estimates GPU memory usage and suggests configuration changes when the estimate is too high.

Note

  1. This validator is disabled when ignore_validator_suggestions is set to True.

  2. The coefficients in this validator's memory-estimation formulas were roughly estimated with the torch.profiler tool and may not be accurate.

validate(config: Config) → None

Validate GPU memory usage based on the provided configuration.

Skips validation if suggestions are disabled or if Tinker mode is enabled in the model configuration. Memory validation runs only in the 'train' and 'both' modes.

Parameters:

config (Config) – The global configuration object.

validate_trainer_memory_usage(config: Config) → None

Perform GPU memory validation for trainer components.

Detects CUDA availability and delegates to FSDP-specific checks if applicable.

Parameters:

config (Config) – The global configuration object.

fsdp_memory_check(config: Config) → None

Perform comprehensive FSDP memory validation for actor and critic models.

Estimates total GPU memory usage including parameters, optimizer states, and activations. Issues warnings and suggestions if usage exceeds safe thresholds.

Parameters:

config (Config) – The global configuration object.

Raises:

ValueError – If estimated memory usage exceeds safe limits and suggestions are not bypassed.
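The kind of estimate fsdp_memory_check performs can be illustrated with the standard mixed-precision Adam rule of thumb. Under full sharding, each parameter costs about 16 bytes in total: bf16 weights (2) + bf16 gradients (2) + fp32 master weights (4) + fp32 Adam moments (4 + 4). That cost is divided across the data-parallel world size. This is a generic back-of-the-envelope sketch, not the validator's own profiled coefficients. The safety fraction and the activation figure passed in are hypothetical inputs.

```python
def fsdp_static_memory_gib(num_params: float, world_size: int, bytes_per_param: int = 16) -> float:
    """Per-GPU memory (GiB) for weights, gradients and Adam states under full sharding."""
    return num_params * bytes_per_param / world_size / 1024**3


def check_fits(static_gib: float, activation_gib: float, gpu_gib: float, safety: float = 0.9) -> None:
    # Hypothetical threshold: keep the estimate below `safety` of device memory.
    total = static_gib + activation_gib
    if total > safety * gpu_gib:
        raise ValueError(
            f"Estimated {total:.1f} GiB exceeds {safety:.0%} of {gpu_gib:.0f} GiB; "
            "consider more GPUs, offloading, or a smaller micro-batch."
        )


per_gpu = fsdp_static_memory_gib(7e9, world_size=8)
print(f"{per_gpu:.2f} GiB")  # 13.04 GiB
```

For a 7B-parameter model on 8 GPUs, this gives 7e9 × 16 / 8 bytes ≈ 13.04 GiB per GPU before activations.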