model_ladder¶
- class olmo_core.model_ladder.ModelLadder(*, name, dir, project=None, sizes, max_devices, device_type, model_configurator, run_configurator, data_loader, instance_sources, sequence_length=8192, tokenizer, seed=42, backend='cpu:gloo,cuda:nccl')[source]¶
Bases:
Config
Represents a complete model ladder of runs.
-
project:
Optional[str] = None¶ An optional project name to associate with the ladder runs. Defaults to
name. This is used by some logging backends (e.g. Weights & Biases).
-
device_type:
str¶ The type of accelerator device available to use for each run (e.g. “NVIDIA H100 80GB HBM3”).
-
model_configurator:
ModelConfigurator¶ The model configurator to use.
-
run_configurator:
RunConfigurator¶ The run configurator to use.
-
data_loader:
ComposableDataLoaderConfig¶ The data loader configuration to use for each run.
-
instance_sources:
list[InstanceSourceConfig]¶ The instance sources to use for each run.
-
tokenizer:
TokenizerConfig¶ The tokenizer to use.
- dry_run(size_spec, show_plot=True, save_plot=None)[source]¶
Do a dry-run, which prints relevant hyperparameters and the required number of devices, and displays a plot of the learning rate schedule.
- run(size_spec, for_benchmarking=False)[source]¶
Execute a particular model run of the experiment locally and store the results.
- run_benchmark(size_spec)[source]¶
Do a benchmarking run for a model of the given size spec. This is just like
run(), but with benchmarking-specific settings (no checkpoints, no evals, hard stop).
- get_model_config(size_spec)[source]¶
Get the model config for a model of the given size spec.
- Return type:
ModelConfig
- get_num_params(size_spec)[source]¶
Get the actual number of non-embedding parameters for a model of the given size spec.
- get_num_devices(size_spec)[source]¶
Get the number of devices that would be used for a run of the given size spec.
- Return type:
- get_save_folder(size_spec)[source]¶
Get the training save folder for a run of the given size spec.
- Return type:
- get_checkpoints(size_spec, download_metrics=False, discover_all=False, alternative_dirs=None)[source]¶
Get the list of ordered checkpoints from the run for the given size spec.
- Parameters:
size_spec (str) – The size specification for the model run.
download_metrics (bool, default: False) – If True, download metrics files to local cache.
discover_all (bool, default: False) – If True, discover all checkpoints that exist in the save folder rather than only checking at the intervals defined by RunConfigurator.configure_checkpoint_intervals().
alternative_dirs (Optional[list[Union[Path, PathLike, str]]], default: None) – Optional list of alternative root directories to search for checkpoints. The size_spec is appended to each directory. For each checkpoint, the primary save directory is checked first, then each alternative directory in order until found.
- Return type:
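The search order described for alternative_dirs can be sketched roughly as follows. This is a minimal local-filesystem illustration, not the library's implementation: find_checkpoint, the directory names, and the "190M" size spec are all hypothetical, and the sketch assumes the primary save folder already points at the run for that size spec.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_checkpoint(
    primary_dir: Path,
    size_spec: str,
    checkpoint_name: str,
    alternative_dirs: Optional[Sequence[Path]] = None,
) -> Optional[Path]:
    """Return the first existing location of a checkpoint, or None (hypothetical helper)."""
    # The primary save folder is checked first.
    candidates = [Path(primary_dir)]
    # Each alternative root gets the size spec appended, in the order given.
    candidates += [Path(root) / size_spec for root in (alternative_dirs or [])]
    for folder in candidates:
        path = folder / checkpoint_name
        if path.exists():
            return path
    return None


# Demo on a throwaway directory tree: the checkpoint only exists under alt_b.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    primary = root / "primary" / "190M"
    alt_a, alt_b = root / "alt_a", root / "alt_b"
    primary.mkdir(parents=True)
    (alt_b / "190M" / "step1000").mkdir(parents=True)

    found = find_checkpoint(primary, "190M", "step1000", [alt_a, alt_b])
    found_relative = found.relative_to(root).as_posix() if found else None
    missing = find_checkpoint(primary, "190M", "step2000", [alt_a, alt_b])
```

In this sketch a checkpoint that exists nowhere yields None; how the real method reports a missing checkpoint is not specified here.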
- get_metrics(size_spec, prefix=None, discover_all=False, alternative_dirs=None)[source]¶
Get the metrics from the run of the given size spec.
- Parameters:
size_spec (str) – The size specification for the model run.
prefix (Optional[str], default: None) – If provided, only include metrics with keys starting with this prefix.
discover_all (bool, default: False) – If True, discover all checkpoints that exist in the save folder rather than only checking at the intervals defined by RunConfigurator.configure_checkpoint_intervals().
alternative_dirs (Optional[list[Union[Path, PathLike, str]]], default: None) – Optional list of alternative root directories to search for checkpoints and metrics files. The size_spec is appended to each directory.
- Return type:
Optional[DataFrame]
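As a rough illustration of the prefix argument, the sketch below filters a list of metric keys with str.startswith. The helper filter_metric_keys and the key names are hypothetical; the real method returns a DataFrame and its key naming may differ.

```python
from typing import Iterable, List, Optional


def filter_metric_keys(keys: Iterable[str], prefix: Optional[str] = None) -> List[str]:
    """Keep only the keys that start with `prefix`; keep everything if prefix is None."""
    if prefix is None:
        return list(keys)
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical metric keys, for illustration only.
all_keys = ["train/CE loss", "train/PPL", "eval/downstream/arc_easy", "optim/LR"]
train_keys = filter_metric_keys(all_keys, prefix="train/")
unfiltered = filter_metric_keys(all_keys)
```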
- class olmo_core.model_ladder.ModelConfigurator[source]¶
Defines how to configure a model of a particular size.
- abstract configure_model(*, size_spec, sequence_length, tokenizer, device_type)[source]¶
Configure the model for the given size spec.
- Return type:
TypeVar(M, bound=ModelConfig)
- abstract configure_rank_microbatch_size(*, size_spec, sequence_length, device_type)[source]¶
Configure the training per-device micro-batch size in tokens for a model of this size.
- Return type:
- abstract configure_minimal_device_mesh_spec(*, size_spec, sequence_length, device_type)[source]¶
Configure the minimal device mesh spec needed to train a model of this size.
- Return type:
- class olmo_core.model_ladder.RunConfigurator[source]¶
Bases:
Config
Defines how to configure a run for a model of a particular size.
- abstract configure_target_batch_size(num_params)[source]¶
Get the target global batch size in tokens for a model of this size. The actual batch size used may be slightly different to ensure it’s a multiple of the data parallel world size times the device micro-batch size.
- Return type:
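The adjustment of the target batch size mentioned above can be sketched as rounding to a multiple of the data parallel world size times the per-device micro-batch size. This is only an illustration under assumptions: the helper name round_to_valid_batch_size is hypothetical, and rounding to the nearest multiple (rather than up or down) is a guess, not something this page states.

```python
def round_to_valid_batch_size(
    target_batch_size: int,
    dp_world_size: int,
    rank_microbatch_size: int,
) -> int:
    """Round a target global batch size (in tokens) to a usable multiple.

    Assumption: nearest-multiple rounding; the library may round differently.
    """
    # Smallest unit of work: one micro-batch on every data parallel rank.
    unit = dp_world_size * rank_microbatch_size
    # Integer round-half-up to the nearest multiple, but never below one unit.
    num_units = max(1, (target_batch_size + unit // 2) // unit)
    return num_units * unit


# Example: 8 data parallel ranks, 2 sequences of 8192 tokens per micro-batch.
actual = round_to_valid_batch_size(1_000_000, dp_world_size=8, rank_microbatch_size=16_384)
```

With these example numbers the unit is 131,072 tokens, so a target of 1,000,000 tokens becomes 8 units, or 1,048,576 tokens.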
- abstract configure_duration(num_params, batch_size)[source]¶
Get the training duration for a given model and batch size.
- Return type:
- abstract configure_optimizer(num_params, batch_size)[source]¶
Get the optimizer config for a given model and batch size.
- Return type:
- abstract configure_lr_scheduler(num_params, batch_size)[source]¶
Get the learning rate scheduler for a given model and batch size.
- Return type:
- class olmo_core.model_ladder.RunCheckpointInfo(name, step, tokens, path, metrics_path, exists)[source]¶
Bases:
object
Describes a checkpoint from a model run.
-
name:
str¶ A descriptive name for the checkpoint, assigned by the
RunConfigurator.
- class olmo_core.model_ladder.DeviceMeshSpec(world_size: int, dp_world_size: int | None)[source]¶
Bases:
NamedTuple
Describes the relevant dimensions of a device mesh needed to train a model of a certain size.
- class olmo_core.model_ladder.WSDSChinchillaRunConfigurator(*, chinchilla_multiple, decay_fraction=0.1, tokens_per_param=20, lr_multiplier=1.0, stepped_schedule=False)[source]¶
Bases:
RunConfigurator
A run configurator that uses WSD-S learning rate scheduling and Chinchilla scaling laws.
Note
You may need to tune the
tokens_per_param value to your dataset and optimizer.
-
chinchilla_multiple:
float¶ How long to train each run for, expressed as a multiple of the Chinchilla-optimal duration. The multiple must be a power of 2.
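To make the arithmetic concrete, the sketch below computes a training duration in tokens as chinchilla_multiple × tokens_per_param × num_params. The helper chinchilla_duration_tokens is hypothetical, the formula is the usual reading of "Chinchilla-optimal" rather than something confirmed by this page, and accepting fractional powers of 2 such as 0.5 is an assumption.

```python
import math


def chinchilla_duration_tokens(
    num_params: int,
    chinchilla_multiple: float,
    tokens_per_param: int = 20,
) -> int:
    """Training duration in tokens for a given multiple of the Chinchilla optimum."""
    # Assumption: any positive power of 2 (0.5, 1, 2, 4, ...) is acceptable.
    if chinchilla_multiple <= 0 or not math.log2(chinchilla_multiple).is_integer():
        raise ValueError("chinchilla_multiple must be a power of 2")
    return int(chinchilla_multiple * tokens_per_param * num_params)


# Example: a model with 190M non-embedding parameters trained for 2xC.
duration = chinchilla_duration_tokens(190_000_000, chinchilla_multiple=2)
```

With the default of 20 tokens per parameter, this example comes out to 7.6 billion tokens.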
-
decay_fraction:
float= 0.1¶ The duration of each decay as a fraction of the period. Must be at least 10%.
-
lr_multiplier:
float= 1.0¶ A multiplier to apply to the learning rate calculated from Chinchilla scaling laws.
-
stepped_schedule:
bool= False¶ If
True, use a stepped schedule for the peak learning rate instead of a constant one, where the peak learning rate will be scaled by1 / sqrt(D)during each stage, whereDis the target chinchilla multiple of the stage. This assumes that the base learning rate is optimal for 1xC.
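The 1 / sqrt(D) scaling can be illustrated with a short sketch. The helper stepped_peak_lrs, the base learning rate, and the choice of stages at 1xC, 2xC, and 4xC are hypothetical; only the scaling rule itself comes from the description above.

```python
import math
from typing import Dict, Sequence


def stepped_peak_lrs(base_lr: float, stage_multiples: Sequence[float]) -> Dict[float, float]:
    """Peak learning rate per stage, scaled by 1 / sqrt(D) for target multiple D."""
    # base_lr is assumed to be the optimal peak learning rate for 1xC.
    return {d: base_lr / math.sqrt(d) for d in stage_multiples}


# Hypothetical stages targeting 1xC, 2xC, and 4xC.
peak_lrs = stepped_peak_lrs(base_lr=4e-3, stage_multiples=[1.0, 2.0, 4.0])
```

In this example the 4xC stage runs at half the base peak learning rate, since sqrt(4) = 2.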
- configure_target_batch_size(num_params)[source]¶
Get the target global batch size in tokens for a model of this size. The actual batch size used may be slightly different to ensure it’s a multiple of the data parallel world size times the device micro-batch size.
- Return type:
- configure_duration(num_params, batch_size)[source]¶
Get the training duration for a given model and batch size.
- Return type:
- configure_optimizer(num_params, batch_size)[source]¶
Get the optimizer config for a given model and batch size.
- Return type:
- configure_lr_scheduler(num_params, batch_size)[source]¶
Get the learning rate scheduler for a given model and batch size.
- Return type:
- class olmo_core.model_ladder.TransformerModelConfigurator(*, rank_microbatch_size=None)[source]¶
Bases:
ModelConfigurator[TransformerConfig]
Generic model configurator for transformer models.
-
rank_microbatch_size:
Optional[int] = None¶ Optional fixed rank micro-batch size. If set, this value is used directly instead of computing it based on model size and device type.
- configure_rank_microbatch_size(*, size_spec, sequence_length, device_type)[source]¶
Configure the training per-device micro-batch size in tokens for a model of this size.
- Return type:
- configure_minimal_device_mesh_spec(*, size_spec, sequence_length, device_type)[source]¶
Configure the minimal device mesh spec needed to train a model of this size.
- Return type:
- class olmo_core.model_ladder.Olmo3ModelConfigurator(*, rank_microbatch_size=None, model_construction_kwargs=<factory>)[source]¶
Bases:
TransformerModelConfigurator
Model configurator for Olmo 3 transformer models.