Introduction
OLMo-core represents a major rewrite of the original training and modeling code from OLMo with a focus on performance and API stability. It aims to provide a standard set of robust tools that can be used by LLM researchers at AI2 and other organizations to build their research projects on.
The library is centered around a highly efficient, yet flexible, Trainer and a launch
module that handles all of the boilerplate of launching experiments on Beaker
or other platforms. It also comes with a simple, yet optimized, Transformer
model and many other useful torch.nn.Module implementations.
Most users will likely follow a workflow that looks like this:
Define the various components of an experiment through configuration classes. For example:

    model_config = TransformerConfig.llama2_7B(...)
    train_module_config = TransformerTrainModuleConfig(...)
    data_config = NumpyFSLDatasetConfig(...)
    data_loader_config = NumpyDataLoaderConfig(...)
    trainer_config = TrainerConfig(...)
Build the corresponding components within a main() function at runtime and then call Trainer.fit(). For example:

    def main():
        # Build each component from its configuration object.
        model = model_config.build()
        train_module = train_module_config.build(model)
        data_loader = data_loader_config.build(
            data_config.build(), dp_process_group=train_module.dp_process_group
        )
        trainer = trainer_config.build(train_module, data_loader)
        trainer.fit()

    if __name__ == "__main__":
        prepare_training_environment(seed=SEED)
        main()
        teardown_training_environment()
Launch their training script with a launch config, like the BeakerLaunchConfig. For example:

    launch_config = BeakerLaunchConfig(...)
    launch_config.launch(follow=True)
Or simply launch their training script manually with torchrun:

    torchrun --nproc-per-node=8 train_script.py ...
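When a script is started with torchrun, each worker process receives its position in the job through the environment variables RANK, LOCAL_RANK, and WORLD_SIZE. The sketch below only illustrates what a training script can read from that environment. The names DistributedEnv and read_distributed_env are hypothetical helpers written for this example and are not part of OLMo-core, which may handle this differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributedEnv:
    """Hypothetical container for the rank information exported by torchrun."""

    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node
    world_size: int  # total number of processes in the job


def read_distributed_env(environ=None) -> DistributedEnv:
    """Read RANK, LOCAL_RANK, and WORLD_SIZE, falling back to single-process defaults."""
    env = os.environ if environ is None else environ
    return DistributedEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    dist_env = read_distributed_env()
    print(
        f"rank {dist_env.rank} of {dist_env.world_size} "
        f"(local rank {dist_env.local_rank})"
    )
```

With --nproc-per-node=8 on a single node, each of the eight workers would see a world size of 8 and a distinct local rank. Run without torchrun, the helper falls back to a single-process setting.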
You can find a complete example of this workflow in the Train an LLM example. For a more comprehensive overview, see All-in-one for researchers.
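The configure-then-build workflow described above can be illustrated with a small, dependency-free sketch. Every class here (ToyModelConfig, ToyModel, ToyTrainerConfig, ToyTrainer) is a hypothetical stand-in invented for this example and is not OLMo-core's API. The sketch only shows the general shape of the pattern: plain configuration objects are defined first, and the real components are built from them inside main() at runtime.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in for a model built from a config."""

    d_model: int
    n_layers: int


@dataclass
class ToyModelConfig:
    """Hypothetical config: cheap to create, holds only hyperparameters."""

    d_model: int = 64
    n_layers: int = 2

    def build(self) -> ToyModel:
        return ToyModel(self.d_model, self.n_layers)


class ToyTrainer:
    """Hypothetical trainer that counts steps instead of training."""

    def __init__(self, model: ToyModel, max_steps: int):
        self.model = model
        self.max_steps = max_steps
        self.global_step = 0

    def fit(self) -> None:
        while self.global_step < self.max_steps:
            self.global_step += 1  # stand-in for one real training step


@dataclass
class ToyTrainerConfig:
    max_steps: int = 10

    def build(self, model: ToyModel) -> ToyTrainer:
        return ToyTrainer(model, self.max_steps)


def main() -> ToyTrainer:
    # Step 1: describe the experiment with configuration objects.
    model_config = ToyModelConfig(d_model=128, n_layers=4)
    trainer_config = ToyTrainerConfig(max_steps=5)

    # Step 2: build the components at runtime, then train.
    model = model_config.build()
    trainer = trainer_config.build(model)
    trainer.fit()
    return trainer


if __name__ == "__main__":
    finished = main()
    print(f"finished at step {finished.global_step}")
```

Keeping configuration separate from construction means an experiment can be described, inspected, or overridden without allocating a model or opening any data.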