nn.hf

Utilities for converting models between OLMo Core and Hugging Face formats. To configure the mappings between OLMo Core and Hugging Face, you may change the variables in olmo_core.nn.hf.convert (e.g. olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS).
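
As a minimal sketch of what changing these variables can look like, the snippet below uses a stand-in module (fake_convert, hypothetical) instead of importing olmo_core.nn.hf.convert, and the key names are invented for illustration. It shows a general Python point: updating the dictionary through the module attribute is visible to code that reads that attribute, while rebinding a separately held name is not. Whether the library reads the mappings at call time is an assumption here.

```python
import types

# Stand-in for olmo_core.nn.hf.convert; the real module is not imported here.
fake_convert = types.ModuleType("fake_convert")
fake_convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS = {
    "hf.embed.weight": "core.embeddings.weight",  # hypothetical keys
}


def lookup(hf_key):
    # Reads the mapping through the module attribute at call time.
    return fake_convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS.get(hf_key)


# In-place update through the module attribute: visible to lookup().
fake_convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS["hf.norm.weight"] = "core.norm.weight"

# Rebinding a local name: the module's dictionary is left unchanged.
local_copy = fake_convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS
local_copy = {"hf.other.weight": "core.other.weight"}

in_place_result = lookup("hf.norm.weight")
rebound_result = lookup("hf.other.weight")
```

The in-place update is picked up by lookup(), whereas the rebound dictionary never reaches the module.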

olmo_core.nn.hf.convert_checkpoint_to_hf(original_checkpoint_path, output_path, transformer_config_dict, tokenizer_config_dict, *, model_state_dict=None, dtype=None, tokenizer_id=None, max_sequence_length=None, validate=True, debug=False, device=None, moe_capacity_factor=None, validation_device=None, validation_sliding_window=None)[source]

Convert an OLMo Core checkpoint to Hugging Face format.

Parameters:
  • original_checkpoint_path (str | Path | None) – Path to the original checkpoint. Can be None if model_state_dict is provided.

  • output_path (str | Path) – Where to save the converted model.

  • transformer_config_dict (Dict[str, Any]) – Dictionary form of OLMo Core model config.

  • tokenizer_config_dict (Dict[str, Any]) – Dictionary form of OLMo Core tokenizer config.

  • model_state_dict (Optional[Dict[str, Any]], default: None) – Optional pre-gathered model state dict. If provided, weights are taken from this instead of loading from original_checkpoint_path.

Return type:

None

olmo_core.nn.hf.convert_hybrid_state_to_hf(state_dict, layer_types)[source]

Convert an OLMo Core hybrid state dict to HF olmo_hybrid format.

Uses HYBRID_SHARED_KEY_MAP for non-block keys, and per-layer HYBRID_GDN_LAYER_KEY_MAP / HYBRID_ATTN_LAYER_KEY_MAP based on layer_types.

Parameters:
  • state_dict (Dict[str, Any]) – An unsharded OLMo Core model state dict.

  • layer_types (List[str]) – Per-layer type list ("linear_attention" or "full_attention").

Return type:

Dict[str, Any]
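
The per-layer selection described above can be sketched as follows. This is an illustration only: the three key maps and the "blocks.<index>.<suffix>" key layout are hypothetical stand-ins, not the contents of HYBRID_SHARED_KEY_MAP, HYBRID_GDN_LAYER_KEY_MAP or HYBRID_ATTN_LAYER_KEY_MAP, and the real function may also transform the tensors themselves.

```python
from typing import Any, Dict, List

# Hypothetical key maps for illustration; the real HYBRID_* maps are not reproduced here.
SHARED_KEY_MAP = {"embeddings.weight": "model.embed_tokens.weight"}
GDN_LAYER_KEY_MAP = {"mixer.w_q.weight": "linear_attn.q_proj.weight"}
ATTN_LAYER_KEY_MAP = {"mixer.w_q.weight": "self_attn.q_proj.weight"}
LAYER_MAPS = {
    "linear_attention": GDN_LAYER_KEY_MAP,
    "full_attention": ATTN_LAYER_KEY_MAP,
}


def rename_hybrid_keys(state_dict: Dict[str, Any], layer_types: List[str]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for key, value in state_dict.items():
        if key in SHARED_KEY_MAP:  # non-block keys use the shared map
            out[SHARED_KEY_MAP[key]] = value
            continue
        parts = key.split(".", 2)  # assumed layout: "blocks.<index>.<suffix>"
        if len(parts) != 3 or parts[0] != "blocks":
            raise KeyError(f"unmapped key: {key}")
        index, suffix = int(parts[1]), parts[2]
        layer_map = LAYER_MAPS[layer_types[index]]  # choose the map by layer type
        out[f"model.layers.{index}.{layer_map[suffix]}"] = value
    return out


state = {
    "embeddings.weight": "E",
    "blocks.0.mixer.w_q.weight": "Q0",
    "blocks.1.mixer.w_q.weight": "Q1",
}
converted = rename_hybrid_keys(state, ["linear_attention", "full_attention"])
```

The same source suffix ends up under a different destination name depending on the type of the layer it belongs to.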

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.convert_state_from_hf(config, hf_state, *, model_type=None)[source]

Converts a model state dict in Hugging Face transformers format into an unsharded state dict of OLMo Core format.

Parameters:
  • config (PreTrainedConfig) – The Hugging Face config for the model.

  • hf_state (Dict[str, Any]) – A model state dict in HF format.

  • model_type (Optional[str], default: None) – The model type of the HF model.

Return type:

Dict[str, Any]

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.convert_state_to_hf(config, olmo_core_state)[source]

Converts an unsharded model state dict of OLMo Core format into Hugging Face transformers format.

Parameters:
  • config (PreTrainedConfig) – The Hugging Face config for the model.

  • olmo_core_state (Dict[str, Any]) – An unsharded OLMo Core model state dict. None of the states can be a DTensor or ShardedTensor.

Return type:

Dict[str, Any]
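
Because the input must not contain DTensor or ShardedTensor values, a pre-flight check before conversion may be useful. The sketch below is a heuristic that matches on class names so it runs without torch; find_sharded_entries is a hypothetical helper, not part of this module, and with torch installed an isinstance check against the real classes would be more precise.

```python
from typing import Any, Dict, List

DISALLOWED_TYPE_NAMES = {"DTensor", "ShardedTensor"}


def find_sharded_entries(state: Dict[str, Any]) -> List[str]:
    """Return keys whose values are instances of a class named DTensor or ShardedTensor."""
    return [
        key
        for key, value in state.items()
        if any(cls.__name__ in DISALLOWED_TYPE_NAMES for cls in type(value).__mro__)
    ]


class DTensor:  # stand-in class so the example runs without torch
    pass


example_state = {"w": [1.0, 2.0], "sharded_w": DTensor()}
offending = find_sharded_entries(example_state)
```

An empty result suggests the state dict is safe to pass on; any reported keys would need to be gathered into regular tensors first.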

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.get_converter_from_hf(model_type=None)[source]

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

Return type:

StateConverter

olmo_core.nn.hf.get_converter_to_hf(model_type=None)[source]

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

Return type:

StateConverter

olmo_core.nn.hf.get_hf_config(model)[source]

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

Return type:

PreTrainedConfig

olmo_core.nn.hf.get_hybrid_hf_config(model, layer_types, max_seq_len=65536)[source]

Build the config.json dict for a HF olmo_hybrid model.

Returns a plain dict (not PretrainedConfig) to avoid a hard dependency on a specific transformers version.

Parameters:
  • model (Transformer) – The OLMo Core hybrid transformer model.

  • layer_types (List[str]) – Per-layer type list from get_hybrid_layer_types().

  • max_seq_len (int, default: 65536) – Maximum sequence length for max_position_embeddings.

Return type:

Dict[str, Any]
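
Since the result is a plain dict, it can be serialized with the standard json module. The sketch below shows the general shape under stated assumptions: build_config is a hypothetical helper, and the "model_type", "layer_types" and "num_hidden_layers" field names are guesses for illustration. Only max_position_embeddings is named in the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def build_config(layer_types: List[str], max_seq_len: int = 65536) -> Dict[str, Any]:
    # Field names other than max_position_embeddings are assumptions.
    return {
        "model_type": "olmo_hybrid",
        "layer_types": list(layer_types),
        "num_hidden_layers": len(layer_types),
        "max_position_embeddings": max_seq_len,
    }


config = build_config(["linear_attention", "full_attention"])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps(config, indent=2))  # a plain dict needs no transformers import
    reloaded = json.loads(path.read_text())
```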

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.get_hybrid_layer_types(model)[source]

Return a per-layer type list for a hybrid model.

Each entry is "linear_attention" (GDN) or "full_attention" (standard attention), matching the HF olmo_hybrid config format.

Return type:

List[str]
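
A minimal sketch of how such a list could be derived and used. The "gdn" and "attention" tags are hypothetical descriptors of each block, not attributes of the real Transformer class, and looks_hybrid only mirrors the idea behind is_olmo_hybrid_model (both layer kinds present).

```python
from typing import List

# Hypothetical tags describing each block's sequence mixer.
KIND_TO_LAYER_TYPE = {"gdn": "linear_attention", "attention": "full_attention"}


def layer_types_from_kinds(kinds: List[str]) -> List[str]:
    # One entry per layer, in layer order.
    return [KIND_TO_LAYER_TYPE[kind] for kind in kinds]


def looks_hybrid(layer_types: List[str]) -> bool:
    # Hybrid means both layer kinds occur at least once.
    return {"linear_attention", "full_attention"} <= set(layer_types)


layer_types = layer_types_from_kinds(["gdn", "gdn", "attention"])
hybrid = looks_hybrid(layer_types)
attention_only = looks_hybrid(["full_attention", "full_attention"])
```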

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.is_olmo_hybrid_model(model)[source]

Return True if the model has both GatedDeltaNet and Attention layers.

Return type:

bool

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.load_config(checkpoint_input_dir)[source]

Load the experiment config from an OLMo Core checkpoint directory.

Return type:

Optional[dict]
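
Because the return type is Optional[dict], callers should handle a missing config. The sketch below imitates that contract with a hypothetical reader; the "config.json" file name and the "model" key are assumptions for illustration, not confirmed details of the checkpoint layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_experiment_config(checkpoint_dir) -> Optional[dict]:
    # Assumption: the experiment config is a "config.json" file in the checkpoint directory.
    path = Path(checkpoint_dir) / "config.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    missing = read_experiment_config(tmp)  # no file yet, so None
    (Path(tmp) / "config.json").write_text(json.dumps({"model": {"d_model": 64}}))
    found = read_experiment_config(tmp)

# Guard against None before indexing into the config.
model_config = found["model"] if found is not None else None
```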

olmo_core.nn.hf.load_hf_model(model_name_or_path, model_state_dict, *, revision='main', model_id=None, num_embeddings=None, process_group=None, work_dir=None)[source]

Loads an OLMo Core model state dict using a model in Hugging Face transformers format.

Parameters:
  • model_name_or_path (Union[Path, PathLike, str]) – The name of a model in HF Hub or the path to a model saved in HF format.

  • model_state_dict (Dict[str, Any]) – The OLMo Core model state dict in which to load HF state.

  • revision (str, default: 'main') – If model_name_or_path is the ID of a model on the HF Hub, the revision (branch) of that model to load.

  • model_id (Optional[str], default: None) – Deprecated. Model-specific mappings are now determined by the model architecture; see olmo_core.nn.hf.convert.

  • num_embeddings (Optional[int], default: None) – The number of embeddings in the OLMo Core model being loaded into. Defaults to the number of embeddings in the HF model.

  • process_group (Optional[ProcessGroup], default: None) – The process group to use for distributed communication.

  • work_dir (Union[Path, PathLike, str, None], default: None) – A local directory that can be used for holding temporary state. Required when downloading a model from a cloud directory.

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.save_hf_hybrid_model(save_dir, model_state_dict, model, *, dtype=None, vocab_size=None, max_sequence_length=65536)[source]

Save a hybrid (GDN + attention) model as config.json + model.safetensors.

Unlike save_hf_model(), this writes files directly to avoid a hard dependency on a specific transformers version.

Parameters:
  • save_dir (Union[Path, PathLike, str]) – Directory in which to save the model.

  • model_state_dict (Dict[str, Any]) – The OLMo Core model state dict.

  • model (Transformer) – The OLMo Core hybrid transformer model.

  • dtype (Optional[DType], default: None) – Optional dtype to cast weights to.

  • vocab_size (Optional[int], default: None) – If set, truncate embeddings/lm_head to this size.

  • max_sequence_length (int, default: 65536) – Maximum sequence length for max_position_embeddings.

Return type:

None

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.save_hf_model(save_dir, model_state_dict, model, huggingface_tokenizer=None, *, dtype=None, vocab_size=None, process_group=None, work_dir=None, save_overwrite=False)[source]

Saves an OLMo Core model state dict in Hugging Face transformers format.

Parameters:
  • save_dir (Union[Path, PathLike, str]) – Directory in which to save model.

  • model_state_dict (Dict[str, Any]) – The OLMo Core model state dict being saved in HF format.

  • dtype (Optional[DType], default: None) – The torch dtype that model weights should be saved as.

  • vocab_size (Optional[int], default: None) – The size of the vocab. Defaults to the number of embeddings in the OLMo Core model.

  • process_group (Optional[ProcessGroup], default: None) – The process group to use for distributed communication.

  • work_dir (Union[Path, PathLike, str, None], default: None) – A local directory that can be used for holding temporary state. Required when downloading a model from a cloud directory.

  • save_overwrite (bool, default: False) – Overwrite existing files in save_dir.

Warning

This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.

olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS: Dict[str, str]

Map of Hugging Face weight keys to OLMo Core weight keys, used to determine how HF state maps to OLMo Core state. Different HF models may use different names for a given OLMo Core state. You may configure this to change how HF state maps to OLMo Core state.

This map only captures one-to-one mappings from HF to OLMo Core. For many-to-many mappings or mappings that require additional manipulation of state, see HF_TO_OLMO_CORE_TEMPLATE_MAPPINGS. If a given HF key can refer to different OLMo Core states depending on the HF model, see MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_WEIGHT_MAPPINGS.
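
A one-to-one mapping amounts to renaming keys while leaving values untouched. The sketch below illustrates that idea only: the entries and the "{layer}" placeholder syntax are hypothetical, not the library's actual keys or placeholder format.

```python
from typing import Any, Dict

# Hypothetical entries with a "{layer}" placeholder; not the library's actual keys.
WEIGHT_MAPPINGS = {
    "model.embed_tokens.weight": "embeddings.weight",
    "model.layers.{layer}.mlp.up_proj.weight": "blocks.{layer}.feed_forward.w3.weight",
}


def expand(mappings: Dict[str, str], n_layers: int) -> Dict[str, str]:
    # Turn each per-layer template into one concrete entry per layer index.
    expanded: Dict[str, str] = {}
    for hf_key, core_key in mappings.items():
        if "{layer}" in hf_key:
            for i in range(n_layers):
                expanded[hf_key.replace("{layer}", str(i))] = core_key.replace("{layer}", str(i))
        else:
            expanded[hf_key] = core_key
    return expanded


def rename(hf_state: Dict[str, Any], mappings: Dict[str, str], n_layers: int) -> Dict[str, Any]:
    table = expand(mappings, n_layers)
    return {table[key]: value for key, value in hf_state.items()}  # KeyError if unmapped


hf_state = {
    "model.embed_tokens.weight": "E",
    "model.layers.0.mlp.up_proj.weight": "U0",
    "model.layers.1.mlp.up_proj.weight": "U1",
}
renamed = rename(hf_state, WEIGHT_MAPPINGS, n_layers=2)
```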

olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_MODULE_MAPPINGS: Dict[str, str]

Map of Hugging Face module keys to OLMo Core module keys, used to determine how HF state maps to OLMo Core state. Different HF models may use different names for a given OLMo Core state. You may configure this to change how HF state maps to OLMo Core state.

This map only captures one-to-one mappings from HF to OLMo Core. For many-to-many mappings or mappings that require additional manipulation of state, see HF_TO_OLMO_CORE_TEMPLATE_MAPPINGS. If a given HF key can refer to different OLMo Core states depending on the HF model, see MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_MODULE_MAPPINGS.

olmo_core.nn.hf.convert.MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_WEIGHT_MAPPINGS: Dict[str, Dict[str, str]]

Map of Hugging Face weight keys to OLMo Core weight keys. This map captures overrides of the standard one-to-one mappings in HF_TO_OLMO_CORE_WEIGHT_MAPPINGS, in case a given HF key can refer to different OLMo Core states depending on the HF model architecture. You may configure this to change how HF state maps to OLMo Core state.
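
The override behaviour can be pictured as layering a per-model-type dictionary on top of the standard one. This is a sketch under assumptions: the keys, the "hypothetical_arch" model type, and the resolve_mappings helper are invented for illustration, and the library's actual resolution logic may differ.

```python
from typing import Dict, Optional

# Hypothetical standard mappings and per-model-type overrides.
BASE_MAPPINGS: Dict[str, str] = {
    "model.norm.weight": "lm_head.norm.weight",
    "lm_head.weight": "lm_head.w_out.weight",
}
MODEL_TYPE_OVERRIDES: Dict[str, Dict[str, str]] = {
    "hypothetical_arch": {"model.norm.weight": "final_norm.weight"},
}


def resolve_mappings(model_type: Optional[str] = None) -> Dict[str, str]:
    merged = dict(BASE_MAPPINGS)  # copy so the standard map is not mutated
    if model_type is not None:
        merged.update(MODEL_TYPE_OVERRIDES.get(model_type, {}))  # overrides win
    return merged


default_map = resolve_mappings()
overridden_map = resolve_mappings("hypothetical_arch")
unknown_type_map = resolve_mappings("some_other_arch")
```

Model types without an entry fall back to the standard mappings unchanged.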

olmo_core.nn.hf.convert.MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_MODULE_MAPPINGS: Dict[str, Dict[str, str]]

Map of Hugging Face module keys to OLMo Core module keys. This map captures overrides of the standard one-to-one mappings in HF_TO_OLMO_CORE_MODULE_MAPPINGS, in case a given HF key can refer to different OLMo Core states depending on the HF model architecture. You may configure this to change how HF state maps to OLMo Core state.

olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_TEMPLATE_MAPPINGS: Dict[str, StateMappingTemplate]

Map of Hugging Face keys to OLMo Core keys, used to determine how HF state maps to OLMo Core state. Different HF models may use different names for a given OLMo Core state. You may configure this to change how HF state maps to OLMo Core state.

This map captures many-to-many mappings from HF to OLMo Core and mappings that require additional manipulation of state (e.g. merging dimensions). For simple one-to-one mappings from HF to OLMo Core, see HF_TO_OLMO_CORE_WEIGHT_MAPPINGS and HF_TO_OLMO_CORE_MODULE_MAPPINGS.
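
To make the idea of a mapping that merges state concrete, the sketch below concatenates several source entries into one destination entry along the first dimension. ManyToOne is a hypothetical stand-in and does not reflect the fields of StateMappingTemplate; plain nested lists stand in for tensors so the example runs without torch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Matrix = List[List[float]]


@dataclass(frozen=True)
class ManyToOne:
    """Hypothetical stand-in for a template mapping; not StateMappingTemplate."""

    source_keys: Tuple[str, ...]
    dest_key: str


def apply_many_to_one(hf_state: Dict[str, Matrix], template: ManyToOne) -> Dict[str, Matrix]:
    merged_rows: Matrix = []
    for key in template.source_keys:
        merged_rows.extend(hf_state[key])  # concatenate along rows (dimension 0)
    return {template.dest_key: merged_rows}


hf_state = {
    "attn.q_proj.weight": [[1.0, 2.0]],
    "attn.k_proj.weight": [[3.0, 4.0]],
    "attn.v_proj.weight": [[5.0, 6.0]],
}
template = ManyToOne(
    source_keys=("attn.q_proj.weight", "attn.k_proj.weight", "attn.v_proj.weight"),
    dest_key="attention.w_qkv.weight",
)
fused = apply_many_to_one(hf_state, template)
```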

olmo_core.nn.hf.convert.OLMO_CORE_TO_HF_WEIGHT_MAPPINGS: Dict[str, str]

Map of OLMo Core weight keys to Hugging Face weight keys, used to determine how OLMo Core state maps to HF state. You may configure this to change how OLMo Core state maps to HF state.

This map only captures one-to-one mappings from OLMo Core to HF. For many-to-many mappings or mappings that require additional manipulation of state, see OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS.

olmo_core.nn.hf.convert.OLMO_CORE_TO_HF_MODULE_MAPPINGS: Dict[str, str]

Map of OLMo Core module keys to Hugging Face module keys, used to determine how OLMo Core state maps to HF state. You may configure this to change how OLMo Core state maps to HF state.

This map only captures one-to-one mappings from OLMo Core to HF. For many-to-many mappings or mappings that require additional manipulation of state, see OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS.

olmo_core.nn.hf.convert.OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS: Dict[str, StateMappingTemplate]

Map of OLMo Core keys to Hugging Face keys, used to determine how OLMo Core state maps to HF state. You may configure this to change how OLMo Core state maps to HF state.

This map captures many-to-many mappings from OLMo Core to HF and mappings that require additional manipulation of state (e.g. merging dimensions). For simple one-to-one mappings from OLMo Core to HF, see OLMO_CORE_TO_HF_WEIGHT_MAPPINGS and OLMO_CORE_TO_HF_MODULE_MAPPINGS.

olmo_core.nn.hf.convert.MODEL_TYPE_SPECIFIC_OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS: Dict[str, Dict[str, StateMappingTemplate]]

Map of OLMo Core keys to Hugging Face keys, used to determine how OLMo Core state maps to HF state. This map captures overrides of the standard mappings in OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS, in case a given OLMo Core key can refer to different HF states depending on the HF model. You may configure this to change how OLMo Core state maps to HF state.