nn.hf
Utilities for converting models between OLMo Core and Hugging Face formats. To configure the
mappings between OLMo Core and Hugging Face, you may change the variables in
olmo_core.nn.hf.convert (e.g. olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS).
- olmo_core.nn.hf.convert_checkpoint_to_hf(original_checkpoint_path, output_path, transformer_config_dict, tokenizer_config_dict, *, model_state_dict=None, dtype=None, tokenizer_id=None, max_sequence_length=None, validate=True, debug=False, device=None, moe_capacity_factor=None, validation_device=None, validation_sliding_window=None)
Convert an OLMo Core checkpoint to Hugging Face format.
- Parameters:
  - original_checkpoint_path (Union[str, Path, None]) – Path to the original checkpoint. Can be None if model_state_dict is provided.
  - output_path (str | Path) – Where to save the converted model.
  - transformer_config_dict (Dict[str, Any]) – Dictionary form of OLMo Core model config.
  - tokenizer_config_dict (Dict[str, Any]) – Dictionary form of OLMo Core tokenizer config.
  - model_state_dict (Optional[Dict[str, Any]], default: None) – Optional pre-gathered model state dict. If provided, weights are taken from this instead of loading from original_checkpoint_path.
- olmo_core.nn.hf.convert_hybrid_state_to_hf(state_dict, layer_types)
Convert an OLMo Core hybrid state dict to HF olmo_hybrid format.
Uses HYBRID_SHARED_KEY_MAP for non-block keys, and per-layer HYBRID_GDN_LAYER_KEY_MAP / HYBRID_ATTN_LAYER_KEY_MAP based on layer_types.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.convert_state_from_hf(config, hf_state, *, model_type=None)
Converts a model state dict in Hugging Face transformers format into an unsharded state dict of OLMo Core format.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.convert_state_to_hf(config, olmo_core_state)
Converts an unsharded model state dict of OLMo Core format into Hugging Face transformers format.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.get_converter_from_hf(model_type=None)
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.get_converter_to_hf(model_type=None)
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.get_hf_config(model)
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- Return type: PreTrainedConfig
- olmo_core.nn.hf.get_hybrid_hf_config(model, layer_types, max_seq_len=65536)
Build the config.json dict for a HF olmo_hybrid model.
Returns a plain dict (not PretrainedConfig) to avoid a hard dependency on a specific transformers version.
- Parameters:
  - model (Transformer) – The OLMo Core hybrid transformer model.
  - layer_types (List[str]) – Per-layer type list from get_hybrid_layer_types().
  - max_seq_len (int, default: 65536) – Maximum sequence length for max_position_embeddings.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.get_hybrid_layer_types(model)
Return a per-layer type list for a hybrid model.
Each entry is "linear_attention" (GDN) or "full_attention" (standard attention), matching the HF olmo_hybrid config format.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
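To make the return format concrete, here is a minimal sketch of such a per-layer list. It does not call olmo_core; the helper name example_layer_types and the "every Nth layer uses full attention" pattern are assumptions chosen for illustration, and a real model's pattern comes from its actual blocks.

```python
from typing import List

LINEAR_ATTENTION = "linear_attention"  # GDN layer
FULL_ATTENTION = "full_attention"      # standard attention layer


def example_layer_types(n_layers: int, full_attention_every: int) -> List[str]:
    """Hypothetical layout: every Nth layer is full attention, the rest are GDN."""
    return [
        FULL_ATTENTION if (i + 1) % full_attention_every == 0 else LINEAR_ATTENTION
        for i in range(n_layers)
    ]


layer_types = example_layer_types(n_layers=4, full_attention_every=4)
# layer_types has one entry per layer, in layer order.
```

A list of this shape is what convert_hybrid_state_to_hf() and get_hybrid_hf_config() take as their layer_types argument.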
- olmo_core.nn.hf.is_olmo_hybrid_model(model)
Return True if the model has both GatedDeltaNet and Attention layers.
- Return type: bool
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.load_config(checkpoint_input_dir)
Load the experiment config from an OLMo Core checkpoint directory.
- olmo_core.nn.hf.load_hf_model(model_name_or_path, model_state_dict, *, revision='main', model_id=None, num_embeddings=None, process_group=None, work_dir=None)
Loads an OLMo Core model state dict using a model in Hugging Face transformers format.
- Parameters:
  - model_name_or_path (Union[Path, PathLike, str]) – The name of a model in HF Hub or the path to a model saved in HF format.
  - model_state_dict (Dict[str, Any]) – The OLMo Core model state dict in which to load HF state.
  - revision (str, default: 'main') – If model_name_or_path is the id of a model in HF Hub, then this is the revision (branch) of that model. Defaults to “main”.
  - model_id (Optional[str], default: None) – Deprecated; model-specific mappings are now determined by the model architecture, in olmo_core.nn.hf.convert.
  - num_embeddings (Optional[int], default: None) – The number of embeddings in the OLMo Core model being loaded into. Defaults to the number of embeddings in the HF model.
  - process_group (Optional[ProcessGroup], default: None) – The process group to use for distributed communication.
  - work_dir (Union[Path, PathLike, str, None], default: None) – A local directory that can be used for holding temporary state. Required when downloading a model from a cloud directory.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.save_hf_hybrid_model(save_dir, model_state_dict, model, *, dtype=None, vocab_size=None, max_sequence_length=65536)
Save a hybrid (GDN + attention) model as config.json + model.safetensors.
Unlike save_hf_model(), this writes files directly to avoid a hard dependency on a specific transformers version.
- Parameters:
  - save_dir (Union[Path, PathLike, str]) – Directory in which to save the model.
  - model_state_dict (Dict[str, Any]) – The OLMo Core model state dict.
  - model (Transformer) – The OLMo Core hybrid transformer model.
  - dtype (Optional[DType], default: None) – Optional dtype to cast weights to.
  - vocab_size (Optional[int], default: None) – If set, truncate embeddings/lm_head to this size.
  - max_sequence_length (int, default: 65536) – Maximum sequence length for max_position_embeddings.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.save_hf_model(save_dir, model_state_dict, model, huggingface_tokenizer=None, *, dtype=None, vocab_size=None, process_group=None, work_dir=None, save_overwrite=False)
Saves an OLMo Core model state dict in Hugging Face transformers format.
- Parameters:
  - save_dir (Union[Path, PathLike, str]) – Directory in which to save the model.
  - model_state_dict (Dict[str, Any]) – The OLMo Core model state dict being saved in HF format.
  - dtype (Optional[DType], default: None) – The torch dtype that model weights should be saved as.
  - vocab_size (Optional[int], default: None) – The size of the vocab. Defaults to the number of embeddings in the OLMo Core model.
  - process_group (Optional[ProcessGroup], default: None) – The process group to use for distributed communication.
  - work_dir (Union[Path, PathLike, str, None], default: None) – A local directory that can be used for holding temporary state. Required when downloading a model from a cloud directory.
  - save_overwrite (bool, default: False) – Overwrite existing files in save_dir.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature please read the CHANGELOG before upgrading your version of this library.
- olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_WEIGHT_MAPPINGS: Dict[str, str]
Map of Hugging Face weight keys to OLMo Core weight keys, used to determine how HF state maps to OLMo Core state. Different HF models may use different names for a given OLMo Core state. You may configure this to change how HF state maps to OLMo Core state.
This map only captures one-to-one mappings from HF to OLMo Core. For many-to-many mappings or mappings that require additional manipulation of state, see HF_TO_OLMO_CORE_TEMPLATE_MAPPINGS. If a given HF key can refer to different OLMo Core states depending on the HF model, see MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_WEIGHT_MAPPINGS.
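The idea of a one-to-one key map can be sketched in plain Python. This is an illustration only, not the library's implementation: the map contents, the "{layer}" placeholder syntax, and the helper rename_keys are all assumptions, and the real key names and placeholder format live in olmo_core.nn.hf.convert.

```python
import re
from typing import Any, Dict

# Hypothetical one-to-one map; "{layer}" stands in for a layer index.
EXAMPLE_HF_TO_OLMO_CORE: Dict[str, str] = {
    "model.embed_tokens.weight": "embeddings.weight",
    "model.layers.{layer}.self_attn.q_proj.weight": "blocks.{layer}.attention.w_q.weight",
}


def rename_keys(hf_state: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename each HF key to its mapped key, carrying the layer index across."""
    patterns = []
    for hf_key, core_key in mapping.items():
        # Turn the placeholder into a capture group for the layer index.
        regex = re.escape(hf_key).replace(re.escape("{layer}"), r"(\d+)")
        patterns.append((re.compile(f"^{regex}$"), core_key))

    converted: Dict[str, Any] = {}
    for key, value in hf_state.items():
        for pattern, core_key in patterns:
            match = pattern.match(key)
            if match:
                layer = match.group(1) if match.groups() else ""
                converted[core_key.replace("{layer}", layer)] = value
                break
        else:
            raise KeyError(f"No one-to-one mapping for HF key: {key}")
    return converted
```

Under these assumptions, adding or editing an entry in the map changes which destination key a given HF key is written to, which is the kind of configuration the variable above exposes.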
- olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_MODULE_MAPPINGS: Dict[str, str]
Map of Hugging Face module keys to OLMo Core module keys, used to determine how HF state maps to OLMo Core state. Different HF models may use different names for a given OLMo Core state. You may configure this to change how HF state maps to OLMo Core state.
This map only captures one-to-one mappings from HF to OLMo Core. For many-to-many mappings or mappings that require additional manipulation of state, see HF_TO_OLMO_CORE_TEMPLATE_MAPPINGS. If a given HF key can refer to different OLMo Core states depending on the HF model, see MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_MODULE_MAPPINGS.
- olmo_core.nn.hf.convert.MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_WEIGHT_MAPPINGS: Dict[str, Dict[str, str]]
Map of Hugging Face weight keys to OLMo Core weight keys. This map captures overrides of the standard one-to-one mappings in HF_TO_OLMO_CORE_WEIGHT_MAPPINGS, in case a given HF key can refer to different OLMo Core states depending on the HF model architecture. You may configure this to change how HF state maps to OLMo Core state.
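One plausible way to read "overrides of the standard mappings" is a base map with per-model-type entries layered on top. The sketch below assumes exactly that; resolve_weight_mappings, the model type "example_arch", and the key names are hypothetical, and the library may combine the two maps differently.

```python
from typing import Dict, Optional

# Hypothetical base map and per-model-type overrides.
BASE_MAPPINGS: Dict[str, str] = {
    "model.norm.weight": "lm_head.norm.weight",
    "lm_head.weight": "lm_head.w_out.weight",
}
OVERRIDES_BY_MODEL_TYPE: Dict[str, Dict[str, str]] = {
    "example_arch": {"model.norm.weight": "final_norm.weight"},
}


def resolve_weight_mappings(
    base: Dict[str, str],
    overrides_by_model_type: Dict[str, Dict[str, str]],
    model_type: Optional[str] = None,
) -> Dict[str, str]:
    """Return the base map with any overrides for model_type applied on top."""
    resolved = dict(base)  # copy so the shared base map is not mutated
    if model_type is not None:
        resolved.update(overrides_by_model_type.get(model_type, {}))
    return resolved
```

With no model type, or one that has no overrides, the base map is returned unchanged; otherwise only the overridden keys differ.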
- olmo_core.nn.hf.convert.MODEL_TYPE_SPECIFIC_HF_TO_OLMO_CORE_MODULE_MAPPINGS: Dict[str, Dict[str, str]]
Map of Hugging Face module keys to OLMo Core module keys. This map captures overrides of the standard one-to-one mappings in HF_TO_OLMO_CORE_MODULE_MAPPINGS, in case a given HF key can refer to different OLMo Core states depending on the HF model architecture. You may configure this to change how HF state maps to OLMo Core state.
- olmo_core.nn.hf.convert.HF_TO_OLMO_CORE_TEMPLATE_MAPPINGS: Dict[str, StateMappingTemplate]
Map of Hugging Face keys to OLMo Core keys, used to determine how HF state maps to OLMo Core state. Different HF models may use different names for a given OLMo Core state. You may configure this to change how HF state maps to OLMo Core state.
This map captures many-to-many mappings from HF to OLMo Core and mappings that require additional manipulation of state (e.g. merging dimensions). For simple one-to-one mappings from HF to OLMo Core, see HF_TO_OLMO_CORE_WEIGHT_MAPPINGS and HF_TO_OLMO_CORE_MODULE_MAPPINGS.
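To show why some mappings cannot be expressed as a plain key rename, here is a small sketch of a many-to-one mapping that merges several source tensors into one destination tensor. It uses nested lists instead of torch tensors and does not use StateMappingTemplate; the helper merge_rows and all key names are assumptions for illustration.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def merge_rows(
    hf_state: Dict[str, Matrix], source_keys: Sequence[str], dest_key: str
) -> Dict[str, Matrix]:
    """Concatenate several source matrices along the first dimension."""
    merged: Matrix = []
    for key in source_keys:
        # Copy each row so the merged matrix does not alias the source state.
        merged.extend(list(row) for row in hf_state[key])
    return {dest_key: merged}


# Hypothetical HF state with separate query/key/value projections.
example_hf_state: Dict[str, Matrix] = {
    "attn.q_proj.weight": [[1.0, 2.0]],
    "attn.k_proj.weight": [[3.0, 4.0]],
    "attn.v_proj.weight": [[5.0, 6.0]],
}
example_merged = merge_rows(
    example_hf_state,
    ["attn.q_proj.weight", "attn.k_proj.weight", "attn.v_proj.weight"],
    "attn.w_qkv.weight",
)
```

Here three source keys feed one destination key and the values are reshaped along the way, which is the kind of case a one-to-one map cannot describe.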
- olmo_core.nn.hf.convert.OLMO_CORE_TO_HF_WEIGHT_MAPPINGS: Dict[str, str]
Map of OLMo Core weight keys to Hugging Face weight keys, used to determine how OLMo Core state maps to HF state. You may configure this to change how OLMo Core state maps to HF state.
This map only captures one-to-one mappings from OLMo Core to HF. For many-to-many mappings or mappings that require additional manipulation of state, see OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS.
- olmo_core.nn.hf.convert.OLMO_CORE_TO_HF_MODULE_MAPPINGS: Dict[str, str]
Map of OLMo Core module keys to Hugging Face module keys, used to determine how OLMo Core state maps to HF state. You may configure this to change how OLMo Core state maps to HF state.
This map only captures one-to-one mappings from OLMo Core to HF. For many-to-many mappings or mappings that require additional manipulation of state, see OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS.
- olmo_core.nn.hf.convert.OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS: Dict[str, StateMappingTemplate]
Map of OLMo Core keys to Hugging Face keys, used to determine how OLMo Core state maps to HF state. You may configure this to change how OLMo Core state maps to HF state.
This map captures many-to-many mappings from OLMo Core to HF and mappings that require additional manipulation of state (e.g. merging dimensions). For simple one-to-one mappings from OLMo Core to HF, see OLMO_CORE_TO_HF_WEIGHT_MAPPINGS and OLMO_CORE_TO_HF_MODULE_MAPPINGS.
- olmo_core.nn.hf.convert.MODEL_TYPE_SPECIFIC_OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS: Dict[str, Dict[str, StateMappingTemplate]]
Map of OLMo Core keys to Hugging Face keys, used to determine how OLMo Core state maps to HF state. This map captures overrides of the standard mappings in OLMO_CORE_TO_HF_TEMPLATE_MAPPINGS, in case a given OLMo Core key can refer to different HF states depending on the HF model. You may configure this to change how OLMo Core state maps to HF state.