nn.conversion

Common logic for converting olmo_core.nn features to/from other formats (like Hugging Face).

class olmo_core.nn.conversion.StateConverter(mapping_templates)

Bases: object

A class for converting state from one format to another format (e.g. OLMo Core to HF).

Warning

This is a beta feature! The API is subject to change, even in minor and patch releases. If you choose to use this feature, please read the CHANGELOG before upgrading your version of this library.

get_mappings(state_dict, placeholder_bounds, state_type='weight')

Gets the state mapping from the given state dict to the converted format, without performing conversion.

Parameters:
  • state_dict (Dict[str, Any]) – The state dictionary in unconverted format.

  • placeholder_bounds (Dict[TemplatePlaceholder, int]) – Upper bound values for any relevant placeholders (e.g. for TemplatePlaceholder.EXPERT, the number of experts).

  • state_type (StateType, default: 'weight') – The type of state this state dict corresponds to. Defaults to StateType.weight.

Return type:

List[StateMapping]
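
To make the role of placeholder_bounds concrete, here is a minimal pure-Python sketch of how a template key could be expanded into concrete keys. It does not use olmo_core: expand_template_key is a hypothetical helper, the key names are invented for illustration, and the assumption that placeholder values run from 0 up to (but not including) the bound is ours rather than something the documentation states.

```python
from itertools import product
from typing import Dict, List

# Same string values as TemplatePlaceholder.LAYER and TemplatePlaceholder.EXPERT.
LAYER = "[layer]"
EXPERT = "[expert]"


def expand_template_key(template_key: str, placeholder_bounds: Dict[str, int]) -> List[str]:
    """Hypothetical helper: substitute every in-range value for each placeholder.

    Assumes valid placeholder values are 0, 1, ..., bound - 1.
    """
    # Only placeholders that actually occur in the key need to be expanded.
    present = [p for p in placeholder_bounds if p in template_key]
    keys = []
    for values in product(*(range(placeholder_bounds[p]) for p in present)):
        key = template_key
        for placeholder, value in zip(present, values):
            key = key.replace(placeholder, str(value))
        keys.append(key)
    return keys


# Illustrative key name; real OLMo Core or HF key names may differ.
keys = expand_template_key("blocks.[layer].feed_forward.w1.weight", {LAYER: 2, EXPERT: 4})
print(keys)
```

A key without any placeholder expands to itself, so templates for states such as embeddings need no special handling in this sketch.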

convert(state_dict, placeholder_bounds, state_type='weight')

Converts a state dict to another format. This currently only supports tensor values.

Parameters:
  • state_dict (Dict[str, Any]) – The state dictionary to convert.

  • placeholder_bounds (Dict[TemplatePlaceholder, int]) – Upper bound values for any relevant placeholders (e.g. for TemplatePlaceholder.EXPERT, the number of experts).

  • state_type (StateType, default: 'weight') – The type of state this state dict corresponds to. Defaults to StateType.weight.

Return type:

Dict[str, Any]

class olmo_core.nn.conversion.StateMapping(source_keys, dest_keys, state_type='weight', source_concat_dim=0, unflatten_dim=None, dims_permutation=None, flatten_dims=None, dest_chunk_dim=0)

Bases: object

A mapping of state from one format to another format (e.g. OLMo Core to HF).

The most common mapping is a one-to-one state mapping, which corresponds to a single string entry in both source_keys and dest_keys. The class also supports more complicated mappings, such as many-to-many mappings or mappings that require further manipulation of the state, like permuting dimensions.

source_keys: Tuple[str, ...]

The key or keys of the state(s) being mapped from.

dest_keys: Tuple[str, ...]

The key or keys of the state(s) being mapped to.

source_concat_dim: int = 0

When many states are being mapped from, this specifies the dimension along which to concatenate them.

unflatten_dim: Optional[Tuple[int, Tuple[int, ...]]] = None

This specifies that the given dimension (unflatten_dim[0]) should be unflattened using the shape given in unflatten_dim[1].

dims_permutation: Optional[Tuple[int, ...]] = None

This specifies the permutation that should be applied to the dimensions of the state after any unflattening from unflatten_dim has occurred.

flatten_dims: Optional[Tuple[int, int]] = None

This specifies that all the dimensions between the two given dimensions (inclusive) should be flattened into one, after any permutation from dims_permutation has been applied.

dest_chunk_dim: int = 0

When many states are being mapped to, this specifies the dimension along which to (evenly) chunk the state.
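
The attributes above describe a pipeline of shape manipulations. The following pure-Python sketch traces only the shapes through that pipeline, without touching real tensors or olmo_core. mapped_shapes is a hypothetical helper; the ordering of unflatten, permute and flatten follows the attribute descriptions, while placing concatenation first and chunking last is our assumption. It also assumes non-negative dimension indices.

```python
from math import prod
from typing import List, Optional, Sequence, Tuple


def mapped_shapes(
    source_shapes: Sequence[Tuple[int, ...]],
    num_dest: int,
    source_concat_dim: int = 0,
    unflatten_dim: Optional[Tuple[int, Tuple[int, ...]]] = None,
    dims_permutation: Optional[Tuple[int, ...]] = None,
    flatten_dims: Optional[Tuple[int, int]] = None,
    dest_chunk_dim: int = 0,
) -> List[Tuple[int, ...]]:
    """Hypothetical helper: compute destination shapes for one mapping."""
    # 1. Concatenate all source states along source_concat_dim (assumed first step).
    shape = list(source_shapes[0])
    shape[source_concat_dim] = sum(s[source_concat_dim] for s in source_shapes)

    # 2. Unflatten one dimension into several.
    if unflatten_dim is not None:
        dim, sizes = unflatten_dim
        assert prod(sizes) == shape[dim], "unflattened sizes must multiply to the original size"
        shape = shape[:dim] + list(sizes) + shape[dim + 1:]

    # 3. Permute dimensions: new dimension i is old dimension dims_permutation[i].
    if dims_permutation is not None:
        shape = [shape[i] for i in dims_permutation]

    # 4. Flatten the inclusive range of dimensions [start, end] into one.
    if flatten_dims is not None:
        start, end = flatten_dims
        shape = shape[:start] + [prod(shape[start:end + 1])] + shape[end + 1:]

    # 5. Evenly chunk along dest_chunk_dim, one chunk per destination key (assumed last step).
    assert shape[dest_chunk_dim] % num_dest == 0, "chunks must be even"
    shape[dest_chunk_dim] //= num_dest
    return [tuple(shape)] * num_dest


# One fused (12, 4) state split evenly into three destination states.
split_shapes = mapped_shapes([(12, 4)], num_dest=3)
print(split_shapes)  # [(4, 4), (4, 4), (4, 4)]
```

Tracing shapes like this before writing a mapping can help catch a permutation or flatten range that does not produce the shape the destination format expects.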

class olmo_core.nn.conversion.StateMappingTemplate(source_template_keys, dest_template_keys, state_type='weight', source_key_per_placeholder=None, dest_key_per_placeholder=None, source_concat_dim=0, unflatten_dim=None, dims_permutation=None, flatten_dims=None, dest_chunk_dim=0)

Bases: object

A template for mapping state from one format to another format (e.g. OLMo Core to HF). These mappings are 'templates' because their keys and other metadata can contain placeholders for information like the layer number or the number of MoE experts. A template can be converted to a StateMapping by providing the placeholder information.

The most common mapping is a one-to-one state mapping, which corresponds to a single string entry in both source_template_keys and dest_template_keys. The class also supports more complicated mappings, such as many-to-many mappings or mappings that require further manipulation of the state, like permuting dimensions.

source_template_keys: Union[str, Tuple[str, ...]]

The key or keys of the state(s) being mapped from.

dest_template_keys: Union[str, Tuple[str, ...]]

The key or keys of the state(s) being mapped to.

source_key_per_placeholder: Optional[TemplatePlaceholder] = None

A placeholder in source_template_keys for which this mapping should cover all valid placeholder values, rather than one specific value. For example, this enables mapping the states of all experts (using TemplatePlaceholder.EXPERT) to a single state.

When provided, source_template_keys must be a string.

dest_key_per_placeholder: Optional[TemplatePlaceholder] = None

A placeholder in dest_template_keys for which this mapping should cover all valid placeholder values, rather than one specific value. For example, this enables mapping a single state to the states of all experts (using TemplatePlaceholder.EXPERT).

When provided, dest_template_keys must be a string.
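
As a rough illustration of the per-placeholder options, the sketch below builds the key tuples for a many-to-one mapping in which every expert's state feeds a single destination state. It is pure Python and does not call olmo_core: expand_per_placeholder is a hypothetical helper, the key names are invented, and expert indices are assumed to start at 0.

```python
from typing import Tuple

EXPERT = "[expert]"  # same string value as TemplatePlaceholder.EXPERT


def expand_per_placeholder(template_key: str, placeholder: str, bound: int) -> Tuple[str, ...]:
    """Hypothetical helper: one key per valid placeholder value, all in one mapping."""
    if placeholder not in template_key:
        raise ValueError(f"{placeholder!r} does not appear in {template_key!r}")
    return tuple(template_key.replace(placeholder, str(i)) for i in range(bound))


# With three experts, one source template key yields three source keys
# that all belong to the same mapping (illustrative key names).
source_keys = expand_per_placeholder("blocks.0.experts.[expert].w1.weight", EXPERT, 3)
dest_keys = ("blocks.0.experts.w1.weight",)
print(source_keys)
print(dest_keys)
```

The contrast with an ordinary placeholder is that an ordinary placeholder would produce one separate mapping per value, whereas the per-placeholder option gathers all values into the key tuple of a single mapping.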

source_concat_dim: int = 0

When many states are being mapped from, this specifies the dimension along which to concatenate them.

unflatten_dim: Optional[Tuple[int, Tuple[TemplatePlaceholder | int, ...]]] = None

This specifies that the given dimension (unflatten_dim[0]) should be unflattened using the shape given in unflatten_dim[1]. A placeholder can be given instead of a number, to represent its corresponding upper bound (e.g. TemplatePlaceholder.EXPERT represents the number of experts).

dims_permutation: Optional[Tuple[int, ...]] = None

This specifies the permutation that should be applied to the dimensions of the state after any unflattening from unflatten_dim has occurred.

flatten_dims: Optional[Tuple[int, int]] = None

This specifies that all the dimensions between the two given dimensions (inclusive) should be flattened into one, after any permutation from dims_permutation has been applied.

dest_chunk_dim: int = 0

When many states are being mapped to, this specifies the dimension along which to (evenly) chunk the state.

class olmo_core.nn.conversion.TemplatePlaceholder(value)

Bases: StrEnum

A placeholder that can be used in the templates of StateMappingTemplate.

LAYER = '[layer]'
EXPERT = '[expert]'