`nn.layer_norm`

class olmo_core.nn.layer_norm.LayerNormType(value)

Bases: StrEnum

An enumeration of the different layer norm implementations.

default = 'default': ➡️ LayerNorm

rms = 'rms': ➡️ RMSNorm

qwen_rms = 'qwen_rms': ➡️ QwenRMSNorm

cute_rms = 'cute_rms': ➡️ CuTeRMSNorm

fused_rms = 'fused_rms': ➡️ FusedRMSNorm

l2_norm = 'l2_norm': ➡️ L2Norm
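Because the enumeration derives from StrEnum, each member is interchangeable with its string value. The sketch below illustrates that behavior with a local stand-in class (`NormName` is hypothetical and is not imported from olmo_core); it only mirrors the values listed above.

```python
from enum import Enum


class NormName(str, Enum):
    """Local stand-in mirroring the string values listed above (not the real class)."""

    default = "default"
    rms = "rms"
    qwen_rms = "qwen_rms"
    cute_rms = "cute_rms"
    fused_rms = "fused_rms"
    l2_norm = "l2_norm"


# Look a member up by value, as when the name arrives from a config file.
choice = NormName("rms")

# Members of a string-valued enum compare equal to their plain string value.
is_plain_string = choice == "rms"

# An unknown name raises ValueError instead of silently falling back.
try:
    NormName("batch_norm")
    rejected = False
except ValueError:
    rejected = True
```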

class olmo_core.nn.layer_norm.LayerNormConfig(name='default', eps=None, elementwise_affine=None, bias=None, full_precision=None, dtype=None)

Bases: ModuleConfig

A config for conveniently building any one of the different layer norm classes.

See the LayerNorm subclasses to learn which fields are valid for each implementation.

name: LayerNormType = 'default': The name of the implementation.

num_params(size)

The number of parameters in the module once built.

Parameters: size (int) – The size of the input along the dimension to be normalized.
Return type: int
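As a rough guide to what this count looks like, the sketch below assumes the usual layout of one weight per normalized feature plus one bias per feature when enabled. `count_norm_params` is a hypothetical helper, not part of olmo_core, and it does not reproduce how the config resolves fields left as None for each implementation.

```python
def count_norm_params(size: int, elementwise_affine: bool = True, bias: bool = True) -> int:
    """Hypothetical helper: count the learnable parameters of a norm layer.

    Assumes one weight per feature, plus one bias per feature when enabled.
    """
    if not elementwise_affine:
        return 0  # no learnable parameters; `bias` is ignored in this case
    return size + (size if bias else 0)


with_bias = count_norm_params(4096)                             # weight + bias
weight_only = count_norm_params(4096, bias=False)               # weight only
no_affine = count_norm_params(4096, elementwise_affine=False)   # nothing to learn
```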

build(size, init_device='cpu')

Construct the corresponding LayerNorm class.

Parameters:

size (int) – The size of the input along the dimension to be normalized.
init_device (str, default: 'cpu') – The device to initialize the parameters on, e.g. “cpu”, “meta”.

Return type:

LayerNorm

class olmo_core.nn.layer_norm.LayerNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')

Bases: Module

Layer normalization.

Parameters:

size (int) – The size of the input along the dimension to be normalized.
eps (float, default: 1e-05) – The epsilon used for numerical stability.
elementwise_affine (bool, default: True) – Whether to include an element-wise affine transform.
bias (bool, default: True) – Whether the element-wise affine should include an element-wise bias. Ignored if elementwise_affine=False.
full_precision (bool, default: True) – Force the operation to run in full precision regardless of the input data type.
dtype (dtype, default: torch.float32) – The default data type to use for the weight and bias in the element-wise affine. If full_precision=False it can be useful to set this to the expected input data type. Ignored if elementwise_affine=False.
init_device (str, default: 'cpu') – The device used when initializing the element-wise weight/bias.

extra_repr()

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x)

Apply layer norm.

Parameters: x (Tensor) – The input.
Return type: Tensor
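The roles of eps, elementwise_affine, and bias can be seen in a dependency-free sketch of the standard layer norm formula applied to one vector. `layer_norm_1d` is a hypothetical illustration, not the module's implementation: it works on Python lists instead of tensors and ignores dtype and full_precision handling.

```python
import math
from typing import Optional, Sequence


def layer_norm_1d(
    x: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    bias: Optional[Sequence[float]] = None,
    eps: float = 1e-5,
) -> list:
    """Hypothetical sketch: normalize one vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)            # eps guards against var == 0
    y = [(v - mean) * inv_std for v in x]
    if weight is not None:                          # element-wise affine transform
        y = [w * v for w, v in zip(weight, y)]
        if bias is not None:                        # bias only applies with affine
            y = [v + b for v, b in zip(y, bias)]
    return y


y = layer_norm_1d([1.0, 2.0, 3.0, 4.0])
```

Passing no weight corresponds to elementwise_affine=False, in which case the bias is ignored as well.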

class olmo_core.nn.layer_norm.RMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')

Bases: LayerNorm

RMSNorm, a simplified layer norm implementation.

forward(x)

Apply RMSNorm.

Parameters: x (Tensor) – The input.
Return type: Tensor
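The simplification relative to layer norm is that RMSNorm skips mean-centering and divides by the root mean square only. The sketch below shows the textbook formula on a single vector. `rms_norm_1d` is a hypothetical illustration and not the module's code; the optional bias accepted by the signature above is left out for brevity.

```python
import math
from typing import Optional, Sequence


def rms_norm_1d(
    x: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    eps: float = 1e-5,
) -> list:
    """Hypothetical sketch: scale one vector by the inverse of its root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)   # no mean subtraction, unlike layer norm
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    y = [v * inv_rms for v in x]
    if weight is not None:                     # optional element-wise scale
        y = [w * v for w, v in zip(weight, y)]
    return y


y = rms_norm_1d([3.0, 4.0])
```

The output has a mean square close to 1, but its mean is generally non-zero because the input is never centered.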

class olmo_core.nn.layer_norm.QwenRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')

Bases: RMSNorm

RMSNorm variant matching HuggingFace’s Qwen3RMSNorm rounding order: the input is cast back to its original dtype before being multiplied by the affine weight, so the weight multiply happens in the input dtype rather than fp32.

forward(x)

Apply RMSNorm.

Parameters: x (Tensor) – The input.
Return type: Tensor
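To make the rounding-order difference concrete, the sketch below emulates a half-precision input using the standard library's `struct` half-float format, with Python's 64-bit floats standing in for the higher-precision intermediate. Everything here is an assumption for illustration: `to_fp16`, `normalize`, and the sample values are hypothetical, and the code is not taken from olmo_core or HuggingFace.

```python
import math
import struct


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", v))[0]


def normalize(x, eps=1e-6):
    """RMS-normalize in high precision (Python floats stand in for fp32)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


x = [to_fp16(v) for v in (0.1234, -1.5, 2.7182, 0.3333)]  # emulated fp16 input
w = [to_fp16(v) for v in (1.1, 0.9, 1.3, 0.7)]            # hypothetical fp16 weight

normed = normalize(x)

# Comparison ordering: multiply by the weight in high precision, round once at the end.
round_last = [to_fp16(n * wi) for n, wi in zip(normed, w)]

# Ordering described above: cast back to the input dtype first, then multiply there.
qwen_style = [to_fp16(to_fp16(n) * wi) for n, wi in zip(normed, w)]

# Largest element-wise gap between the two orderings (can be zero for some inputs).
max_gap = max(abs(a - b) for a, b in zip(round_last, qwen_style))
```

Any gap between the two orderings comes from the extra rounding step, which is why matching the reference ordering matters when comparing outputs against a HuggingFace checkpoint.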

class olmo_core.nn.layer_norm.CuTeRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, init_device='cpu', dtype=torch.float32)

Bases: RMSNorm

A CuTe-based implementation of RMSNorm from the QuACK library.

Warning

This requires quack to be installed.

forward(x)

Apply RMSNorm.

Parameters: x (Tensor) – The input.
Return type: Tensor

class olmo_core.nn.layer_norm.FusedRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, init_device='cpu', dtype=torch.float32)

Bases: RMSNorm

A “fused” triton-based implementation of RMSNorm.

Warning

This requires flash-attn to be installed.

Warning

Currently only elementwise_affine=True is supported.

forward(x)

Apply RMSNorm.

Parameters: x (Tensor) – The input.
Return type: Tensor
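Since CuTeRMSNorm and FusedRMSNorm each depend on an optional package, it can help to check availability before choosing a config name. The sketch below is one possible approach, not something olmo_core provides: `pick_rms_backend` is a hypothetical helper, and the import names `quack` and `flash_attn` are assumptions about how those packages are exposed.

```python
from importlib.util import find_spec


def pick_rms_backend(preferred=("cute_rms", "fused_rms")) -> str:
    """Hypothetical helper: return the first name whose optional dependency is importable."""
    # Assumed import names for the optional packages mentioned in the warnings above.
    required_module = {"cute_rms": "quack", "fused_rms": "flash_attn"}
    for name in preferred:
        if find_spec(required_module[name]) is not None:
            return name
    return "rms"  # plain RMSNorm needs no optional dependency


backend = pick_rms_backend()
```

A successful import check only shows that the package is installed; it does not confirm that the current hardware or the chosen arguments (for example elementwise_affine=False with FusedRMSNorm) are supported.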

class olmo_core.nn.layer_norm.L2Norm(*, size)

Bases: LayerNorm

A variant of layer norm that just normalizes the last dimension of the input by its L2 norm, as done in nGPT.

Parameters: size (int) – The size of the input along the dimension to be normalized.

forward(x)

Apply L2 normalization.

Parameters: x (Tensor) – The input.
Return type: Tensor
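Normalizing by the L2 norm places every vector on the unit hypersphere and involves no learnable parameters. The sketch below shows the basic operation on one vector. `l2_normalize` is a hypothetical illustration, and it omits any safeguards the real module may apply (a zero vector would divide by zero here).

```python
import math
from typing import Sequence


def l2_normalize(x: Sequence[float]) -> list:
    """Hypothetical sketch: divide a vector by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]  # assumes x is not the zero vector


y = l2_normalize([3.0, 4.0])  # the input has norm 5, so y is [0.6, 0.8]
```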
