nn.layer_norm

class olmo_core.nn.layer_norm.LayerNormType(value)

Bases: StrEnum

An enumeration of the different layer norm implementations.

default = 'default'

➡️ LayerNorm

rms = 'rms'

➡️ RMSNorm

qwen_rms = 'qwen_rms'

➡️ QwenRMSNorm

cute_rms = 'cute_rms'

➡️ CuTeRMSNorm

fused_rms = 'fused_rms'

➡️ FusedRMSNorm

l2_norm = 'l2_norm'

➡️ L2Norm
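Because the enum is string-valued, a plain string taken from a config file converts directly to a member, and that member still compares equal to the string. The sketch below shows this with a hypothetical stand-in, `LayerNormTypeSketch`, built on `(str, Enum)` so that it also runs on Python versions older than 3.11 (which lack `enum.StrEnum`). The `IMPLEMENTATIONS` table only restates the member → class pairs listed above. It is not part of olmo_core.

```python
from enum import Enum


class LayerNormTypeSketch(str, Enum):
    """Hypothetical stand-in mirroring LayerNormType (a string-valued enum)."""

    default = "default"
    rms = "rms"
    qwen_rms = "qwen_rms"
    cute_rms = "cute_rms"
    fused_rms = "fused_rms"
    l2_norm = "l2_norm"


# The class each member is documented to build.
IMPLEMENTATIONS = {
    LayerNormTypeSketch.default: "LayerNorm",
    LayerNormTypeSketch.rms: "RMSNorm",
    LayerNormTypeSketch.qwen_rms: "QwenRMSNorm",
    LayerNormTypeSketch.cute_rms: "CuTeRMSNorm",
    LayerNormTypeSketch.fused_rms: "FusedRMSNorm",
    LayerNormTypeSketch.l2_norm: "L2Norm",
}

# A plain string (e.g. read from YAML/JSON) is converted to the member by value.
kind = LayerNormTypeSketch("fused_rms")
print(kind is LayerNormTypeSketch.fused_rms, kind == "fused_rms", IMPLEMENTATIONS[kind])
# → True True FusedRMSNorm
```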

class olmo_core.nn.layer_norm.LayerNormConfig(name='default', eps=None, elementwise_affine=None, bias=None, full_precision=None, dtype=None)

Bases: ModuleConfig

A config for conveniently building any one of the different layer norm classes.

See the LayerNorm subclasses to learn which fields are valid for each implementation.

name: LayerNormType = 'default'

The name of the implementation.

num_params(size)

The number of parameters in the module once built.

Parameters:

size (int) – The size of the input along the dimension to be normalized.

Return type:

int
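The count depends only on `size` and on the affine settings. The minimal sketch below rests on an assumption this page does not confirm: an enabled element-wise affine contributes one weight per feature, plus one bias per feature when `bias` is also set. `num_params_sketch` is a hypothetical helper and is not part of olmo_core.

```python
def num_params_sketch(size: int, elementwise_affine: bool = True, bias: bool = True) -> int:
    """Hypothetical parameter count for a norm over `size` features.

    Assumes one weight per feature when the element-wise affine is enabled,
    plus one bias per feature when `bias` is also enabled.
    """
    if not elementwise_affine:
        return 0  # no learnable weight or bias (bias is ignored in this case)
    return size * (2 if bias else 1)


print(num_params_sketch(4096))                            # weight + bias → 8192
print(num_params_sketch(4096, bias=False))                # weight only → 4096
print(num_params_sketch(4096, elementwise_affine=False))  # → 0
```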

build(size, init_device='cpu')

Construct the corresponding LayerNorm class.

Parameters:
  • size (int) – The size of the input along the dimension to be normalized.

  • init_device (str, default: 'cpu') – The device to initialize the parameters on, e.g. “cpu”, “meta”.

Return type:

LayerNorm
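Every optional field in the config signature defaults to `None`, which suggests that unset fields are not forwarded at all, so each class keeps its own defaults. The sketch below shows that dispatch pattern under exactly this assumption. `LayerNormConfigSketch`, the stub classes, and the two-entry registry are all hypothetical. Only `name`, the optional fields, and `build(size, init_device)` mirror the documented API, and `dtype` is left out for brevity.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


class _StubNorm:
    """Hypothetical stand-in for a LayerNorm class; it only records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class StubLayerNorm(_StubNorm):
    """Stands in for LayerNorm."""


class StubRMSNorm(_StubNorm):
    """Stands in for RMSNorm."""


# Hypothetical registry; only two implementations are shown here.
_REGISTRY = {"default": StubLayerNorm, "rms": StubRMSNorm}


@dataclass
class LayerNormConfigSketch:
    name: str = "default"
    eps: Optional[float] = None
    elementwise_affine: Optional[bool] = None
    bias: Optional[bool] = None
    full_precision: Optional[bool] = None

    def build(self, size: int, init_device: str = "cpu") -> _StubNorm:
        # Forward only the fields that were set; None means "use the class default".
        kwargs: Dict[str, Any] = {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name != "name" and getattr(self, f.name) is not None
        }
        return _REGISTRY[self.name](size=size, init_device=init_device, **kwargs)


norm = LayerNormConfigSketch(name="rms", eps=1e-6).build(size=1024, init_device="meta")
print(type(norm).__name__, norm.kwargs)
```

Here only `eps` is forwarded, so `elementwise_affine`, `bias`, and `full_precision` fall back to whatever the selected class declares.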

class olmo_core.nn.layer_norm.LayerNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')

Bases: Module

Layer normalization.

Parameters:
  • size (int) – The size of the input along the dimension to be normalized.

  • eps (float, default: 1e-05) – The epsilon used for numerical stability.

  • elementwise_affine (bool, default: True) – Whether to include an element-wise affine transform.

  • bias (bool, default: True) – Whether the element-wise affine should include an element-wise bias. Ignored if elementwise_affine=False.

  • full_precision (bool, default: True) – Force the operation to run in full precision regardless of the input data type.

  • dtype (dtype, default: torch.float32) – The default data type to use for the weight and bias in the element-wise affine. If full_precision=False, it can be useful to set this to the expected input data type. Ignored if elementwise_affine=False.

  • init_device (str, default: 'cpu') – The device used when initializing the element-wise weight/bias.

extra_repr()

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x)

Apply layer norm.

Parameters:

x (Tensor) – The input.

Return type:

Tensor
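For a single vector along the normalized dimension, the computation is the standard layer norm formula y = (x - mean) / sqrt(var + eps), followed by the optional element-wise weight and bias. The pure-Python sketch below (`layer_norm_sketch` is a hypothetical name) shows only that math. It ignores batching, dtypes, and the `full_precision` casting performed by the real module.

```python
import math
from typing import List, Optional


def layer_norm_sketch(
    x: List[float],
    weight: Optional[List[float]] = None,
    bias: Optional[List[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Normalize one vector (the last dimension) to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance, as in standard layer norm
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv_std for v in x]
    if weight is not None:  # element-wise affine
        y = [w * v for w, v in zip(weight, y)]
        if bias is not None:
            y = [v + b for v, b in zip(y, bias)]
    return y


y = layer_norm_sketch([1.0, 2.0, 3.0, 4.0])
print(sum(y) / len(y))                  # ≈ 0: the input is mean-centered
print(sum(v * v for v in y) / len(y))   # ≈ 1 (slightly below because of eps)
```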

class olmo_core.nn.layer_norm.RMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')

Bases: LayerNorm

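RMSNorm, a simplified layer norm that rescales the input by its root mean square instead of subtracting the mean and dividing by the standard deviation.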

forward(x)

Apply RMSNorm.

Parameters:

x (Tensor) – The input.

Return type:

Tensor
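The difference from layer norm is that nothing is subtracted: each vector is simply divided by sqrt(mean(x²) + eps). The pure-Python sketch below illustrates this. `rms_norm_sketch` is a hypothetical name, and it assumes that, when both are enabled, the bias is added after the weight multiply. Batching, dtypes, and `full_precision` handling are omitted.

```python
import math
from typing import List, Optional


def rms_norm_sketch(
    x: List[float],
    weight: Optional[List[float]] = None,
    bias: Optional[List[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Rescale one vector (the last dimension) to unit root mean square."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    y = [v * inv_rms for v in x]  # no mean subtraction, unlike layer norm
    if weight is not None:  # element-wise affine
        y = [w * v for w, v in zip(weight, y)]
        if bias is not None:  # assumed: bias added after the weight
            y = [v + b for v, b in zip(y, bias)]
    return y


y = rms_norm_sketch([1.0, 2.0, 3.0, 4.0])
print(sum(v * v for v in y) / len(y))  # ≈ 1 (slightly below because of eps)
print(sum(y) / len(y))                 # mean is NOT zero: the input is not centered
```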

class olmo_core.nn.layer_norm.QwenRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')

Bases: RMSNorm

RMSNorm variant matching HuggingFace’s Qwen3RMSNorm rounding order: the input is cast back to its original dtype before being multiplied by the affine weight, so the weight multiply happens in the input dtype rather than fp32.

forward(x)

Apply RMSNorm.

Parameters:

x (Tensor) – The input.

Return type:

Tensor
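Both orders compute the same normalization. They differ only in where the result is rounded to the low-precision dtype. The sketch below makes this visible in pure Python by using IEEE fp16 (via `struct`'s `'e'` format) as a stand-in for a low-precision input dtype such as bfloat16. The helper names are hypothetical, the input and weight values are arbitrary, and the "weight then round" path assumes `full_precision=True` with the weight applied in high precision.

```python
import math
import struct
from typing import List


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def normalize(x: List[float], eps: float = 1e-6) -> List[float]:
    """RMS-normalize in high precision (Python floats are 64-bit)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def weight_then_round(x: List[float], weight: List[float]) -> List[float]:
    # RMSNorm-style order: multiply by the weight in high precision,
    # then round once to the low-precision dtype.
    return [to_fp16(w * v) for w, v in zip(weight, normalize(x))]


def round_then_weight(x: List[float], weight: List[float]) -> List[float]:
    # Qwen3RMSNorm-style order: round the normalized value to the input dtype
    # first, then multiply in that dtype (so the product is rounded again).
    return [to_fp16(w * to_fp16(v)) for w, v in zip(weight, normalize(x))]


x = [to_fp16(v) for v in (0.1, 0.7, -1.3, 2.9)]
weight = [to_fp16(v) for v in (0.9, 1.1, 1.05, 0.97)]
print(weight_then_round(x, weight))
print(round_then_weight(x, weight))  # may differ from the line above in the last bit
```

When matching a checkpoint trained with HuggingFace's Qwen3 code, these last-bit differences are what `QwenRMSNorm` is meant to reproduce.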

class olmo_core.nn.layer_norm.CuTeRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, init_device='cpu', dtype=torch.float32)

Bases: RMSNorm

A CuTe-based implementation from the QuACK library.

Warning

This requires quack to be installed.

forward(x)

Apply RMSNorm.

Parameters:

x (Tensor) – The input.

Return type:

Tensor

class olmo_core.nn.layer_norm.FusedRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, init_device='cpu', dtype=torch.float32)

Bases: RMSNorm

A “fused” Triton-based implementation of RMSNorm.

Warning

This requires flash-attn to be installed.

Warning

Currently only elementwise_affine=True is supported.

forward(x)

Apply RMSNorm.

Parameters:

x (Tensor) – The input.

Return type:

Tensor
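Because `CuTeRMSNorm` and `FusedRMSNorm` depend on optional packages, a config may need to fall back to plain `rms` when those packages are missing. The sketch below shows one way to do that. `pick_rms_norm_name` is a hypothetical helper, and preferring `cute_rms` over `fused_rms` is an arbitrary choice made for illustration. It assumes the import names `quack` and `flash_attn`, and it only checks that the packages can be imported, not that a suitable GPU is present. It also respects the documented restriction that `FusedRMSNorm` supports only `elementwise_affine=True`.

```python
import importlib.util


def pick_rms_norm_name(elementwise_affine: bool = True) -> str:
    """Hypothetical helper: choose a LayerNormType value based on installed packages."""
    # CuTeRMSNorm requires the QuACK library (assumed import name: "quack").
    if importlib.util.find_spec("quack") is not None:
        return "cute_rms"
    # FusedRMSNorm requires flash-attn and currently only supports elementwise_affine=True.
    if elementwise_affine and importlib.util.find_spec("flash_attn") is not None:
        return "fused_rms"
    return "rms"  # portable fallback with no extra dependencies


print(pick_rms_norm_name())
print(pick_rms_norm_name(elementwise_affine=False))
```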

class olmo_core.nn.layer_norm.L2Norm(*, size)

Bases: LayerNorm

A variant of layer norm that simply divides the input by its L2 norm along the last dimension, as done in nGPT.

Parameters:

size (int) – The size of the input along the dimension to be normalized.

forward(x)

Apply layer norm.

Parameters:

x (Tensor) – The input.

Return type:

Tensor
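Dividing each vector by its L2 norm projects it onto the unit hypersphere, which is the normalization nGPT relies on. The pure-Python sketch below shows the math for one vector. `l2_norm_sketch` is a hypothetical name, and the small `eps` clamp, which guards against division by zero in the same way as `torch.nn.functional.normalize`, is an assumption about the edge case rather than something this page documents.

```python
import math
from typing import List


def l2_norm_sketch(x: List[float], eps: float = 1e-12) -> List[float]:
    """Scale one vector (the last dimension) to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in x))
    # Clamp the norm so an all-zero vector doesn't cause a division by zero.
    return [v / max(norm, eps) for v in x]


print(l2_norm_sketch([3.0, 4.0]))  # → [0.6, 0.8]
```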