nn.layer_norm
- class olmo_core.nn.layer_norm.LayerNormType(value)
Bases: StrEnum
An enumeration of the different layer norm implementations.
- qwen_rms = 'qwen_rms'
➡️ QwenRMSNorm
- cute_rms = 'cute_rms'
➡️ CuTeRMSNorm
- fused_rms = 'fused_rms'
➡️ FusedRMSNorm
- class olmo_core.nn.layer_norm.LayerNormConfig(name='default', eps=None, elementwise_affine=None, bias=None, full_precision=None, dtype=None)
Bases: ModuleConfig
A config for conveniently building any one of the different layer norm classes.
See the LayerNorm subclasses to learn which fields are valid for each implementation.
- name: LayerNormType = 'default'
The name of the implementation.
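The `None` defaults in the signature above suggest that unset fields fall back to whatever the selected implementation uses by default. The following is a minimal sketch of that pattern only, not olmo_core's code: `NormConfigSketch` and `build_kwargs` are hypothetical names introduced for illustration, and the real config may resolve its fields differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class NormConfigSketch:
    """Hypothetical stand-in for a layer norm config; not olmo_core's class."""

    name: str = "default"
    eps: Optional[float] = None
    elementwise_affine: Optional[bool] = None
    bias: Optional[bool] = None
    full_precision: Optional[bool] = None

    def build_kwargs(self, size: int) -> Dict[str, Any]:
        """Collect constructor kwargs, skipping every field left as None."""
        kwargs: Dict[str, Any] = {"size": size}
        for f in fields(self):
            value = getattr(self, f.name)
            # `name` selects the implementation, so it is not a constructor kwarg.
            if f.name != "name" and value is not None:
                kwargs[f.name] = value
        return kwargs


config = NormConfigSketch(name="fused_rms", eps=1e-6)
print(config.build_kwargs(size=4096))
```

Only the explicitly set `eps` is forwarded here; every other option is left for the chosen class to decide.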
- class olmo_core.nn.layer_norm.LayerNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')
Bases: Module
Layer normalization.
- Parameters:
  - size (int) – The size of the input along the dimension to be normalized.
  - eps (float, default: 1e-05) – The epsilon used for numerical stability.
  - elementwise_affine (bool, default: True) – Whether to include an element-wise affine transform.
  - bias (bool, default: True) – Whether the element-wise affine should include an element-wise bias. Ignored if elementwise_affine=False.
  - full_precision (bool, default: True) – Force the operation to run in full precision regardless of the input data type.
  - dtype (dtype, default: torch.float32) – The default data type to use for the weight and bias in the element-wise affine. If full_precision=False, it can be useful to set this to the expected input data type. Ignored if elementwise_affine=False.
  - init_device (str, default: 'cpu') – The device used when initializing the element-wise weight/bias.
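To make the roles of `eps`, `elementwise_affine`, and `bias` concrete, here is a dependency-free sketch of the standard layer normalization formula over a single vector. It is an illustration under stated assumptions, not the class's implementation: the function name `layer_norm` and the use of plain Python lists in place of tensors are choices made for this example.

```python
import math
from typing import List, Optional, Sequence


def layer_norm(
    x: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    bias: Optional[Sequence[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Normalize x to zero mean and unit variance, then apply the optional affine."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)  # eps keeps this finite when var == 0
    out = [(v - mean) * inv_std for v in x]
    if weight is not None:  # corresponds to elementwise_affine=True
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:  # corresponds to bias=True
        out = [o + b for o, b in zip(out, bias)]
    return out


y = layer_norm([1.0, 2.0, 3.0, 4.0])
z = layer_norm([1.0, 2.0, 3.0, 4.0], weight=[2.0] * 4, bias=[1.0] * 4)
print(y)
print(z)
```

Passing `weight=None` mirrors `elementwise_affine=False`: the output is the normalized vector with no learned scale or shift.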
- class olmo_core.nn.layer_norm.RMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')
Bases: LayerNorm
RMSNorm, a simplified layer norm implementation.
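The simplification in RMSNorm, as it is usually defined, is that the input is divided by its root mean square with no mean subtraction. A minimal sketch of that formula, assuming plain Python lists in place of tensors; `rms_norm` is a name chosen for this example and the optional bias is omitted for brevity:

```python
import math
from typing import List, Optional, Sequence


def rms_norm(
    x: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Scale x by the reciprocal of its root mean square; no centering step."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    out = [v * inv_rms for v in x]
    if weight is not None:  # optional element-wise scale
        out = [o * w for o, w in zip(out, weight)]
    return out


r = rms_norm([3.0, 4.0])
print(r)
```

Because nothing is subtracted, an all-positive input stays all-positive, whereas layer normalization would center it around zero.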
- class olmo_core.nn.layer_norm.QwenRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, dtype=torch.float32, init_device='cpu')
Bases: RMSNorm
An RMSNorm variant that matches the rounding order of HuggingFace’s Qwen3RMSNorm: the normalized input is cast back to its original dtype before being multiplied by the affine weight, so the weight multiply happens in the input dtype rather than in fp32.
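The effect of the rounding order can be simulated without any tensor library. The sketch below makes several assumptions for illustration: IEEE half precision (via the `struct` module's `e` format) stands in for the low-precision input dtype, Python floats stand in for fp32, and the weight is rounded to the input dtype as well. All function names are hypothetical.

```python
import math
import struct
from typing import List, Sequence


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def normalize(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMS-normalize in high precision (Python floats stand in for fp32)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def multiply_then_cast(x: Sequence[float], weight: Sequence[float]) -> List[float]:
    """Weight multiply in high precision, with a single rounding at the end."""
    return [to_half(n * w) for n, w in zip(normalize(x), weight)]


def cast_then_multiply(x: Sequence[float], weight: Sequence[float]) -> List[float]:
    """Qwen-style order: round to the input dtype first, then multiply there."""
    return [to_half(to_half(n) * to_half(w)) for n, w in zip(normalize(x), weight)]


x = [to_half(v) for v in [0.1, -1.3, 2.7, 0.9]]  # inputs already in half precision
w = [1.5, 0.5, 2.0, 0.75]
a = multiply_then_cast(x, w)
b = cast_then_multiply(x, w)
for p, q in zip(a, b):
    print(p, q, p == q)
```

The two orders agree to within a few units in the last place and may or may not differ for a given element. Matching a reference implementation bit for bit requires using the same order it does.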
- class olmo_core.nn.layer_norm.CuTeRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, init_device='cpu', dtype=torch.float32)
Bases: RMSNorm
A CuTe-based implementation from the QuACK library.
Warning
This requires quack to be installed.
- class olmo_core.nn.layer_norm.FusedRMSNorm(*, size, eps=1e-05, elementwise_affine=True, bias=True, full_precision=True, init_device='cpu', dtype=torch.float32)
Bases: RMSNorm
A “fused” Triton-based implementation of RMSNorm.
Warning
This requires flash-attn to be installed.
Warning
Currently only elementwise_affine=True is supported.