nn.feed_forward

class olmo_core.nn.feed_forward.ActivationFunction(value)[source]

Bases: StrEnum

An enumeration of the supported activation functions for feed-forward modules.

silu = 'silu'

SiLU/Swish activation function, used for SwiGLU.

gelu_tanh = 'gelu_tanh'

GELU with tanh approximation, used for GeGLU.
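For reference, the two activations named above can be written out with the standard library alone. This is only a scalar sketch of the textbook formulas, not the library's implementation; the function names `silu` and `gelu_tanh` are chosen here to mirror the enum values.

```python
import math


def silu(x: float) -> float:
    """SiLU/Swish: x * sigmoid(x), written as x / (1 + exp(-x))."""
    # Note: math.exp can overflow for very negative x; fine for a sketch.
    return x / (1.0 + math.exp(-x))


def gelu_tanh(x: float) -> float:
    """GELU using the tanh approximation."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

Both functions pass through zero and approach the identity for large positive inputs, which is what makes them usable as the gate in SwiGLU and GeGLU respectively.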

class olmo_core.nn.feed_forward.FeedForwardType(value)[source]

Bases: StrEnum

An enumeration of the different feed-forward / MLP implementations.

default = 'default'

➡️ FeedForward

normalized = 'normalized'

➡️ NormalizedFeedForward

class olmo_core.nn.feed_forward.FeedForwardConfig(hidden_size, name='default', bias=None, dtype=None, activation='silu')[source]

Bases: ModuleConfig

A config for building FeedForward modules.

name: FeedForwardType = 'default'

The name of the implementation.

activation: ActivationFunction = 'silu'

The activation function to use. See ActivationFunction for options.

num_params(d_model)[source]

The number of parameters the module will have once built.

Parameters:

d_model (int) – The model dimensionality.

Return type:

int
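As a rough illustration of how such a count might come out, the sketch below assumes the conventional gated feed-forward layout: two projections from d_model to hidden_size (gate and up) and one projection back from hidden_size to d_model. The helper `gated_ffn_num_params` is hypothetical and is not part of the library; the actual `num_params` method may count differently, for example for the normalized variant.

```python
def gated_ffn_num_params(d_model: int, hidden_size: int, bias: bool = True) -> int:
    """Parameter count for an assumed three-projection gated feed-forward."""
    # Assumed layout: gate and up projections (d_model -> hidden_size)
    # plus one down projection (hidden_size -> d_model).
    weights = 3 * d_model * hidden_size
    # Each projection contributes one bias vector sized to its output.
    biases = (2 * hidden_size + d_model) if bias else 0
    return weights + biases
```

Under these assumptions, d_model=4 and hidden_size=8 give 96 weights without biases and 116 parameters with them.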

build(d_model, *, dtype=None, init_device='cpu')[source]

Build the corresponding feed-forward module.

Parameters:
  • d_model (int) – The model dimensionality.

  • init_device (str, default: 'cpu') – The device to initialize the parameters on, e.g. "cpu" or "meta".

Return type:

FeedForward

class olmo_core.nn.feed_forward.FeedForward(*, d_model, hidden_size, bias=True, dtype=torch.float32, init_device='cpu', activation='silu')[source]

Bases: Module

Basic feed-forward module with gated activation (SwiGLU or GeGLU).

forward(x)[source]

Run the feed-forward on the input x.

Parameters:

x (Tensor) – The input of shape (*, d_model).

Return type:

Tensor
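The gated computation can be sketched for a single input vector in plain Python. This assumes the usual SwiGLU/GeGLU form, down(activation(gate(x)) * up(x)), with no biases; the names `w_gate`, `w_up`, `w_down`, and `gated_feed_forward` are illustrative and do not come from the library.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def gated_feed_forward(
    x: List[float],
    w_gate: Matrix,  # shape (hidden_size, d_model)
    w_up: Matrix,    # shape (hidden_size, d_model)
    w_down: Matrix,  # shape (d_model, hidden_size)
    activation: Callable[[float], float] = silu,
) -> List[float]:
    """Assumed gated form: down(activation(gate(x)) * up(x))."""
    gate = [activation(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)
```

The output has the same length as the input (d_model), matching the (*, d_model) shape contract described for `forward`.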

class olmo_core.nn.feed_forward.NormalizedFeedForward(*, d_model, hidden_size, dtype=torch.float32, init_device='cpu', activation='silu')[source]

Bases: FeedForward

An nGPT feed-forward implementation.

Warning

This is a beta feature! The API is subject to change, even in minor and patch releases. If you choose to use this feature, please read the CHANGELOG before upgrading your version of this library.

forward(x)[source]

Run the feed-forward on the input x.

Parameters:

x (Tensor) – The input of shape (*, d_model).

Return type:

Tensor

normalize_matrices()[source]

Normalize the weights in all matrices. This should be called after each optimizer step, which the TransformerTrainModule will handle for you.
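To illustrate what a weight normalization step of this kind does, the sketch below rescales each row of a matrix to unit L2 norm, in the spirit of nGPT keeping weight vectors on the unit hypersphere. It is a plain-Python illustration only: the helper `l2_normalize_rows` is hypothetical, and which axis the library actually normalizes along for each matrix is an assumption here.

```python
import math
from typing import List


def l2_normalize_rows(matrix: List[List[float]], eps: float = 1e-12) -> List[List[float]]:
    """Return a copy of the matrix with every row scaled to unit L2 norm."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        # eps guards against division by zero for an all-zero row.
        normalized.append([v / max(norm, eps) for v in row])
    return normalized
```

Optimizer updates generally move weights off the unit sphere, which is why a step like this would be repeated after every optimizer step.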