nn.feed_forward
- class olmo_core.nn.feed_forward.ActivationFunction(value)
Bases: StrEnum
An enumeration of the supported activation functions for feed-forward modules.
- silu = 'silu'
SiLU/Swish activation function, used for SwiGLU.
- gelu_tanh = 'gelu_tanh'
GELU with tanh approximation, used for GeGLU.
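For reference, the two activations named above can be written out with the standard library alone. This is a minimal sketch of the textbook formulas, not the library's implementation; the function names `silu` and `gelu_tanh` here are local helpers chosen to mirror the enum values.

```python
import math


def silu(x: float) -> float:
    """SiLU / Swish: x * sigmoid(x), written as x / (1 + e^-x).

    Fine for moderate inputs; math.exp overflows for very negative x.
    """
    return x / (1.0 + math.exp(-x))


def gelu_tanh(x: float) -> float:
    """GELU using the tanh approximation."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Compare the two activations at a few sample points.
for value in (-1.0, 0.0, 1.0):
    print(f"x={value:+.1f}  silu={silu(value):+.4f}  gelu_tanh={gelu_tanh(value):+.4f}")
```

Both functions pass through zero and approach the identity for large positive inputs, which is why either can act as the gate in a gated feed-forward block.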
- class olmo_core.nn.feed_forward.FeedForwardType(value)
Bases: StrEnum
An enumeration of the different feed-forward / MLP implementations.
- default = 'default'
➡️ FeedForward
- normalized = 'normalized'
➡️ NormalizedFeedForward
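A string-valued enum like this is typically looked up by value and then used to pick an implementation. The sketch below illustrates that pattern with a hypothetical stand-in class, `FeedForwardTypeSketch`; it does not import the library, and the `normalized` to `NormalizedFeedForward` mapping is an assumption inferred from the class names on this page.

```python
from enum import Enum


class FeedForwardTypeSketch(str, Enum):
    """Hypothetical stand-in for a string-valued enum (not the library class)."""

    default = "default"
    normalized = "normalized"


# Assumed mapping from enum member to the name of the class it selects.
TARGET_CLASS = {
    FeedForwardTypeSketch.default: "FeedForward",
    FeedForwardTypeSketch.normalized: "NormalizedFeedForward",  # assumption
}


def resolve(name: str) -> str:
    """Return the class name selected by a string such as 'default'."""
    kind = FeedForwardTypeSketch(name)  # raises ValueError for unknown names
    return TARGET_CLASS[kind]


print(resolve("default"))
```

Because the members are also strings, a plain string from a config file compares equal to the matching member, so values can be passed around without explicit conversion.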
- class olmo_core.nn.feed_forward.FeedForwardConfig(hidden_size, name='default', bias=None, dtype=None, activation='silu')
Bases: ModuleConfig
A config for building FeedForward modules.
- name: FeedForwardType = 'default'
The name of the implementation.
- activation: ActivationFunction = 'silu'
The activation function to use. See ActivationFunction for options.
- class olmo_core.nn.feed_forward.FeedForward(*, d_model, hidden_size, bias=True, dtype=torch.float32, init_device='cpu', activation='silu')
Bases: Module
Basic feed-forward module with gated activation (SwiGLU or GeGLU).
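To make "gated activation" concrete, here is a dependency-free sketch of the usual SwiGLU computation, `down(silu(gate(x)) * up(x))`, for a single input vector. It assumes the common three-matrix layout without biases; the names `w_gate`, `w_up`, `w_down`, `matvec`, and `gated_feed_forward` are illustrative and are not taken from the library.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def silu(x: float) -> float:
    """SiLU / Swish activation."""
    return x / (1.0 + math.exp(-x))


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def gated_feed_forward(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """SwiGLU-style block: project up twice, gate elementwise, project back down."""
    gate = [silu(g) for g in matvec(w_gate, x)]  # shape: hidden_size
    up = matvec(w_up, x)                         # shape: hidden_size
    hidden = [g * u for g, u in zip(gate, up)]   # elementwise gating
    return matvec(w_down, hidden)                # shape: d_model


# Toy sizes: d_model = 2, hidden_size = 3.
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(gated_feed_forward([1.0, 0.0], w_gate, w_up, w_down))
```

Swapping `silu` for a tanh-approximated GELU in the `gate` line would give the GeGLU variant; the rest of the computation is unchanged.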
- class olmo_core.nn.feed_forward.NormalizedFeedForward(*, d_model, hidden_size, dtype=torch.float32, init_device='cpu', activation='silu')
Bases: FeedForward
An nGPT feed-forward implementation.
Warning
This is a beta feature! The API is subject to change even with minor and patch releases. If you choose to use this feature, please read the CHANGELOG before upgrading your version of this library.
- normalize_matrices()
Normalize the weights in all matrices. This should be called after each optimizer step, which the TransformerTrainModule will handle for you.
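The idea behind re-normalizing after each optimizer step can be sketched as follows: a gradient update moves the weights off unit norm, so they are rescaled back before the next forward pass. This is a plain-Python illustration only; normalizing each row to unit L2 norm, the `eps` guard, and the helper name `normalize_rows` are assumptions, and the library may normalize along a different dimension.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_rows(weight: Matrix, eps: float = 1e-12) -> Matrix:
    """Rescale every row of a matrix to unit L2 norm (assumed convention)."""
    normalized = []
    for row in weight:
        norm = math.sqrt(sum(v * v for v in row))
        # eps guards against division by zero for an all-zero row.
        normalized.append([v / max(norm, eps) for v in row])
    return normalized


# Weights as they might look after an optimizer step: no longer unit norm.
weights = [[3.0, 4.0], [0.0, 2.0]]
weights = normalize_rows(weights)
print(weights)
```

When training with your own loop rather than TransformerTrainModule, the equivalent call to `normalize_matrices()` would go right after the optimizer step.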