`nn.functional`

Common nn function implementations.

olmo_core.nn.functional.cross_entropy_loss(logits, labels, *, ignore_index=-100, reduction='mean', compute_z_loss=False, z_loss_multiplier=0.0001)

Cross entropy loss that optionally computes the softmax auxiliary loss (z-loss) as well.

Parameters:

logits (Tensor) – Predicted unnormalized logits with shape (N, vocab_size).
labels (Tensor) – Ground truth class indices with shape (N,).
ignore_index (int, default: -100) – Specifies a target value that is ignored and does not contribute to the input gradient.
reduction (Literal['mean', 'sum', 'none'], default: 'mean') – Specifies the reduction to apply to the output. Can be “none”, “mean”, or “sum”.
compute_z_loss (bool, default: False) – Compute the softmax auxiliary loss as well.
z_loss_multiplier (float, default: 0.0001) – The multiplier to apply to the z-loss.

Return type:

Tuple[Tensor, Optional[Tensor]]

Returns:

The cross entropy loss and, if compute_z_loss is True, the z-loss. The second element is None otherwise.
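To make the two returned quantities concrete, here is a minimal pure-Python sketch for the 'mean' reduction. It assumes the z-loss is the multiplier times the squared log-partition function (the log-sum-exp of each row of logits), averaged over the non-ignored positions. That definition and the helper name `reference_cross_entropy` are assumptions made for illustration, not the library's implementation, which operates on tensors.

```python
import math
from typing import List, Optional, Tuple


def reference_cross_entropy(
    logits: List[List[float]],  # shape (N, vocab_size)
    labels: List[int],          # shape (N,)
    ignore_index: int = -100,
    compute_z_loss: bool = False,
    z_loss_multiplier: float = 1e-4,
) -> Tuple[float, Optional[float]]:
    """Hypothetical reference: mean cross entropy plus optional z-loss."""
    ce_total, z_total, count = 0.0, 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored positions contribute nothing
        # Numerically stable log-sum-exp of the row, i.e. log Z.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        ce_total += log_z - row[label]  # equals -log softmax(row)[label]
        z_total += log_z ** 2           # assumed z-loss term: (log Z)^2
        count += 1
    if count == 0:
        raise ValueError("all labels are ignored")
    ce = ce_total / count
    z = z_loss_multiplier * z_total / count if compute_z_loss else None
    return ce, z
```

Calling it with `compute_z_loss=False` returns `None` in the second position, mirroring the `Tuple[Tensor, Optional[Tensor]]` return type above.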

olmo_core.nn.functional.fused_linear_cross_entropy_loss(_input, weight, labels, *, bias=None, ignore_index=-100, reduction='mean', compute_z_loss=False, z_loss_multiplier=0.0001, ce_weight=None, label_smoothing=0.0, softcap=None, accum_dtype=None)

Cross entropy loss fused with the linear layer that computes the logits, which avoids materializing the large logits tensor. Additionally, this function computes gradients during the forward pass (valid when CrossEntropyLoss comes last), so _input and labels do not need to be stored for the backward pass.
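The idea can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's kernel: it handles only the 'mean' reduction with no bias, z-loss, label smoothing, or softcap, and it assumes the loss is the final node of the graph so the upstream gradient is 1. The helper name `fused_linear_ce_sketch` is hypothetical. Logits are computed one row at a time, and the gradients with respect to the input and the weight are accumulated in the same pass.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def fused_linear_ce_sketch(
    inputs: Matrix,     # shape (N, d)
    weight: Matrix,     # shape (vocab_size, d)
    labels: List[int],  # shape (N,)
    ignore_index: int = -100,
) -> Tuple[float, Matrix, Matrix]:
    """Hypothetical sketch: return (mean loss, grad wrt inputs, grad wrt weight)."""
    n_valid = sum(1 for y in labels if y != ignore_index)
    if n_valid == 0:
        raise ValueError("all labels are ignored")
    d = len(weight[0])
    grad_inputs = [[0.0] * d for _ in inputs]
    grad_weight = [[0.0] * d for _ in weight]
    loss = 0.0
    for i, (x, y) in enumerate(zip(inputs, labels)):
        if y == ignore_index:
            continue  # ignored rows add no loss and no gradient
        # Logits for this row only; the full (N, vocab_size) matrix never exists.
        row = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weight]
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        loss += (m + math.log(total) - row[y]) / n_valid
        # d(loss)/d(logits) = (softmax - one_hot) / n_valid, used immediately
        # so the row of logits can be discarded before the next iteration.
        for v, (e, w) in enumerate(zip(exps, weight)):
            g = (e / total - (1.0 if v == y else 0.0)) / n_valid
            for k in range(d):
                grad_inputs[i][k] += g * w[k]
                grad_weight[v][k] += g * x[k]
    return loss, grad_inputs, grad_weight
```

Because the gradients are already available when the forward pass ends, nothing from the loss computation has to be kept for a later backward pass. If another operation followed the loss, the upstream gradient would no longer be 1 and the precomputed values would need rescaling.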