eval.lm_evaluator

class olmo_core.eval.lm_evaluator.LMEvaluator(*, name, batches, labels, device=None, deterministic=True)[source]

Bases: Evaluator

Language modeling evaluator that computes cross entropy loss and perplexity over one or more datasets.

Important

Batches generated from this evaluator must contain a “metadata” field holding a list of dictionaries. Each dictionary in the list must contain a string field called “label” that indicates which dataset the data file is associated with, and that label must be included in the labels argument to this class.
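The batch layout described above can be sketched in plain Python. The helper name `check_batch_labels` and the dataset labels "c4" and "wiki" are hypothetical, chosen only for illustration; this is not part of olmo_core.

```python
from typing import Any, Dict, List, Sequence


def check_batch_labels(batch: Dict[str, Any], labels: Sequence[str]) -> List[str]:
    """Return the per-instance labels of a batch, raising if the layout is wrong.

    Hypothetical helper for illustration only; not an olmo_core function.
    """
    metadata = batch.get("metadata")
    if not isinstance(metadata, list):
        raise ValueError('batch must contain a "metadata" list')
    found = []
    for item in metadata:
        label = item.get("label") if isinstance(item, dict) else None
        if not isinstance(label, str):
            raise ValueError('each metadata item needs a string "label"')
        if label not in labels:
            raise ValueError(f"label {label!r} is not in labels={list(labels)}")
        found.append(label)
    return found


# A made-up batch with two instances; only the "metadata" field is shown.
example_batch = {"metadata": [{"label": "c4"}, {"label": "wiki"}]}
batch_labels = check_batch_labels(example_batch, labels=["c4", "wiki"])
```

A check like this fails fast when a batch carries a label that was not passed in labels, rather than leaving the mismatch to show up later in the metrics.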

Parameters:
  • labels (Sequence[str]) – All of the task labels.

  • deterministic (bool, default: True) – See Evaluator for details.

classmethod from_numpy_dataset(dataset, *, name, global_batch_size, collator, device=None, dp_process_group=None, seed=0, num_threads=None, num_workers=0, prefetch_factor=None, deterministic=True)[source]

Initialize an LMEvaluator from a NumpyPaddedFSLDataset.

Return type:

LMEvaluator

update_metrics(batch, ce_loss, logits)[source]

Update metrics from the batch just processed and the corresponding logits.

Parameters:
  • batch (Dict[str, Any]) – A batch generated from batches.

  • ce_loss (Optional[Tensor]) – The cross-entropy loss per token (un-reduced) of the batch. This will have shape (batch_size, seq_len - 1).

  • logits (Optional[Tensor]) – The logits generated from the forward pass of the model.

Return type:

None
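The (batch_size, seq_len - 1) shape of ce_loss is what results if the loss is computed with the usual next-token shift, where position t is scored against token t + 1. That shift is an assumption here, as the reference text states only the shape. The following is a minimal pure-Python sketch with a hypothetical function `per_token_ce`; real code would operate on tensors.

```python
import math
from typing import List


def per_token_ce(
    logits: List[List[List[float]]], input_ids: List[List[int]]
) -> List[List[float]]:
    """Un-reduced next-token cross-entropy with shape (batch_size, seq_len - 1)."""
    losses = []
    for seq_logits, seq_ids in zip(logits, input_ids):
        row = []
        # Position t predicts token t + 1, so the last position has no target.
        for t in range(len(seq_ids) - 1):
            scores = seq_logits[t]
            target = seq_ids[t + 1]
            # Log-sum-exp with the max subtracted for numerical stability.
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            row.append(log_z - scores[target])
        losses.append(row)
    return losses


# Toy input: batch_size=2, seq_len=3, vocab_size=4, all-zero (uniform) logits.
toy_ids = [[0, 1, 2], [3, 2, 1]]
toy_logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
toy_loss = per_token_ce(toy_logits, toy_ids)
```

With uniform logits over a vocabulary of 4, every entry of `toy_loss` equals log(4), and each row has seq_len - 1 = 2 entries.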

compute_metrics()[source]

Compute the final value of the metrics for the current evaluation loop. The metrics returned should already be reduced, if needed.

Return type:

Dict[str, Tensor]
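To illustrate how cross-entropy loss and perplexity relate, and how the update, compute, and reset steps fit together, here is a minimal sketch. The class `ToyLMMetrics` and the metric key names are hypothetical, and it returns plain floats rather than tensors; it is not the olmo_core implementation. Perplexity is taken as the exponential of the mean per-token cross-entropy.

```python
import math
from collections import defaultdict
from typing import Dict, List


class ToyLMMetrics:
    """Hypothetical stand-in for the update/compute/reset cycle."""

    def __init__(self) -> None:
        self.reset()

    def update(self, labels: List[str], ce_loss: List[List[float]]) -> None:
        # Accumulate the summed loss and token count per dataset label.
        for label, row in zip(labels, ce_loss):
            self._loss_sum[label] += sum(row)
            self._tokens[label] += len(row)

    def compute(self) -> Dict[str, float]:
        out = {}
        for label, total in self._loss_sum.items():
            if self._tokens[label] == 0:
                continue  # no tokens seen for this label
            mean_ce = total / self._tokens[label]
            out[f"{label}/CE loss"] = mean_ce  # key names are made up
            out[f"{label}/PPL"] = math.exp(mean_ce)
        return out

    def reset(self) -> None:
        self._loss_sum = defaultdict(float)
        self._tokens = defaultdict(int)


metrics = ToyLMMetrics()
metrics.update(["c4", "wiki"], [[1.0, 3.0], [0.0, 0.0]])
results = metrics.compute()
metrics.reset()  # clear state before the next evaluation loop
```

Accumulating sums and token counts, rather than averaging per batch, keeps the final mean correct when batches contain different numbers of tokens.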

reset_metrics()[source]

Reset metrics. Should be called after compute_metrics().

Return type:

None