`eval.lm_evaluator`

class olmo_core.eval.lm_evaluator.LMEvaluator(*, name, batches, labels, device=None, deterministic=True)

Bases: Evaluator

Language modeling evaluator that computes cross entropy loss and perplexity over one or more datasets.

Important

The batches supplied to this evaluator must contain a “metadata” field holding a list of dictionaries. Each dictionary must have a string field called “label” that indicates which dataset the data file is associated with, and every such label must also be included in the labels argument to this class.
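The metadata requirement above can be illustrated with a small check. This is a minimal sketch: `check_batch_labels` is a hypothetical helper, not part of olmo_core, and the `input_ids` key and label names in the example batch are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Sequence


def check_batch_labels(batch: Dict[str, Any], labels: Sequence[str]) -> List[str]:
    """Return the per-instance labels of a batch, validating them against `labels`.

    Hypothetical helper for illustration; it is not part of olmo_core.
    """
    metadata = batch.get("metadata")
    if not isinstance(metadata, list):
        raise ValueError('batch must contain a "metadata" list')
    found = []
    for i, item in enumerate(metadata):
        label = item.get("label") if isinstance(item, dict) else None
        if not isinstance(label, str):
            raise ValueError(f'metadata[{i}] is missing a string "label" field')
        if label not in labels:
            raise ValueError(f"label {label!r} was not passed in `labels`")
        found.append(label)
    return found


# Example batch: two instances, each tagged with the dataset it came from.
# The "input_ids" key and the label names are illustrative assumptions.
example_batch = {
    "input_ids": [[1, 2, 3, 4], [5, 6, 7, 8]],
    "metadata": [{"label": "c4_en"}, {"label": "wikitext"}],
}
batch_labels = check_batch_labels(example_batch, labels=["c4_en", "wikitext"])
```

A batch whose metadata carries a label missing from `labels` would raise a `ValueError` in this sketch; how the library itself reacts to such a batch is not stated here.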

Parameters:

labels (Sequence[str]) – All of the task labels.
deterministic (bool, default: True) – See Evaluator for details.

classmethod from_numpy_dataset(dataset, *, name, global_batch_size, collator, device=None, dp_process_group=None, seed=0, num_threads=None, num_workers=0, prefetch_factor=None, deterministic=True)

Initialize an LMEvaluator from a NumpyPaddedFSLDataset.

Return type: LMEvaluator

update_metrics(batch, ce_loss, logits)

Update metrics from the batch just processed and the corresponding logits.

Parameters:

batch (Dict[str, Any]) – A batch generated from batches.
ce_loss (Optional[Tensor]) – The unreduced, per-token cross-entropy loss of the batch, with shape (batch_size, seq_len - 1).
logits (Optional[Tensor]) – The logits generated from the forward pass of the model.

Return type: None
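To make the relationship between the per-token loss, the labels, and perplexity concrete, here is a minimal pure-Python sketch of per-label accumulation. It is not the library's implementation: `PerLabelLoss` and the metric key names are hypothetical, plain nested lists stand in for tensors, and padding masks and cross-device reduction are ignored. The only fact relied on is the standard definition of perplexity as the exponential of the mean per-token cross-entropy.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Sequence


class PerLabelLoss:
    """Hypothetical accumulator: mean CE loss and perplexity per dataset label."""

    def __init__(self, labels: Sequence[str]) -> None:
        self.labels = list(labels)
        self.reset()

    def reset(self) -> None:
        # Running sums of loss and token counts, keyed by label.
        self.loss_sum: Dict[str, float] = defaultdict(float)
        self.token_count: Dict[str, int] = defaultdict(int)

    def update(self, batch: Dict[str, Any], ce_loss: List[List[float]]) -> None:
        # ce_loss has one row per instance and one entry per predicted token,
        # i.e. shape (batch_size, seq_len - 1).
        for item, row in zip(batch["metadata"], ce_loss):
            label = item["label"]
            self.loss_sum[label] += sum(row)
            self.token_count[label] += len(row)

    def compute(self) -> Dict[str, float]:
        out: Dict[str, float] = {}
        for label in self.labels:
            n = self.token_count[label]
            if n == 0:
                continue  # no tokens seen for this label
            mean_loss = self.loss_sum[label] / n
            out[f"{label}/CE loss"] = mean_loss  # key names are assumptions
            out[f"{label}/PPL"] = math.exp(mean_loss)
        return out


tracker = PerLabelLoss(labels=["a", "b"])
batch = {"metadata": [{"label": "a"}, {"label": "b"}]}
ce_loss = [[1.0, 3.0], [0.0, 0.0]]  # batch_size=2, seq_len - 1 = 2
tracker.update(batch, ce_loss)
metrics = tracker.compute()
```

Summing losses and token counts separately, and dividing only at compute time, keeps the mean correct when batches contribute different numbers of tokens per label.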

compute_metrics()

Compute the final value of the metrics for the current evaluation loop. The metrics returned should already be reduced, if needed.

Return type: Dict[str, Tensor]

reset_metrics()

Reset metrics. Should be called after compute_metrics().

Return type: None
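The call order of the three methods can be sketched with a toy stand-in. `RunningMean` and `run_eval` are hypothetical and only mirror the method names; the real `update_metrics` takes `(batch, ce_loss, logits)` rather than a single float.

```python
from typing import Dict, Iterable


class RunningMean:
    """Toy stand-in that exposes the same three method names as an evaluator."""

    def __init__(self) -> None:
        self.reset_metrics()

    def update_metrics(self, value: float) -> None:
        self.total += value
        self.count += 1

    def compute_metrics(self) -> Dict[str, float]:
        return {"mean": self.total / self.count}

    def reset_metrics(self) -> None:
        self.total = 0.0
        self.count = 0


def run_eval(evaluator: RunningMean, values: Iterable[float]) -> Dict[str, float]:
    # 1. Update once per batch, 2. compute once per loop, 3. reset afterwards.
    for value in values:
        evaluator.update_metrics(value)
    metrics = evaluator.compute_metrics()
    evaluator.reset_metrics()
    return metrics


evaluator = RunningMean()
first = run_eval(evaluator, [1.0, 3.0])
second = run_eval(evaluator, [10.0])  # unaffected by the first loop
```

Without the reset, the second loop would mix in the sums from the first, so its result would no longer describe only the latest evaluation.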
