`eval.evaluator`

class olmo_core.eval.evaluator.Evaluator(*, name, batches=None, batches_factory=None, device=None, deterministic=True)

Base class for in-loop evaluators.

See also

This can be used with an EvaluatorCallback to run an evaluator within the training loop.

Parameters:

name (str) – A name to assign to the evaluator.
batches (Optional[Iterable[Dict[str, Any]]], default: None) – An iterable of batches for the evaluator. Each batch should include at least the “input_ids” field, but may contain any other fields as well.
batches_factory (Optional[Callable[[], Iterable[Dict[str, Any]]]], default: None) – A callable that returns an iterable over batches. This is an alternative to providing the batches argument directly.
device (Optional[device], default: None) – The device to compute/reduce metrics on.
deterministic (bool, default: True) – When True and batches is a DataLoaderBase, each evaluation pass resets the data loader and reshuffles with epoch=1 so repeated evals read the same batches in the same order. This is useful when eval loops are truncated via Duration. When False, the data loader still resets to batch 0 before each pass, but reshuffles without pinning the epoch so the batch order may change between eval runs. This does not implement a moving window across evals; if an eval is truncated, different reshuffles may result in different instances being evaluated each time.
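
The difference between batches and batches_factory can be illustrated in plain Python. The sketch below does not call olmo_core; make_batches and count_batches are hypothetical helpers. It only shows that a one-shot generator cannot be replayed, whereas a zero-argument callable can produce a fresh iterable for every evaluation pass. Whether Evaluator invokes the factory once per pass is an assumption here, not something the reference above states.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def make_batches() -> Iterator[Dict[str, Any]]:
    """Hypothetical batch source: yields two tiny batches with "input_ids"."""
    yield {"input_ids": [[1, 2, 3], [4, 5, 6]]}
    yield {"input_ids": [[7, 8, 9], [10, 11, 12]]}


def count_batches(batches: Iterable[Dict[str, Any]]) -> int:
    """Consume one evaluation pass and return how many batches were seen."""
    return sum(1 for _ in batches)


# A generator object is exhausted after a single pass over it.
one_shot = make_batches()
first_pass = count_batches(one_shot)
second_pass = count_batches(one_shot)

# A factory (any zero-argument callable) builds a fresh iterable on each call.
batches_factory: Callable[[], Iterable[Dict[str, Any]]] = make_batches
factory_passes: List[int] = [count_batches(batches_factory()) for _ in range(2)]

print(first_pass, second_pass, factory_passes)
```

A list or a re-iterable data loader passed as batches does not have this problem; the factory form is mainly useful when the batch source is lazy or expensive to construct.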

property total_batches: int | None

Get the total number of batches in an eval loop if it’s known ahead of time.

abstract update_metrics(batch, ce_loss, logits)

Update metrics from the batch just processed and the corresponding logits.

Parameters:

batch (Dict[str, Any]) – A batch generated from batches.
ce_loss (Optional[Tensor]) – The cross-entropy loss per token (un-reduced) of the batch. This will have shape (batch_size, seq_len - 1).
logits (Optional[Tensor]) – The logits generated from the forward pass of the model.

Return type:

None
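
The (batch_size, seq_len - 1) shape of ce_loss is what one would expect from next-token prediction, where position t is scored against token t + 1 and the final position has no target. The sketch below assumes that convention and uses nested Python lists in place of tensors; per_token_ce_loss is a hypothetical helper, not part of olmo_core, and the library's actual loss computation may differ.

```python
import math
from typing import List


def per_token_ce_loss(
    logits: List[List[List[float]]], input_ids: List[List[int]]
) -> List[List[float]]:
    """Un-reduced next-token cross-entropy, one value per predicted position."""
    losses = []
    for seq_logits, seq_ids in zip(logits, input_ids):
        row = []
        # Position t predicts token t + 1, so the last position has no target.
        for t in range(len(seq_ids) - 1):
            scores = seq_logits[t]
            max_score = max(scores)
            # Numerically stable log-sum-exp over the vocabulary.
            log_norm = max_score + math.log(
                sum(math.exp(s - max_score) for s in scores)
            )
            row.append(log_norm - scores[seq_ids[t + 1]])
        losses.append(row)
    return losses


# batch_size = 2, seq_len = 3, vocab_size = 4, uniform (all-zero) logits.
input_ids = [[0, 1, 2], [3, 2, 1]]
logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
ce_loss = per_token_ce_loss(logits, input_ids)
shape = (len(ce_loss), len(ce_loss[0]))
print(shape)
```

With uniform logits over a vocabulary of four tokens, every entry equals log(4), and the result has two rows of two values each, matching (batch_size, seq_len - 1).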

abstract compute_metrics()

Compute the final value of the metrics for the current evaluation loop. The metrics returned should already be reduced, if needed.

abstract reset_metrics()

Reset metrics. Should be called after compute_metrics().
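
The three abstract methods form an update, compute, reset cycle. The following is a minimal sketch of that cycle, not the library's implementation: EvaluatorSketch is a stand-in base class that only mirrors the method names documented above, MeanLossEvaluator and its metric keys are hypothetical, nested lists stand in for tensors, and returning a dict from compute_metrics() is an assumption. With olmo_core installed, one would subclass olmo_core.eval.evaluator.Evaluator instead.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class EvaluatorSketch(ABC):
    """Stand-in mirroring the documented interface; NOT the olmo_core class."""

    def __init__(self, *, name: str):
        self.name = name

    @abstractmethod
    def update_metrics(self, batch, ce_loss, logits) -> None: ...

    @abstractmethod
    def compute_metrics(self) -> Dict[str, float]: ...

    @abstractmethod
    def reset_metrics(self) -> None: ...


class MeanLossEvaluator(EvaluatorSketch):
    """Hypothetical evaluator tracking mean per-token loss and perplexity."""

    def __init__(self, *, name: str):
        super().__init__(name=name)
        self.reset_metrics()

    def update_metrics(
        self,
        batch: Dict[str, Any],
        ce_loss: Optional[List[List[float]]],
        logits: Any,
    ) -> None:
        # ce_loss is optional in the documented signature, so tolerate None.
        if ce_loss is None:
            return
        for row in ce_loss:
            self.loss_sum += sum(row)
            self.num_tokens += len(row)

    def compute_metrics(self) -> Dict[str, float]:
        if self.num_tokens == 0:
            return {}
        mean_loss = self.loss_sum / self.num_tokens
        return {"ce_loss": mean_loss, "ppl": math.exp(mean_loss)}

    def reset_metrics(self) -> None:
        self.loss_sum = 0.0
        self.num_tokens = 0


evaluator = MeanLossEvaluator(name="demo")
for ce_loss in ([[1.0, 3.0]], [[2.0, 2.0]]):
    evaluator.update_metrics({"input_ids": [[0, 1, 2]]}, ce_loss, None)
metrics = evaluator.compute_metrics()
evaluator.reset_metrics()  # called after compute_metrics(), as documented
```

Accumulating a running sum and token count, rather than storing per-batch means, keeps the final average correct when batches contain different numbers of tokens. In a distributed setting the accumulators would also need to be reduced across ranks before compute_metrics() returns, which this sketch omits.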
