eval.evaluator
- class olmo_core.eval.evaluator.Evaluator(*, name, batches=None, batches_factory=None, device=None, deterministic=True)
Bases: object
Base class for in-loop evaluators.
See also
This can be used with an EvaluatorCallback to run an evaluator within the training loop.
- Parameters:
  - name (str) – A name to assign to the evaluator.
  - batches (Optional[Iterable[Dict[str, Any]]], default: None) – Generates batches for the evaluator. These should at least include the “input_ids” field, but can contain any other arbitrary fields as well.
  - batches_factory (Optional[Callable[[], Iterable[Dict[str, Any]]]], default: None) – A callable that returns an iterable over batches. This is an alternative to providing the batches argument directly.
  - device (Optional[device], default: None) – The device to compute and reduce metrics on.
  - deterministic (bool, default: True) – When True and batches is a DataLoaderBase, each evaluation pass resets the data loader and reshuffles with epoch=1, so repeated evals read the same batches in the same order. This is useful when eval loops are truncated via Duration. When False, the data loader still resets to batch 0 before each pass, but reshuffles without pinning the epoch, so the batch order may change between eval runs. This does not implement a moving window across evals: if an eval is truncated, different reshuffles may result in a different subset of instances being evaluated each time.
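The effect of deterministic on a truncated eval can be illustrated with a small simulation. This is a sketch, not the data loader's actual code: epoch_order and eval_pass are hypothetical helpers, the shuffle is seeded with seed + epoch purely for illustration, and for the unpinned case the epoch is simply assumed to advance on each pass.

```python
import random
from typing import List


def epoch_order(num_batches: int, seed: int, epoch: int) -> List[int]:
    """Hypothetical stand-in for a data loader reshuffle keyed on (seed, epoch)."""
    rng = random.Random(seed + epoch)
    order = list(range(num_batches))
    rng.shuffle(order)
    return order


def eval_pass(num_batches: int, seed: int, epoch: int, max_batches: int) -> List[int]:
    """Simulate one truncated eval pass: reset to batch 0, reshuffle, read a prefix."""
    return epoch_order(num_batches, seed, epoch)[:max_batches]


# deterministic=True: the epoch is pinned to 1, so every pass reads the same batches.
pinned = [eval_pass(100, seed=0, epoch=1, max_batches=5) for _ in range(3)]

# deterministic=False (assumed here: the epoch advances each pass), so a truncated
# pass may cover a different subset of batches each time.
unpinned = [eval_pass(100, seed=0, epoch=e, max_batches=5) for e in range(1, 4)]
```

Without truncation the order does not matter for the final metrics, since every pass still visits all batches; pinning the epoch only changes the outcome when a pass stops early.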
- property total_batches: int | None
Get the total number of batches in an eval loop if it’s known ahead of time.
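One plausible reading of "known ahead of time" is whether the batch source has a length. The helper below is a hypothetical sketch under that assumption; the real property may instead ask the data loader for its own batch count.

```python
from collections.abc import Sized
from typing import Any, Dict, Iterable, Optional


def total_batches_of(batches: Iterable[Dict[str, Any]]) -> Optional[int]:
    """Hypothetical helper: return the batch count if it is known up front, else None."""
    if isinstance(batches, Sized):
        # Lists, tuples, and other containers implementing __len__.
        return len(batches)
    # Generators and other unsized iterables: the count is only known after iterating.
    return None
```

For example, a list of two batches yields 2, while a generator yields None, because its length is only discovered by consuming it.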
- abstract update_metrics(batch, ce_loss, logits)
Update metrics with the batch just processed and the corresponding logits.
- Parameters:
- Return type:
- abstract compute_metrics()
Compute the final value of the metrics for the current evaluation loop. The metrics returned should already be reduced, if needed.
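Why "already be reduced" matters in a multi-process setting: averaging per-rank means is wrong when ranks see different numbers of tokens. The sketch below assumes metrics are averaged over data-parallel ranks and simulates the cross-rank sum with plain lists; in a real run that sum would be done with a collective such as torch.distributed.all_reduce. local_stats and reduce_mean are hypothetical names.

```python
from typing import List, Tuple


def local_stats(losses: List[float]) -> Tuple[float, int]:
    # Each rank accumulates a running sum and count rather than a running mean.
    return sum(losses), len(losses)


def reduce_mean(per_rank: List[Tuple[float, int]]) -> float:
    # Stand-in for summing both quantities across ranks, then dividing once.
    total = sum(s for s, _ in per_rank)
    count = sum(n for _, n in per_rank)
    return total / count


rank_losses = [[1.0, 1.0, 1.0], [4.0]]  # rank 0 saw 3 tokens, rank 1 saw 1
global_mean = reduce_mean([local_stats(l) for l in rank_losses])  # (3 + 4) / 4
naive_mean = sum(sum(l) / len(l) for l in rank_losses) / len(rank_losses)  # (1 + 4) / 2
```

Reducing sums and counts separately gives the true mean over all tokens (1.75 here), whereas averaging the per-rank means over-weights the rank that saw fewer tokens (2.5).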
- abstract reset_metrics()
Reset metrics. Should be called after compute_metrics().
- Return type:
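The three abstract methods form a per-loop lifecycle: update_metrics once per batch, compute_metrics at the end, then reset_metrics. The sketch below shows that lifecycle for a mean-loss evaluator. To stay self-contained it does not import olmo_core: EvaluatorSketch is a hypothetical stand-in mirroring the documented interface, ce_loss is assumed to be a plain list of per-token losses rather than a tensor, and run_eval and fake_forward are hypothetical substitutes for the loop that EvaluatorCallback and the model would drive.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


class EvaluatorSketch(ABC):
    """Hypothetical stand-in mirroring the documented Evaluator interface."""

    def __init__(self, *, name: str, batches: Iterable[Dict[str, Any]]):
        self.name = name
        self.batches = batches

    @abstractmethod
    def update_metrics(self, batch: Dict[str, Any], ce_loss: Optional[List[float]], logits: Any) -> None: ...

    @abstractmethod
    def compute_metrics(self) -> Dict[str, float]: ...

    @abstractmethod
    def reset_metrics(self) -> None: ...


class MeanLossEvaluator(EvaluatorSketch):
    """Tracks the mean per-token cross-entropy loss over one eval loop."""

    def __init__(self, **kwargs: Any):
        super().__init__(**kwargs)
        self.reset_metrics()

    def update_metrics(self, batch: Dict[str, Any], ce_loss: Optional[List[float]], logits: Any) -> None:
        if ce_loss is None:  # nothing to accumulate for this batch
            return
        self._loss_sum += sum(ce_loss)
        self._token_count += len(ce_loss)

    def compute_metrics(self) -> Dict[str, float]:
        if self._token_count == 0:
            return {"mean_ce_loss": float("nan")}
        return {"mean_ce_loss": self._loss_sum / self._token_count}

    def reset_metrics(self) -> None:
        self._loss_sum = 0.0
        self._token_count = 0


def run_eval(
    evaluator: EvaluatorSketch,
    forward: Callable[[Dict[str, Any]], Tuple[Optional[List[float]], Any]],
) -> Dict[str, float]:
    for batch in evaluator.batches:
        ce_loss, logits = forward(batch)
        evaluator.update_metrics(batch, ce_loss, logits)
    metrics = evaluator.compute_metrics()
    evaluator.reset_metrics()  # reset after compute_metrics(), as documented
    return metrics


def fake_forward(batch: Dict[str, Any]) -> Tuple[List[float], None]:
    # Hypothetical model: one "loss" per input id, no logits.
    return [float(t) for t in batch["input_ids"]], None


evaluator = MeanLossEvaluator(name="toy", batches=[{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}])
first = run_eval(evaluator, fake_forward)
second = run_eval(evaluator, fake_forward)
```

Because reset_metrics runs after each compute_metrics call, a second pass over the same (re-iterable) batches starts from zero and reports the same value instead of accumulating across loops.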