data.types
- class olmo_core.data.types.LongDocStrategy(value)
Bases: StrEnum
Specifies how to handle documents that are longer than the max sequence length when packing.
- truncate = 'truncate'
Long docs are truncated and the excess tokens are discarded.
- fragment = 'fragment'
Long docs are split into smaller docs so that no tokens are discarded, but you end up with fragmented docs.
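
The difference between the two strategies can be illustrated with a small, self-contained sketch. It does not use olmo_core's packing code: the `LongDocStrategy` class below is a local stand-in that only mirrors the two documented values, and `handle_long_doc` and its `max_sequence_length` parameter are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import List


class LongDocStrategy(str, Enum):
    # Local stand-in mirroring the documented values; NOT imported from olmo_core.
    truncate = "truncate"
    fragment = "fragment"


def handle_long_doc(
    tokens: List[int],
    max_sequence_length: int,
    strategy: LongDocStrategy,
) -> List[List[int]]:
    """Hypothetical helper: return the pieces of one document kept under a strategy."""
    if max_sequence_length <= 0:
        raise ValueError("max_sequence_length must be positive")
    if len(tokens) <= max_sequence_length:
        # Short documents are unaffected by either strategy.
        return [tokens]
    if strategy == LongDocStrategy.truncate:
        # Keep the first max_sequence_length tokens; the rest are discarded.
        return [tokens[:max_sequence_length]]
    if strategy == LongDocStrategy.fragment:
        # Split into consecutive chunks so that no tokens are lost.
        return [
            tokens[start : start + max_sequence_length]
            for start in range(0, len(tokens), max_sequence_length)
        ]
    raise ValueError(f"unknown strategy: {strategy!r}")


doc = list(range(10))  # a 10-token "document"
truncated = handle_long_doc(doc, 4, LongDocStrategy.truncate)
fragmented = handle_long_doc(doc, 4, LongDocStrategy.fragment)
print(truncated)   # one piece of 4 tokens; 6 tokens dropped
print(fragmented)  # three pieces covering all 10 tokens
```

With `truncate`, every packed piece begins at the start of a real document, at the cost of discarded tokens. With `fragment`, all tokens are retained, but later pieces begin mid-document.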