io

olmo_core.io.normalize_path(path)[source]

Normalize a path/URL.

Parameters:

path (Union[Path, PathLike, str]) – The path/URL to normalize.

Return type:

str

olmo_core.io.join_path(path, *paths)[source]

Join two or more paths.

Return type:

Union[Path, PathLike, str]

Returns:

The joined result.
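To make the idea of joining both local paths and URLs concrete, here is a minimal standard-library sketch. `sketch_join_path` is a hypothetical stand-in, not the `olmo_core.io.join_path` implementation, and the rule that URL components are joined as strings with forward slashes is an assumption.

```python
from os import PathLike
from pathlib import Path
from typing import Union

PathOrStr = Union[Path, PathLike, str]


def sketch_join_path(path: PathOrStr, *paths: PathOrStr) -> PathOrStr:
    """Hypothetical stand-in: join URL parts with '/', local parts with pathlib."""
    if "://" in str(path):
        # Assumption: URLs are joined as plain strings so the "//" after the
        # scheme is preserved (pathlib would collapse it).
        parts = [str(path).rstrip("/")] + [str(p).strip("/") for p in paths]
        return "/".join(parts)
    # Local paths: defer to pathlib, which handles OS-specific separators.
    return Path(path).joinpath(*(str(p) for p in paths))


print(sketch_join_path("s3://bucket/run", "step100", "model.pt"))
# s3://bucket/run/step100/model.pt
print(sketch_join_path("/tmp/run", "step100"))
```

Keeping URLs as strings rather than `Path` objects is the main design point of the sketch: converting a URL to `Path` and back would lose one of the two slashes after the scheme.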

olmo_core.io.get_parent(path)[source]

Get the parent directory of a path.

Parameters:

path (Union[Path, PathLike, str]) – The path/URL to get the parent of.

Return type:

Union[Path, PathLike, str]

olmo_core.io.resource_path(folder, fname, local_cache=None)[source]

Return a local path for a local or remote file, downloading the file first if no local copy exists yet.

Return type:

Path

olmo_core.io.is_url(path)[source]

Check if a path is a URL.

Parameters:

path (Union[Path, PathLike, str]) – Path-like object to check.

Return type:

bool
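The check can be pictured as a test for a leading `scheme://` prefix. The sketch below is an assumption about what "is a URL" means, using only the standard library; `sketch_is_url` is a hypothetical name and may not match the library's exact rules.

```python
import re
from os import PathLike
from typing import Union

# Assumption: a URL is any string that starts with "<scheme>://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]*://")


def sketch_is_url(path: Union[PathLike, str]) -> bool:
    """Hypothetical stand-in: True when the path starts with a URL scheme."""
    return _SCHEME_RE.match(str(path)) is not None


print(sketch_is_url("s3://bucket/data/shard-0.npy"))  # True
print(sketch_is_url("/data/shard-0.npy"))  # False
```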

olmo_core.io.get_file_size(path)[source]

Get the size of a local or remote file in bytes.

Warning

If the argument is a URL and the filesystem cache is enabled, the result is cached (see olmo_core.fs_cache.maybe_cache()).

Parameters:

path (Union[Path, PathLike, str]) – Path/URL to the file.

Return type:

int

olmo_core.io.get_bytes_range(path, bytes_start, num_bytes)[source]

Get a range of bytes from a local or remote file.

Parameters:
  • path – Path/URL to the file.

  • bytes_start (int) – Byte offset to start at.

  • num_bytes (int) – Number of bytes to get.

Return type:

bytes
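For a local file, reading a byte range amounts to a seek followed by a bounded read. The sketch below shows only that local case with the standard library; `sketch_get_bytes_range` is a hypothetical stand-in, and remote reads are not attempted here.

```python
import tempfile
from pathlib import Path


def sketch_get_bytes_range(path: Path, bytes_start: int, num_bytes: int) -> bytes:
    """Local-only stand-in: seek to the offset and read a fixed number of bytes."""
    with open(path, "rb") as f:
        f.seek(bytes_start)
        return f.read(num_bytes)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"0123456789")
    chunk = sketch_get_bytes_range(sample, bytes_start=2, num_bytes=4)

print(chunk)  # b'2345'
```

Note that `bytes_start` is a zero-based offset and `num_bytes` is a length, not an end offset, so the call above covers offsets 2 through 5.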

olmo_core.io.upload(source, target, save_overwrite=False, quiet=False)[source]

Upload source file to a target location on GCS or S3.

Parameters:
  • source (Union[Path, PathLike, str]) – Path to the file to upload.

  • target (str) – Target URL to upload to.

  • save_overwrite (bool, default: False) – Overwrite any existing file.

olmo_core.io.copy_file(source, target, save_overwrite=False, quiet=False)[source]

Copy a file from source to target.

Parameters:
  • source (Union[Path, PathLike, str]) – The path/URL to the source file.

  • target (Union[Path, PathLike, str]) – The path/URL to the target location.

  • save_overwrite (bool, default: False) – Overwrite any existing file.

Raises:
olmo_core.io.copy_dir(source, target, save_overwrite=False, num_threads=None, quiet=False)[source]

Copy a directory from source to target.

Parameters:
  • source (Union[Path, PathLike, str]) – The path/URL to the source directory.

  • target (Union[Path, PathLike, str]) – The path/URL to the target location.

  • save_overwrite (bool, default: False) – Overwrite any existing files.

  • num_threads (Optional[int], default: None) – The number of threads to use.

Raises:
olmo_core.io.dir_is_empty(dir)[source]

Check if a local or remote directory is empty. This also returns True if the directory does not exist.

Parameters:

dir (Union[Path, PathLike, str]) – Path/URL to the directory.

Return type:

bool
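The "missing counts as empty" behavior is easy to overlook, so here is a local-only sketch of it. `sketch_dir_is_empty` is a hypothetical stand-in written with the standard library and does not handle URLs.

```python
import tempfile
from pathlib import Path


def sketch_dir_is_empty(dir: Path) -> bool:
    """Local-only stand-in: a missing directory counts as empty."""
    if not dir.is_dir():
        return True
    # any() stops at the first entry, so large directories are not fully listed.
    return not any(dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = sketch_dir_is_empty(root / "does-not-exist")
    empty = sketch_dir_is_empty(root)
    (root / "file.txt").write_text("x")
    non_empty = sketch_dir_is_empty(root)

print(missing, empty, non_empty)  # True True False
```

One consequence is that a True result cannot be used to confirm that a directory exists; a separate existence check is needed for that.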

olmo_core.io.file_exists(path)[source]

Check if a local or remote file exists.

Parameters:

path (Union[Path, PathLike, str]) – Path/URL to a file.

Return type:

bool

olmo_core.io.remove_file(path)[source]

Remove a local or remote file.

Parameters:

path (Union[Path, PathLike, str]) – The path or URL to the file.

Raises:

FileNotFoundError – If the file doesn’t exist.

olmo_core.io.clear_directory(dir, force=False)[source]

Clear out the contents of a local or remote directory.

Warning

This function is potentially very destructive!

By default, for safety, this raises a ValueError if you attempt to clear a remote directory too close to the root of the bucket. Set force=True to override.

Parameters:
  • dir (Union[Path, PathLike, str]) – Path/URL to the directory.

  • force (bool, default: False) – Skip the safety check described in the warning above.
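The safety check can be thought of as a depth guard on remote URLs. The sketch below illustrates that idea only: `sketch_check_safe_to_clear` and `MIN_REMOTE_DEPTH` are hypothetical, and the actual threshold used by the library is not stated in this reference.

```python
from urllib.parse import urlparse

# Assumption: the real threshold is not documented here; 2 is illustrative only.
MIN_REMOTE_DEPTH = 2


def sketch_check_safe_to_clear(dir: str, force: bool = False) -> None:
    """Raise ValueError for remote directories near the bucket root unless forced."""
    parsed = urlparse(dir)
    if not parsed.scheme or force:
        return  # local path, or the caller explicitly opted out of the check
    # Count the non-empty path components below the bucket name.
    depth = len([part for part in parsed.path.split("/") if part])
    if depth < MIN_REMOTE_DEPTH:
        raise ValueError(f"Refusing to clear '{dir}': too close to the bucket root")


sketch_check_safe_to_clear("s3://bucket/runs/exp1/checkpoints")  # deep enough
sketch_check_safe_to_clear("s3://bucket/runs", force=True)  # check skipped

try:
    sketch_check_safe_to_clear("s3://bucket/runs")
except ValueError as err:
    print(err)
```

Running the guard before any deletion starts means a mistyped prefix fails fast instead of removing part of a bucket.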

olmo_core.io.list_directory(dir, recurse=False, include_files=True, include_dirs=True)[source]

List the contents of a local or remote directory. If recurse=False, only the immediate children of the directory are returned; otherwise every sub-folder is recursed into.

Parameters:
  • dir (Union[Path, PathLike, str]) – Path/URL to the directory.

  • recurse (bool, default: False) – Whether to recurse into sub-folders.

  • include_files (bool, default: True) – Include regular files in the results.

  • include_dirs (bool, default: True) – Include directories in the results.

Return type:

Generator[str, None, None]

Returns:

A generator over paths in the directory. If the dir is a URL, the results will be full URLs. If the dir is a local path, the results will be of the form join_path(dir, p).

Raises:

FileNotFoundError – If the directory doesn’t exist.
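How the three flags interact is easiest to see on a small tree. The sketch below is a local-only stand-in built on `os.scandir`; `sketch_list_directory` is a hypothetical name, and sorting entries by name is an addition made here so the output is stable, not something this reference promises.

```python
import os
import tempfile
from pathlib import Path
from typing import Generator


def sketch_list_directory(
    dir: str,
    recurse: bool = False,
    include_files: bool = True,
    include_dirs: bool = True,
) -> Generator[str, None, None]:
    """Local-only stand-in mirroring the documented flags."""
    if not os.path.isdir(dir):
        raise FileNotFoundError(dir)
    # Sorting is an assumption of this sketch, added for reproducible output.
    for entry in sorted(os.scandir(dir), key=lambda e: e.name):
        if entry.is_dir():
            if include_dirs:
                yield entry.path
            if recurse:
                yield from sketch_list_directory(
                    entry.path, True, include_files, include_dirs
                )
        elif include_files:
            yield entry.path


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sub").mkdir()
    Path(tmp, "a.txt").write_text("a")
    Path(tmp, "sub", "b.txt").write_text("b")
    # Immediate children only: one file and one directory.
    top_level = [os.path.relpath(p, tmp) for p in sketch_list_directory(tmp)]
    # Every regular file in the tree, with directories filtered out.
    all_files = [
        os.path.relpath(p, tmp)
        for p in sketch_list_directory(tmp, recurse=True, include_dirs=False)
    ]

print(top_level)
print(all_files)
```

Because the result is a generator, nothing is listed until it is iterated, so in this sketch a missing directory raises on first iteration rather than at call time.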

olmo_core.io.glob_directory(pattern)[source]

Similar to glob.glob() from the standard library, but works with remote directories as well.

Return type:

Generator[str, None, None]

Warning

Only a subset of glob patterns is supported: the * and ** wildcards, which follow the semantics defined at https://docs.python.org/3/library/pathlib.html#pattern-language.

olmo_core.io.deterministic_glob_directory(pattern)[source]

Like glob_directory() but returns a sorted list for deterministic ordering.

Return type:

List[str]

Warning

If the argument is a URL and the filesystem cache is enabled, the result is cached (see olmo_core.fs_cache.maybe_cache()).
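The value of a sorted result is that repeated runs see files in the same order regardless of how the filesystem enumerates them. The sketch below shows the idea for local directories with `pathlib`; `sketch_deterministic_glob` is a hypothetical stand-in, and unlike the documented function it takes a root directory and a relative pattern as two separate arguments.

```python
import tempfile
from pathlib import Path
from typing import List


def sketch_deterministic_glob(root: Path, pattern: str) -> List[str]:
    """Local-only stand-in: expand a pattern, then sort for a stable order."""
    return sorted(str(p) for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "shards" / "b").mkdir(parents=True)
    (root / "shards" / "a.npy").write_bytes(b"")
    (root / "shards" / "b" / "c.npy").write_bytes(b"")
    # "**" matches any number of directory levels, "*" matches within one name.
    matches = [
        Path(m).relative_to(root).as_posix()
        for m in sketch_deterministic_glob(root, "**/*.npy")
    ]

print(matches)  # ['shards/a.npy', 'shards/b/c.npy']
```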

olmo_core.io.init_client(remote_path)[source]

Initialize the right client for the given remote resource. This is helpful to avoid threading issues with boto3.

olmo_core.io.serialize_to_tensor(x)[source]

Serialize an object to a byte tensor using pickle.

Parameters:

x (Any) – The picklable object to serialize.

Return type:

Tensor

olmo_core.io.deserialize_from_tensor(data)[source]

Deserialize an object from a byte tensor using pickle.

Parameters:

data (Tensor) – The byte tensor to deserialize.

Return type:

Any
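The round trip behind these two functions can be sketched without PyTorch. In the sketch below a `bytearray` stands in for the byte tensor so the example runs with the standard library alone; `sketch_serialize` and `sketch_deserialize` are hypothetical names, and how the library actually builds the tensor is not covered here.

```python
import pickle
from typing import Any


def sketch_serialize(x: Any) -> bytearray:
    """Pickle an object into a mutable byte buffer (stand-in for a byte tensor)."""
    return bytearray(pickle.dumps(x))


def sketch_deserialize(data: bytearray) -> Any:
    """Rebuild the original object from the byte buffer."""
    return pickle.loads(bytes(data))


state = {"step": 100, "paths": ["a.npy", "b.npy"]}
buffer = sketch_serialize(state)
restored = sketch_deserialize(buffer)

print(restored == state)  # True
```

As with any use of pickle, deserializing data from an untrusted source can execute arbitrary code, so this pattern is only appropriate for data produced by trusted processes.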

olmo_core.io.add_cached_path_clients()[source]

Add additional cached-path clients.