io
- olmo_core.io.resource_path(folder, fname, local_cache=None)
Returns an actual path for a local or remote file, potentially downloading it if a copy doesn’t exist locally yet.
- Return type:
- olmo_core.io.get_file_size(path)
Get the size of a local or remote file in bytes.
Warning
Uses caching if the argument is a URL and the filesystem cache is enabled (see olmo_core.fs_cache.maybe_cache()).
- olmo_core.io.get_bytes_range(path, bytes_start, num_bytes)
Get a range of bytes from a local or remote file.
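As a rough illustration of what a byte-range read means, here is a minimal local-only sketch. The helper name read_bytes_range is hypothetical and only mirrors the documented (path, bytes_start, num_bytes) arguments; it is not the library's implementation and does not handle remote URLs.

```python
import tempfile
from pathlib import Path


def read_bytes_range(path, bytes_start: int, num_bytes: int) -> bytes:
    """Hypothetical local stand-in: return num_bytes bytes starting at bytes_start."""
    with open(path, "rb") as f:
        f.seek(bytes_start)       # jump to the starting offset
        return f.read(num_bytes)  # read at most num_bytes bytes


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"0123456789")
    chunk = read_bytes_range(sample, bytes_start=2, num_bytes=4)

print(chunk)  # b'2345'
```

With the real function, the equivalent call would presumably be olmo_core.io.get_bytes_range(path, 2, 4), where path may also be a remote URL.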
- olmo_core.io.upload(source, target, save_overwrite=False, quiet=False)
Upload source file to a target location on GCS or S3.
- olmo_core.io.copy_file(source, target, save_overwrite=False, quiet=False)
Copy a file from source to target.
- Parameters:
- Raises:
FileNotFoundError – If the source file doesn’t exist.
FileExistsError – If the target already exists and save_overwrite=False.
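The overwrite rule described above can be sketched for local paths as follows. copy_file_local is a hypothetical stand-in built on shutil, not the library's code; it only reproduces the two documented error conditions.

```python
import shutil
import tempfile
from pathlib import Path


def copy_file_local(source, target, save_overwrite: bool = False) -> None:
    """Hypothetical local-only sketch of the documented copy semantics."""
    source, target = Path(source), Path(target)
    if not source.is_file():
        raise FileNotFoundError(f"source file not found: {source}")
    if target.exists() and not save_overwrite:
        raise FileExistsError(f"target already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.txt"
    dst = Path(tmp) / "out" / "a.txt"
    src.write_text("v1")
    copy_file_local(src, dst)          # first copy succeeds
    src.write_text("v2")
    try:
        copy_file_local(src, dst)      # target exists, overwrite not allowed
        blocked = False
    except FileExistsError:
        blocked = True
    copy_file_local(src, dst, save_overwrite=True)  # explicit overwrite
    final_text = dst.read_text()

print(blocked, final_text)  # True v2
```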
- olmo_core.io.copy_dir(source, target, save_overwrite=False, num_threads=None, quiet=False)
Copy a directory from source to target.
- Parameters:
- Raises:
FileNotFoundError – If the source dir doesn’t exist.
FileExistsError – If any source files already exist in the target and save_overwrite=False.
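A threaded directory copy with the same error conditions might look like the sketch below. copy_dir_local is hypothetical and local-only. Checking for conflicts up front, before any file is copied, is an assumption of this sketch; the documentation does not say when the library performs that check.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_dir_local(source, target, save_overwrite=False, num_threads=None):
    """Hypothetical local-only sketch of a threaded directory copy."""
    source, target = Path(source), Path(target)
    if not source.is_dir():
        raise FileNotFoundError(f"source dir not found: {source}")
    # Pair every source file with its destination under `target`.
    pairs = [
        (p, target / p.relative_to(source))
        for p in source.rglob("*")
        if p.is_file()
    ]
    if not save_overwrite:
        # Assumption: fail before copying anything if any destination exists.
        for _, dst in pairs:
            if dst.exists():
                raise FileExistsError(f"target file already exists: {dst}")

    def _copy(pair):
        src, dst = pair
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(_copy, pairs))  # list() re-raises any worker exception


with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "src"
    (src_dir / "sub").mkdir(parents=True)
    (src_dir / "a.txt").write_text("a")
    (src_dir / "sub" / "b.txt").write_text("b")
    dst_dir = Path(tmp) / "dst"
    copy_dir_local(src_dir, dst_dir, num_threads=2)
    copied = sorted(
        p.relative_to(dst_dir).as_posix() for p in dst_dir.rglob("*") if p.is_file()
    )
    try:
        copy_dir_local(src_dir, dst_dir)  # same files again, no overwrite
        second_copy_blocked = False
    except FileExistsError:
        second_copy_blocked = True

print(copied, second_copy_blocked)
```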
- olmo_core.io.dir_is_empty(dir)
Check if a local or remote directory is empty. This also returns true if the directory does not exist.
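The "missing counts as empty" rule is easy to overlook, so here is a minimal local-only sketch of it. dir_is_empty_local is a hypothetical helper, not the library's implementation; treating a path that is not a directory as empty is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def dir_is_empty_local(directory) -> bool:
    """Hypothetical local-only sketch: a missing directory counts as empty."""
    directory = Path(directory)
    if not directory.is_dir():
        return True  # does not exist (or is not a directory): treated as empty
    return next(directory.iterdir(), None) is None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = dir_is_empty_local(root / "does-not-exist")
    (root / "empty").mkdir()
    empty = dir_is_empty_local(root / "empty")
    (root / "full").mkdir()
    (root / "full" / "x.txt").write_text("x")
    full = dir_is_empty_local(root / "full")

print(missing, empty, full)  # True True False
```

A caller that needs to distinguish "missing" from "present but empty" has to check for existence separately.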
- olmo_core.io.remove_file(path)
Remove a local or remote file.
- Parameters:
path (Union[Path, PathLike, str]) – The path or URL to the file.
- Raises:
FileNotFoundError – If the file doesn’t exist.
- olmo_core.io.clear_directory(dir, force=False)
Clear out the contents of a local or remote directory.
Warning
This function is potentially very destructive!
By default, for safety, this raises a ValueError if you attempt to clear a remote directory too close to the root of the bucket. Set force=True to override.
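To make the safety guard concrete, the sketch below shows one way such a check could work. Everything here is an assumption: the documentation does not define "too close to the root", so the helper check_safe_to_clear, its min_depth threshold, and the bucket name are all hypothetical.

```python
from urllib.parse import urlparse


def check_safe_to_clear(url: str, force: bool = False, min_depth: int = 2) -> None:
    """Hypothetical guard: refuse to clear prefixes with fewer than min_depth segments."""
    # For "s3://my-bucket/runs/exp1", the path is "/runs/exp1" -> ["runs", "exp1"].
    segments = [part for part in urlparse(url).path.split("/") if part]
    if len(segments) < min_depth and not force:
        raise ValueError(
            f"refusing to clear '{url}': too close to the bucket root "
            f"(pass force=True to override)"
        )


results = {}
for url, force in [
    ("s3://my-bucket/", False),           # bucket root: rejected
    ("s3://my-bucket/runs/exp1", False),  # deep enough: allowed
    ("s3://my-bucket/", True),            # override: allowed
]:
    try:
        check_safe_to_clear(url, force=force)
        results[(url, force)] = "ok"
    except ValueError:
        results[(url, force)] = "rejected"

print(results)
```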
- olmo_core.io.list_directory(dir, recurse=False, include_files=True, include_dirs=True)
List the contents of a local or remote directory. If recurse=False, only the immediate children of the directory are returned, otherwise every sub-folder is recursed into.
- Parameters:
- Return type:
- Returns:
A generator over paths in the directory. If the dir is a URL, the results will be full URLs. If the dir is a local path, the results will be of the form join_path(dir, p).
- Raises:
FileNotFoundError – If the source file doesn’t exist.
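The interaction of recurse, include_files, and include_dirs can be illustrated with a local-only generator. list_directory_local is a hypothetical stand-in, not the library's implementation. Sorting the entries is an addition of this sketch to keep the output stable; the documentation makes no ordering promise.

```python
import tempfile
from pathlib import Path


def list_directory_local(directory, recurse=False, include_files=True, include_dirs=True):
    """Hypothetical local-only sketch of the documented listing behaviour."""
    directory = Path(directory)
    if not directory.is_dir():
        raise FileNotFoundError(f"directory not found: {directory}")
    for entry in sorted(directory.iterdir()):  # sorted only for stable output
        if entry.is_dir():
            if include_dirs:
                yield str(entry)
            if recurse:
                yield from list_directory_local(
                    entry,
                    recurse=True,
                    include_files=include_files,
                    include_dirs=include_dirs,
                )
        elif include_files:
            yield str(entry)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")

    # Immediate children only (the default).
    children = [Path(p).relative_to(root).as_posix() for p in list_directory_local(root)]
    # Every file in the tree, with directories filtered out.
    all_files = [
        Path(p).relative_to(root).as_posix()
        for p in list_directory_local(root, recurse=True, include_dirs=False)
    ]

print(children)   # ['a.txt', 'sub']
print(all_files)  # ['a.txt', 'sub/b.txt']
```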
- olmo_core.io.glob_directory(pattern)
Similar to glob.glob() from the standard library, but works with remote directories as well.
- Return type: Generator[str, None, None]
Warning
Only a subset of glob patterns is supported. Specifically, the * and ** wildcards, which follow the semantics defined here: https://docs.python.org/3/library/pathlib.html#pattern-language.
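Since the wildcard semantics are those of the pathlib pattern language, a short pathlib example on a local tree shows the difference between * and **. This uses only the standard library and says nothing about how the remote case is implemented; results are sorted here because glob order is not guaranteed.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deep").mkdir(parents=True)
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    (root / "sub" / "deep" / "c.txt").write_text("c")

    def rel(paths):
        """Sorted, root-relative POSIX paths for stable comparison."""
        return sorted(p.relative_to(root).as_posix() for p in paths)

    top_level = rel(root.glob("*.txt"))      # * matches within one path segment
    one_down = rel(root.glob("*/*.txt"))     # exactly one directory level deeper
    recursive = rel(root.glob("**/*.txt"))   # ** matches zero or more directories

print(top_level)  # ['a.txt']
print(one_down)   # ['sub/b.txt']
print(recursive)  # ['a.txt', 'sub/b.txt', 'sub/deep/c.txt']
```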
- olmo_core.io.deterministic_glob_directory(pattern)
Like glob_directory() but returns a sorted list for deterministic ordering.
- Return type: List[str]
Warning
Uses caching if the argument is a URL and the filesystem cache is enabled (see olmo_core.fs_cache.maybe_cache()).
- olmo_core.io.init_client(remote_path)
Initialize the right client for the given remote resource. This is helpful to avoid threading issues with boto3.