distributed
APIs for distributed communication, bookkeeping, and checkpointing.
Submodules
distributed.checkpoint
- Features
- Overview
- API Reference
  - save_state_dict()
  - async_save_state_dict()
  - load_state_dict()
  - save_model_and_optim_state()
  - async_save_model_and_optim_state()
  - load_model_and_optim_state()
  - unshard_checkpoint()
  - load_keys()
  - get_checkpoint_metadata()
  - UnshardStrategy
  - UnshardStrategyType
  - prune_state_dict()
  - merge_state_dicts()
distributed.parallel
- build_world_mesh()
- get_world_mesh()
- MeshDimName
- get_dp_model_mesh()
- get_dp_mesh()
- get_tp_mesh()
- get_cp_mesh()
- get_pp_mesh()
- get_pp_stage_mesh()
- get_ep_mesh()
- get_dp_process_group()
- get_device_mesh_info()
- flatten_mesh()
- DataParallelType
- DataParallelConfig
- DPMeshDimName
- TensorParallelConfig
- ExpertParallelConfig
- PipelineParallelConfig
- PipelineScheduleType
- PipelineSplitStyle
- PipelineSchedule
distributed.utils
- init_distributed()
- validate_env_vars()
- get_node_hostname()
- is_distributed()
- barrier()
- get_rank()
- get_global_rank()
- get_local_rank()
- get_fs_local_rank()
- get_world_size()
- get_local_world_size()
- get_num_nodes()
- synchronize_value()
- synchronize_flag()
- all_reduce_value()
- broadcast_object()
- all_gather()
- all_gather_object()
- get_mesh_coordinates()
- backend_supports_cuda()
- backend_supports_cpu()
- do_n_at_a_time()