dctools.data.connection.connection_manager.prefetch_obs_files_to_local

dctools.data.connection.connection_manager.prefetch_obs_files_to_local(remote_paths, cache_dir, fs, ref_alias='', show_progress_bar=True, max_download_workers=None)

Pre-download observation files to local disk before worker dispatch.

Downloads every remote file (a .zarr directory or a single .nc file) into cache_dir, so that Dask workers can open local copies instead of issuing concurrent S3 requests.

A single tqdm progress bar tracks overall download progress.
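The behaviour described above can be approximated with a thread pool and the generic fsspec get(rpath, lpath, recursive=...) call. The following is a minimal sketch under stated assumptions, not the dctools implementation: the name prefetch_sketch, the rule that the local file name is the last component of the remote path, and the skip-if-present check are all hypothetical, and the progress bar is left out to keep the example dependency-free.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional


def prefetch_sketch(
    remote_paths: List[str],
    cache_dir: str,
    fs: Any,
    max_download_workers: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in for illustration; not the dctools code."""
    os.makedirs(cache_dir, exist_ok=True)

    def fetch(remote: str) -> str:
        stripped = remote.rstrip("/")
        # Assumed naming rule: keep only the last path component.
        local = os.path.join(cache_dir, os.path.basename(stripped))
        if not os.path.exists(local):
            # A .zarr store is a directory tree, so copy it recursively;
            # a .nc file is fetched as a single object.
            fs.get(remote, local, recursive=stripped.endswith(".zarr"))
        return local

    # max_workers=None lets ThreadPoolExecutor choose its default pool size.
    with ThreadPoolExecutor(max_workers=max_download_workers) as pool:
        local_paths = list(pool.map(fetch, remote_paths))

    # pool.map preserves input order, so zip pairs each remote with its local path.
    return dict(zip(remote_paths, local_paths))
```

Note that this naming rule would collide if two remote paths shared the same final component; a real implementation may derive local names differently.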

Parameters:
  • remote_paths (List[str]) – List of remote S3 paths (e.g. s3://bucket/file.zarr).

  • cache_dir (str) – Local directory where files will be stored.

  • fs (Any) – An fsspec-compatible filesystem handle (e.g. s3fs.S3FileSystem).

  • ref_alias (str) – Name of the observation dataset (for logging / bar label).

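  • show_progress_bar (bool) – Whether to show the tqdm progress bar. Defaults to True.

  • max_download_workers (int | None) – Maximum number of download workers. Defaults to None.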

Returns:

Dict mapping each remote path to its local path on disk.

Return type:

Dict[str, str]
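As an illustration of how the returned mapping might be consumed before worker dispatch, the sketch below swaps remote paths for their local copies. The helper name to_local_paths and the fallback to the remote path for unmapped entries are assumptions made for this example, not part of dctools.

```python
from typing import Dict, List


def to_local_paths(paths: List[str], local_map: Dict[str, str]) -> List[str]:
    """Replace each remote path with its prefetched local copy, if any."""
    # Fall back to the original remote path when no local copy was recorded.
    return [local_map.get(path, path) for path in paths]
```

Workers given the translated list would then open files from cache_dir, while any path missing from the mapping would still be read remotely.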