dctools.data.datasets.dataloader.preprocess_batch_obs_files
- dctools.data.datasets.dataloader.preprocess_batch_obs_files(local_paths, alias, keep_vars, coordinates, n_points_dim='n_points', output_zarr_dir=None, eval_variables=None, max_workers=None)
Preprocess all unique observation files on the driver into a single zarr.
Eliminates redundant per-worker preprocessing when multiple tasks share the same observation files (typical for SWOT/swath data with a wide time_tolerance). Each unique file is processed exactly once:

open --> swath_to_points --> NaN-mask --> compute.

The resulting zarr contains all valid ocean points, with a time coordinate along n_points_dim, enabling per-worker time filtering.
- Parameters:
local_paths (list of str) – Unique local file paths (post-prefetch).
alias (str) – Dataset alias (e.g. "swot").
keep_vars (list of str or None) – Variables to retain in the output.
coordinates (dict) – Coordinate name mapping; must contain a "time" key.
n_points_dim (str) – Name of the points dimension (default "n_points").
output_zarr_dir (str or None) – Directory for temporary zarr files. Created automatically when None.
eval_variables (list of str or None)
max_workers (int or None)
- Returns:
Absolute path to the shared batch zarr, or None on failure.
- Return type:
str or None
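The deduplicate-then-filter pattern described above can be illustrated with a small, stdlib-only sketch. It does not use dctools; `unique_paths`, `swath_to_points`, `nan_mask`, `preprocess_batch` and `filter_time` are hypothetical helpers, a swath is modelled as a list of rows with one timestamp per row, and an in-memory list of `(time, value)` points stands in for the shared zarr (the "compute" step is implicit because plain lists are already materialized). It shows each unique file being opened exactly once, however many tasks reference it, and each worker then filtering the shared result by its own time window.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for one entry along n_points_dim: (time, value).
Point = Tuple[float, float]
# Hypothetical "open" step: returns (row timestamps, 2-D swath of values).
Loader = Callable[[str], Tuple[Sequence[float], Sequence[Sequence[float]]]]


def unique_paths(tasks: Iterable[Sequence[str]]) -> List[str]:
    """Collect each observation path once, preserving first-seen order."""
    seen: Dict[str, None] = {}
    for paths in tasks:
        for p in paths:
            seen.setdefault(p, None)
    return list(seen)


def swath_to_points(times: Sequence[float], swath: Sequence[Sequence[float]]) -> List[Point]:
    """Flatten a 2-D swath into 1-D points; each cell inherits its row's time."""
    return [(t, v) for t, row in zip(times, swath) for v in row]


def nan_mask(points: List[Point]) -> List[Point]:
    """Drop points whose value is NaN (e.g. invalid or non-ocean pixels)."""
    return [(t, v) for t, v in points if not math.isnan(v)]


def preprocess_batch(tasks: Iterable[Sequence[str]], load: Loader) -> Optional[List[Point]]:
    """Process every unique file exactly once and concatenate the results."""
    batch: List[Point] = []
    try:
        for path in unique_paths(tasks):
            times, swath = load(path)  # "open"
            batch.extend(nan_mask(swath_to_points(times, swath)))
    except (OSError, ValueError):
        return None  # mirrors the documented "None on failure" contract
    return batch


def filter_time(batch: List[Point], start: float, end: float) -> List[Point]:
    """Per-worker step: keep only points inside the worker's time window."""
    return [(t, v) for t, v in batch if start <= t <= end]


if __name__ == "__main__":
    calls: List[str] = []
    fake_files = {
        "a.nc": ([0.0, 1.0], [[1.0, float("nan")], [2.0, 3.0]]),
        "b.nc": ([2.0], [[4.0, float("nan")]]),
    }

    def fake_load(path: str):
        calls.append(path)
        return fake_files[path]

    # Three tasks share two files; each file should be opened only once.
    batch = preprocess_batch([["a.nc", "b.nc"], ["b.nc"], ["a.nc"]], fake_load)
    print(calls)  # ['a.nc', 'b.nc']
    print(batch)  # [(0.0, 1.0), (1.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
    print(filter_time(batch, 1.0, 2.0))  # [(1.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
```

Keeping the time coordinate on every flattened point is what lets a single shared batch serve all workers: the expensive per-file work happens once on the driver, and each worker only applies a cheap time-window filter.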