dctools.data.datasets.dataloader.preprocess_batch_obs_files

dctools.data.datasets.dataloader.preprocess_batch_obs_files(local_paths, alias, keep_vars, coordinates, n_points_dim='n_points', output_zarr_dir=None, eval_variables=None, max_workers=None)

Preprocess all unique observation files on the driver into a single zarr.

Eliminates redundant per-worker preprocessing when multiple tasks share the same observation files (typical for SWOT/swath data with a wide time_tolerance). Each unique file is processed exactly once: open -> swath_to_points -> NaN-mask -> compute.

The resulting zarr contains all valid ocean points with a time coordinate along n_points_dim, enabling per-worker time filtering.
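The process-once pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the dctools implementation: FAKE_FILES, preprocess_one, preprocess_batch and filter_by_time are hypothetical names, the "files" are in-memory lists, and the real function works on observation files and writes a zarr store rather than returning a list.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "files": each is a 2-D swath (lines x pixels) of
# (time, value) samples. NaN marks an invalid point (e.g. land).
FAKE_FILES = {
    "a.nc": [[(0.0, 1.5), (0.0, math.nan)], [(1.0, 2.5), (1.0, 3.5)]],
    "b.nc": [[(5.0, math.nan), (5.0, 4.5)]],
}


def preprocess_one(path):
    """Mimic open -> swath_to_points -> NaN-mask for a single file."""
    swath = FAKE_FILES[path]                                  # "open"
    points = [sample for row in swath for sample in row]      # flatten to points
    return [(t, v) for t, v in points if not math.isnan(v)]   # drop NaN points


def preprocess_batch(paths, max_workers=None):
    """Process each unique path once and concatenate the valid points."""
    # Defensive dedupe that keeps order; the documented function expects
    # paths that are already unique.
    unique_paths = list(dict.fromkeys(paths))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file = list(pool.map(preprocess_one, unique_paths))
    return [point for points in per_file for point in points]


def filter_by_time(points, t_center, tolerance):
    """Per-worker step: keep points whose time lies within the tolerance."""
    return [(t, v) for t, v in points if abs(t - t_center) <= tolerance]


# The driver builds the shared batch once, even though "a.nc" is listed twice.
batch = preprocess_batch(["a.nc", "b.nc", "a.nc"], max_workers=2)

# Each worker then selects only the points relevant to its own task.
task_points = filter_by_time(batch, t_center=0.5, tolerance=1.0)
```

Because every point carries its own time value, a worker needs only the shared batch and its time window; it never has to reopen the source files.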

Parameters:
  • local_paths (list of str) – Unique local file paths (post-prefetch).

  • alias (str) – Dataset alias (e.g. "swot").

  • keep_vars (list of str or None) – Variables to retain in the output.

  • coordinates (dict) – Coordinate name mapping; must contain a "time" key.

  • n_points_dim (str) – Name of the points dimension (default "n_points").

  • output_zarr_dir (str or None) – Directory for temp zarr files. When None, a directory is created automatically.

  • eval_variables (List[str] | None)

  • max_workers (int | None)

Returns:

Absolute path to the shared batch zarr, or None on failure.

Return type:

str or None
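Since the return value may be None, callers should check it before use. A minimal sketch of one way to do so follows. The fallback to per-worker preprocessing is an assumption, not documented behaviour, and resolve_obs_source, fake_ok and fake_fail are hypothetical names; the two fake functions stand in for preprocess_batch_obs_files so that the example runs without dctools installed.

```python
def resolve_obs_source(preprocess, local_paths, **kwargs):
    """Return ("shared_zarr", path) on success, else ("per_worker", paths)."""
    zarr_path = preprocess(local_paths, **kwargs)
    if zarr_path is None:
        # Assumed fallback: each worker preprocesses its own files.
        return ("per_worker", list(local_paths))
    return ("shared_zarr", zarr_path)


# Hypothetical stand-ins for preprocess_batch_obs_files.
def fake_ok(local_paths, **kwargs):
    return "/tmp/batch_swot.zarr"


def fake_fail(local_paths, **kwargs):
    return None


ok = resolve_obs_source(fake_ok, ["a.nc"], alias="swot")
failed = resolve_obs_source(fake_fail, ["a.nc"], alias="swot")
```

In real use, the first argument would be the imported preprocess_batch_obs_files, called with the parameters listed above.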