dctools.data.connection.connection_manager.ArgoManager
- class dctools.data.connection.connection_manager.ArgoManager(connect_config, depth_values=None, argo_index=None, call_list_files=False)
Connection manager for ARGO data that uses ArgoInterface for scalable indexing.
- Parameters:
connect_config (BaseConnectionConfig | Namespace)
depth_values (List[float] | None)
argo_index (Any | None)
call_list_files (bool | None)
- __init__(connect_config, depth_values=None, argo_index=None, call_list_files=False)
Initialize ArgoManager with ArgoInterface.
- Parameters:
connect_config (BaseConnectionConfig | Namespace) – Configuration for ARGO connection
depth_values (List[float] | None) – Depth levels for interpolation
argo_index (Any | None) – Deprecated compatibility argument (unused)
call_list_files (bool | None) – Whether to call list_files during initialization
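The depth_values levels serve as interpolation targets. The sketch below shows one way to interpolate a single profile onto such levels, using linear interpolation with np.interp. It rests on some assumptions. interpolate_profile is a hypothetical helper and is not part of dctools. The profile's vertical coordinate is assumed to use the same units as depth_values. Levels outside the sampled range come back as NaN instead of being extrapolated.

```python
import numpy as np

def interpolate_profile(pres, values, depth_values):
    """Hypothetical helper: linearly interpolate one profile onto target depth levels."""
    pres = np.asarray(pres, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(depth_values, dtype=float)
    # Drop missing samples; np.interp cannot handle NaN in its x/y inputs.
    ok = np.isfinite(pres) & np.isfinite(values)
    pres, values = pres[ok], values[ok]
    if pres.size == 0:
        return np.full(target.shape, np.nan)
    # np.interp requires increasing x, so sort by the vertical coordinate.
    order = np.argsort(pres)
    # Outside the sampled range, return NaN instead of extrapolating.
    return np.interp(target, pres[order], values[order], left=np.nan, right=np.nan)
```

For example, interpolate_profile([0, 10, 20], [20.0, 18.0, 14.0], [5.0, 15.0, 30.0]) returns [19.0, 16.0, nan].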
Methods
__init__(connect_config[, depth_values, ...]) – Initialize ArgoManager with ArgoInterface.
adjust_full_day(date_start, date_end) – Adjust date_end to cover a full day if both dates are the same and fall at midnight.
download_file(remote_path, local_path) – Download a file from the remote source to the local path.
estimate_resolution(ds, coord_system) – Estimate the resolution of a dataset from its coordinates.
extract_global_metadata() – Extract global metadata (common to all files) from a single file.
extract_metadata(path) – Extract metadata combining global and file-specific information.
extract_metadata_worker(path, ...[, argo_index]) – Extract metadata combining global and file-specific information.
get_argo_index() – Return the ARGO index object for compatibility with legacy evaluator code.
get_config_clean_copy() – Return a clean copy of the configuration.
get_global_metadata() – Get global metadata for the ARGO dataset.
list_files() – List available monthly index keys.
list_files_with_metadata() – List available time windows from the master index.
open(path, *args, **kwargs) – Open an ARGO time window.
open_local(local_path) – Open a file locally if it exists.
open_remote(path[, mode]) – Open a file remotely if the source supports it.
prefetch_batch_shared_zarr(time_bounds_list, ...) – Pre-download ALL ARGO profiles for a batch into one shared Zarr.
Pre-download ARGO profiles for a batch into one-or-more shared Zarr stores.
set_global_metadata(global_metadata) – Set the global metadata for the connection manager.
supports(path) – Check whether a path is supported by the ARGO manager.
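Below is a minimal sketch of the behavior summarized for adjust_full_day. It is a hypothetical re-implementation. It assumes that "a full day" means moving date_end to the last nanosecond of the same calendar day. The real method may use a different end bound, such as the next midnight.

```python
import pandas as pd

def adjust_full_day(date_start, date_end):
    """Hypothetical sketch: widen a zero-length midnight window to one full day."""
    start, end = pd.Timestamp(date_start), pd.Timestamp(date_end)
    # Only act when both bounds are the same instant and that instant is midnight.
    if start == end and start == start.normalize():
        # Assumed convention: inclusive end at the last nanosecond of the day.
        end = start + pd.Timedelta(days=1) - pd.Timedelta(nanoseconds=1)
    return start, end
```

All other inputs are returned unchanged, including windows that already span several days and identical bounds at a time other than midnight.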
- get_argo_index()
Return ARGO index object for compatibility with legacy evaluator code.
- get_global_metadata()
Get global metadata for ARGO dataset.
Harmonized with the generic connection manager path:
- build a CoordinateSystem from a real ARGO sample when possible;
- detect semantic variable mappings (variables_dict);
- expose the inverse mapping (variables_rename_dict).
Falls back to robust defaults when no sample can be opened.
- Returns:
Global metadata for ARGO.
- Return type:
Dict[str, Any]
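Here is an illustration of how variables_dict and its inverse variables_rename_dict could relate. It also shows the fallback used when no sample can be opened. It is a sketch under assumptions. variables_dict is assumed to map a semantic (standard) name to the dataset variable name. The default table and build_variable_mappings are hypothetical. TEMP, PSAL and PRES are standard ARGO variable names, but their pairing with the semantic names here is only illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical defaults (standard name -> ARGO variable); not the library's actual table.
DEFAULT_VARIABLES: Dict[str, str] = {
    "temperature": "TEMP",
    "salinity": "PSAL",
    "pressure": "PRES",
}

def build_variable_mappings(sample_vars: Optional[List[str]]) -> Dict[str, Dict[str, str]]:
    """Sketch of a variables_dict / variables_rename_dict pair with a fallback."""
    if sample_vars:
        # A sample was opened: keep only the mappings whose variable actually exists.
        variables_dict = {std: name for std, name in DEFAULT_VARIABLES.items() if name in sample_vars}
    else:
        # No sample could be opened: fall back to the defaults unchanged.
        variables_dict = dict(DEFAULT_VARIABLES)
    # Inverse mapping: dataset variable name -> standard name.
    variables_rename_dict = {name: std for std, name in variables_dict.items()}
    return {"variables_dict": variables_dict, "variables_rename_dict": variables_rename_dict}
```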
- list_files()
List available monthly index keys.
- Returns:
List of month keys (YYYY-MM) from the master index, or an empty list if the index is not loaded.
- Return type:
List[str]
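To show the YYYY-MM key format, the sketch below lists every month key that overlaps a date range. It uses pandas periods. month_keys is a hypothetical helper and does not read the real master index.

```python
from typing import List

import pandas as pd

def month_keys(start: str, end: str) -> List[str]:
    """Hypothetical helper: every YYYY-MM key whose month overlaps [start, end]."""
    # period_range with freq="M" yields one Period per calendar month.
    months = pd.period_range(pd.Timestamp(start), pd.Timestamp(end), freq="M")
    return [p.strftime("%Y-%m") for p in months]
```

For example, month_keys("2023-11-15", "2024-02-01") gives ["2023-11", "2023-12", "2024-01", "2024-02"].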
- list_files_with_metadata()
List available time windows from the master index.
Uses build_multi_year_monthly() to build the index if necessary.
- Returns:
List of catalog entries with metadata
- Return type:
List[CatalogEntry]
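As a rough picture of turning monthly index keys into time-window catalog entries, here is a sketch. It uses a stand-in dataclass. The CatalogEntry below is hypothetical: the real class and its field names are not documented here. Each window is assumed to run from the first to the last instant of its calendar month.

```python
from dataclasses import dataclass
from typing import List

import pandas as pd

@dataclass
class CatalogEntry:
    """Hypothetical stand-in for the real CatalogEntry; field names are assumptions."""
    path: str
    date_start: pd.Timestamp
    date_end: pd.Timestamp

def entries_from_month_keys(keys: List[str]) -> List[CatalogEntry]:
    entries = []
    for key in keys:
        period = pd.Period(key, freq="M")
        # start_time / end_time are the first and last instants of the month.
        entries.append(CatalogEntry(key, period.start_time, period.end_time))
    return entries
```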
- open(path, *args, **kwargs)
Open an ARGO time window.
Uses open_time_window() from ArgoInterface.
- Parameters:
path (str) – Month key (e.g., “2024_01”) or a (start, end) tuple
*args (Any) – Ignored extra positional arguments.
**kwargs (Any) – Ignored extra keyword arguments.
- Returns:
ARGO dataset interpolated over depth levels
- Return type:
xr.Dataset
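The path argument takes either a month key or a (start, end) tuple. Here is a minimal sketch of resolving either form to time bounds. resolve_time_window is a hypothetical helper. It accepts both "2024_01" and "2024-01" spellings as an assumption, since this page shows both separators.

```python
from typing import Tuple, Union

import pandas as pd

PathLike = Union[str, Tuple[object, object]]

def resolve_time_window(path: PathLike) -> Tuple[pd.Timestamp, pd.Timestamp]:
    """Hypothetical helper: turn a month key or (start, end) tuple into time bounds."""
    if isinstance(path, tuple):
        start, end = path
        return pd.Timestamp(start), pd.Timestamp(end)
    # Assumption: accept both "2024_01" and "2024-01" month-key separators.
    period = pd.Period(path.replace("_", "-"), freq="M")
    return period.start_time, period.end_time
```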
- prefetch_batch_shared_zarr(time_bounds_list, ...)
Pre-download ALL ARGO profiles for a batch into one shared Zarr.
Instead of fetching one Zarr per time window (the old approach), this method:
1. Merges all per-entry time windows into one global bounding interval, so profiles that belong to multiple overlapping windows are downloaded exactly once.
2. Downloads every profile in that interval through a single requests.Session (HTTP connection pooling -> one TCP+TLS handshake per GDAC mirror).
3. Writes a single time-sorted Zarr that every worker opens. Each worker filters by its own time_bounds via np.searchsorted, reading only contiguous chunks instead of scanning the whole store.
Typical savings vs. the per-window approach for a 10-entry batch with time_tolerance=12 h:
- Downloads: 10 × ~1 day -> 1 × ~11 days (massive overlap removed)
- Zarr writes: 10 -> 1
- Disk space: 10 small stores -> 1 compact store
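Step 1 above merges every window into one global bounding interval. Here is a minimal sketch of that merge. global_window is a hypothetical name, and the sketch only computes the interval; it performs no download.

```python
from typing import List, Tuple

import pandas as pd

Window = Tuple[pd.Timestamp, pd.Timestamp]

def global_window(time_bounds_list: List[Window]) -> Window:
    """Hypothetical helper: smallest interval covering every (start, end) pair."""
    if not time_bounds_list:
        raise ValueError("time_bounds_list is empty")
    # Duplicates and overlaps need no special handling for a min/max bound.
    t0 = min(start for start, _ in time_bounds_list)
    t1 = max(end for _, end in time_bounds_list)
    return t0, t1

# Usage: a duplicate and an overlapping window collapse into one interval.
bounds = [
    (pd.Timestamp("2024-01-01 12:00"), pd.Timestamp("2024-01-02 12:00")),
    (pd.Timestamp("2024-01-01 12:00"), pd.Timestamp("2024-01-02 12:00")),  # duplicate entry
    (pd.Timestamp("2024-01-02 00:00"), pd.Timestamp("2024-01-03 00:00")),  # overlaps the first
]
t0, t1 = global_window(bounds)  # one request for [t0, t1] instead of three
```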
- Parameters:
time_bounds_list (list of (start, end) pd.Timestamp tuples) – One per batch entry. May contain duplicates.
cache_dir (Path) – Directory for the shared Zarr store; created if necessary. Stores persist across batches (the same global window gives a cache hit).
- Returns:
Absolute path to the shared, time-sorted Zarr, or None on failure (empty data, download error, …).
- Return type:
str or None
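On the worker side, each worker filters the shared, time-sorted store by its own time_bounds with np.searchsorted. The sketch below shows that lookup on a plain datetime64 array. It assumes the profile times are already loaded and sorted in ascending order. slice_sorted_times is a hypothetical name. The returned [i, j) range could then select the matching profiles, for example with xarray's isel along the profile dimension (that dimension's name is not given on this page).

```python
from typing import Tuple

import numpy as np
import pandas as pd

def slice_sorted_times(times, t0, t1) -> Tuple[int, int]:
    """Return the [i, j) index range of a sorted datetime64 array inside [t0, t1]."""
    times = np.asarray(times, dtype="datetime64[ns]")
    # Binary search on the sorted time axis: O(log n), no full scan.
    i = np.searchsorted(times, pd.Timestamp(t0).to_datetime64(), side="left")
    j = np.searchsorted(times, pd.Timestamp(t1).to_datetime64(), side="right")
    return int(i), int(j)
```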
Pre-download ARGO profiles for a batch into one-or-more shared Zarr stores.
This is a safer variant of prefetch_batch_shared_zarr() for workloads where a batch may contain time windows far apart in time. Instead of merging the entire batch into one giant global window, it partitions requests by calendar month and merges only overlapping windows within each month.
The returned partitions are designed to be consumed by the evaluator fast path: each entry receives either a single Zarr path (same-month window) or a list of two paths (rare month-boundary window), and the worker then filters by its exact time_bounds using np.searchsorted.
- Returns:
Each element is {"t0": Timestamp, "t1": Timestamp, "zarr_path": str}. The time interval is the one used to build the Zarr store.
- Return type:
list[dict]
- Parameters:
time_bounds_list (List[Tuple[pandas.Timestamp, pandas.Timestamp]])
cache_dir (Path)
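The month-partitioning idea can be sketched under two assumptions. First, a window that crosses a month boundary is clipped at that boundary, which is how one entry could end up needing two stores. Second, overlapping or touching windows within the same month are merged. partition_by_month is a hypothetical name. It returns only the t0/t1 part of each partition; building the Zarr store and the zarr_path field are out of scope.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import pandas as pd

Window = Tuple[pd.Timestamp, pd.Timestamp]

def partition_by_month(time_bounds_list: List[Window]) -> List[dict]:
    """Sketch: clip windows at month boundaries, then merge overlaps within each month."""
    per_month: Dict[pd.Period, List[Window]] = defaultdict(list)
    for start, end in time_bounds_list:
        for period in pd.period_range(start, end, freq="M"):
            # Clip the window to this calendar month.
            lo = max(start, period.start_time)
            hi = min(end, period.end_time)
            per_month[period].append((lo, hi))
    partitions = []
    for period in sorted(per_month):
        windows = sorted(per_month[period])
        cur_lo, cur_hi = windows[0]
        for lo, hi in windows[1:]:
            if lo <= cur_hi:
                # Overlapping or touching: extend the current merged interval.
                cur_hi = max(cur_hi, hi)
            else:
                partitions.append({"t0": cur_lo, "t1": cur_hi})
                cur_lo, cur_hi = lo, hi
        partitions.append({"t0": cur_lo, "t1": cur_hi})
    return partitions
```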
- classmethod supports(path)
Check if path is supported by ARGO manager.
Since ARGO uses monthly indexing rather than traditional file paths, this accepts any path when explicitly specified.
- Returns:
Always True (manager selected via configuration).
- Return type:
bool
- Parameters:
path (str)