dctools.data.connection.connection_manager.BaseConnectionManager

class dctools.data.connection.connection_manager.BaseConnectionManager(connect_config, call_list_files=True, batch_size=64)

Abstract base connection manager.

Manages opening, closing and listing files for various protocols. Subclasses must implement the abstract methods list_files() and supports().

Parameters:
  • connect_config

  • call_list_files (default True)

  • batch_size (default 64)

Methods

__init__(connect_config[, call_list_files, ...])

adjust_full_day(date_start, date_end)

Adjust date_end to cover a full day if dates are the same at midnight.

download_file(remote_path, local_path)

Download a file from the remote source to the local path.

estimate_resolution(ds, coord_system)

Estimate resolution from dataset based on coordinates.

extract_global_metadata()

Extract global metadata (common to all files) from a single file.

extract_metadata(path)

Extract metadata combining global and file-specific information.

extract_metadata_worker(path, ...[, argo_index])

Extract metadata combining global and file-specific information (thread-safe static worker).

get_config_clean_copy()

Return a clean copy of the configuration.

get_global_metadata()

Get global metadata for all files in the connection manager.

list_files()

List files matching the configuration.

list_files_with_metadata()

List files together with their metadata, using an integrated Dask client and an optimized configuration.

open(path[, mode])

Open a file, prioritizing local then remote access.

open_local(local_path)

Open a file locally if it exists.

open_remote(path[, mode])

Open a file remotely if the source supports it.

set_global_metadata(global_metadata)

Set the global metadata for the connection manager.

supports(path)

Check if path is supported by this manager.

adjust_full_day(date_start, date_end)

Adjust date_end to cover a full day if dates are the same at midnight.

Parameters:
  • date_start (pandas.Timestamp)

  • date_end (pandas.Timestamp)

Return type:

tuple[pandas.Timestamp, pandas.Timestamp]
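The rule can be sketched as follows. The sketch assumes that "cover a full day" means moving date_end to the last microsecond of the same day. The exact end bound used by dctools is not documented here. The sketch uses only the standard library's datetime. Because pandas.Timestamp subclasses datetime.datetime, the same logic also accepts Timestamps. The function name adjust_full_day_sketch is hypothetical.

```python
from datetime import datetime, timedelta


def _is_midnight(ts: datetime) -> bool:
    # True when the time-of-day component is exactly 00:00:00.000000
    return ts.hour == 0 and ts.minute == 0 and ts.second == 0 and ts.microsecond == 0


def adjust_full_day_sketch(date_start: datetime, date_end: datetime) -> tuple[datetime, datetime]:
    """Hypothetical re-implementation of the documented rule.

    Assumption: "cover a full day" means extending date_end to the last
    microsecond of that day; the real method may use a different bound.
    """
    if date_start == date_end and _is_midnight(date_start):
        date_end = date_end + timedelta(days=1) - timedelta(microseconds=1)
    return date_start, date_end


start, end = adjust_full_day_sketch(datetime(2024, 1, 1), datetime(2024, 1, 1))
print(end)  # 2024-01-01 23:59:59.999999
```

Ranges that already span different instants, or that do not start at midnight, are returned unchanged.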

download_file(remote_path, local_path)

Download a file from the remote source to the local path.

Parameters:
  • remote_path (str) – Remote path of the file.

  • local_path (str) – Local path to save the file.

estimate_resolution(ds, coord_system)

Estimate resolution from dataset based on coordinates.

Only coordinate values (small arrays) are inspected. Both in-memory and Dask-backed datasets are handled safely: np.asarray() materialises only the coordinate arrays, which are typically tiny.

Parameters:
  • ds (xarray.Dataset) – Dataset whose coordinates are inspected.

  • coord_system (CoordinateSystem) – CoordinateSystem object.

Returns:

Dictionary of estimated resolutions.

Return type:

Dict[str, float | str]
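The sketch below gives a rough illustration of the coordinates-only approach. It estimates each coordinate's resolution as the median spacing between its sorted unique values. It takes a plain mapping from coordinate name to values instead of an xarray.Dataset and a CoordinateSystem, and it returns floats only. The median-spacing rule and the name estimate_resolution_sketch are assumptions, not the dctools implementation.

```python
from statistics import median
from typing import Dict, Iterable


def estimate_resolution_sketch(coords: Dict[str, Iterable[float]]) -> Dict[str, float]:
    """Hypothetical stand-in: median spacing of each 1-D coordinate.

    In dctools the coordinate arrays come from an xarray.Dataset (materialised
    with np.asarray); here they are plain iterables of numbers.
    """
    resolutions: Dict[str, float] = {}
    for name, values in coords.items():
        ordered = sorted(set(values))
        if len(ordered) < 2:
            # A single value carries no spacing information; skip it.
            continue
        steps = [b - a for a, b in zip(ordered, ordered[1:])]
        # The median tolerates a few irregular gaps in an otherwise regular grid.
        resolutions[name] = median(steps)
    return resolutions


print(estimate_resolution_sketch({"lat": [0.0, 0.25, 0.5, 0.75], "lon": [10.0, 10.5, 11.0]}))
# {'lat': 0.25, 'lon': 0.5}
```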

extract_global_metadata()

Extract global metadata (common to all files) from a single file.

Returns:

Global metadata including spatial bounds and variable names.

Return type:

Dict[str, Any]

extract_metadata(path)

Extract metadata combining global and file-specific information.

Parameters:

path (str) – Path to the file.

Returns:

Metadata for the specific file as a CatalogEntry.

Return type:

CatalogEntry
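One way to picture the "global plus file-specific" combination is a dictionary merge in which file-level values override shared ones. The CatalogEntrySketch dataclass below is a hypothetical simplification, because the real CatalogEntry fields are not documented here. The function combine_metadata and the metadata keys used in the example are also invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CatalogEntrySketch:
    # Hypothetical stand-in for dctools' CatalogEntry; real fields may differ.
    path: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def combine_metadata(path: str, global_metadata: Dict[str, Any],
                     file_metadata: Dict[str, Any]) -> CatalogEntrySketch:
    # Later keys win, so file-specific values override global defaults.
    merged = {**global_metadata, **file_metadata}
    return CatalogEntrySketch(path=path, metadata=merged)


entry = combine_metadata(
    "data/file_2024-01-01.nc",
    {"variables": ["sst"], "lat_min": -90.0},  # shared by every file
    {"date_start": "2024-01-01"},              # specific to this file
)
print(entry.metadata)
```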

static extract_metadata_worker(path, global_metadata, connection_params, class_name, argo_index=None)

Extract metadata combining global and file-specific information.

Thread-safe static variant of extract_metadata(), intended for use by parallel workers.

Parameters:
  • path (str) – Path to the file.

  • global_metadata (Dict[str, Any]) – Global metadata.

  • connection_params (dict)

  • class_name (Any)

  • argo_index (Any | None)

Returns:

Metadata for the specific file as a CatalogEntry.

Return type:

CatalogEntry

get_config_clean_copy()

Return a clean copy of the configuration.

get_global_metadata()

Get global metadata for all files in the connection manager.

Returns:

Global metadata including spatial bounds and variable names.

Return type:

Dict[str, Any]

abstractmethod list_files()

List files matching the configuration.

Return type:

List[str]

list_files_with_metadata()

List files together with their metadata as CatalogEntry objects, using an integrated Dask client and an optimized configuration.

Return type:

List[CatalogEntry]
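list_files_with_metadata() distributes per-file work through a Dask client. extract_metadata_worker() is the static, thread-safe entry point for that work. The sketch below shows the same fan-out pattern, but swaps in the standard library's concurrent.futures.ThreadPoolExecutor in place of Dask. It returns plain dictionaries rather than CatalogEntry objects. The worker body and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def metadata_worker(path: str, global_metadata: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical static worker: it mutates no shared state, so it is safe to
    # run concurrently. A real worker would open the file and read its metadata.
    return {**global_metadata, "path": path}


def list_files_with_metadata_sketch(paths: List[str], global_metadata: Dict[str, Any],
                                    max_workers: int = 4) -> List[Dict[str, Any]]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order regardless of completion order.
        return list(pool.map(metadata_worker, paths, [global_metadata] * len(paths)))


entries = list_files_with_metadata_sketch(["a.nc", "b.nc", "c.nc"], {"variables": ["sst"]})
print([e["path"] for e in entries])  # ['a.nc', 'b.nc', 'c.nc']
```

Keeping the worker static, with every input passed explicitly, is what makes it safe to ship to another thread or process.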

open(path, mode='rb')

Open a file, prioritizing local then remote access.

If the file can be opened neither locally nor remotely, attempt to download it to a local path and open it.

Parameters:
  • path (str) – Remote path of the file.

  • mode (str) – Mode to open the file (default is "rb").

Returns:

Opened dataset.

Return type:

xr.Dataset

open_local(local_path)

Open a file locally if it exists.

Parameters:

local_path (str) – Path to the local file.

Returns:

Opened dataset, or None if the file does not exist.

Return type:

Optional[xr.Dataset]

open_remote(path, mode='rb')

Open a file remotely if the source supports it.

Parameters:
  • path (str) – Remote path of the file.

  • mode (str) – Mode to open the file (default is "rb").

Returns:

Opened dataset, or None if remote opening is not supported.

Return type:

Optional[xr.Dataset]
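The lookup order documented for open() is: local copy first, then remote access, then download and reopen locally. The sketch below models that order with a toy manager that treats a second directory as the "remote" side. Everything in it is hypothetical:

- It reads raw bytes instead of returning an xr.Dataset.
- open_remote() always reports "unsupported" by returning None.
- download_file() is a plain file copy.

```python
import os
import shutil
import tempfile
from typing import Optional


class FallbackOpenerSketch:
    """Hypothetical illustration of the local, then remote, then download order."""

    def __init__(self, remote_root: str, local_root: str) -> None:
        self.remote_root = remote_root
        self.local_root = local_root

    def _local_path(self, path: str) -> str:
        return os.path.join(self.local_root, os.path.basename(path))

    def open_local(self, local_path: str) -> Optional[bytes]:
        # Return None when there is no local copy, as documented for open_local().
        if not os.path.exists(local_path):
            return None
        with open(local_path, "rb") as fh:
            return fh.read()

    def open_remote(self, path: str, mode: str = "rb") -> Optional[bytes]:
        # This toy source does not support direct remote opening.
        return None

    def download_file(self, remote_path: str, local_path: str) -> None:
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        shutil.copyfile(os.path.join(self.remote_root, remote_path), local_path)

    def open(self, path: str, mode: str = "rb") -> Optional[bytes]:
        local_path = self._local_path(path)
        data = self.open_local(local_path)
        if data is None:
            data = self.open_remote(path, mode)
        if data is None:
            self.download_file(path, local_path)
            data = self.open_local(local_path)
        return data


with tempfile.TemporaryDirectory() as remote, tempfile.TemporaryDirectory() as local:
    remote_file = os.path.join(remote, "obs.nc")
    with open(remote_file, "wb") as fh:
        fh.write(b"payload")
    opener = FallbackOpenerSketch(remote_root=remote, local_root=local)
    first = opener.open("obs.nc")  # no local copy yet: downloads, then opens locally
    downloaded = os.path.exists(os.path.join(local, "obs.nc"))
    os.remove(remote_file)  # the remote copy is gone...
    second = opener.open("obs.nc")  # ...but the local copy is found first

print(first, downloaded, second)  # b'payload' True b'payload'
```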

set_global_metadata(global_metadata)

Set the global metadata for the connection manager.

Keeps only the keys listed in the global_metadata class variable.

Parameters:

global_metadata (Dict[str, Any]) – Global metadata dictionary.

Return type:

None
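The filtering rule can be sketched as a whitelist applied to the incoming dictionary. As the description above states, the class variable global_metadata holds the allowed key names. The specific keys listed below are hypothetical, and so is the _global_metadata attribute used for storage.

```python
from typing import Any, ClassVar, Dict, List


class MetadataHolderSketch:
    # Hypothetical whitelist: the real key list in dctools is not documented here.
    global_metadata: ClassVar[List[str]] = ["variables", "lat_min", "lat_max"]

    def __init__(self) -> None:
        self._global_metadata: Dict[str, Any] = {}

    def set_global_metadata(self, global_metadata: Dict[str, Any]) -> None:
        # Keep only whitelisted keys; anything else is dropped silently.
        self._global_metadata = {
            k: v for k, v in global_metadata.items() if k in self.global_metadata
        }

    def get_global_metadata(self) -> Dict[str, Any]:
        return self._global_metadata


holder = MetadataHolderSketch()
holder.set_global_metadata({"variables": ["sst"], "lat_min": -90.0, "history": "created by tool"})
print(holder.get_global_metadata())  # {'variables': ['sst'], 'lat_min': -90.0}
```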

abstractmethod classmethod supports(path)

Check if path is supported by this manager.

Parameters:

path (str)

Return type:

bool
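Because list_files() and supports() are abstract, a concrete manager must provide both. The sketch below uses a local stand-in for the base class instead of importing dctools. The stand-in has the same two abstract methods but omits call_list_files and batch_size. LocalGlobManager is hypothetical, and so are its connect_config keys, root and pattern.

```python
import glob
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseConnectionManagerStandIn(ABC):
    """Stand-in mirroring only the abstract interface documented above."""

    def __init__(self, connect_config: Dict[str, Any]) -> None:
        self.connect_config = connect_config

    @abstractmethod
    def list_files(self) -> List[str]:
        ...

    @classmethod
    @abstractmethod
    def supports(cls, path: str) -> bool:
        ...


class LocalGlobManager(BaseConnectionManagerStandIn):
    # Hypothetical config keys: "root" (directory) and "pattern" (glob pattern).
    def list_files(self) -> List[str]:
        root = self.connect_config["root"]
        pattern = self.connect_config.get("pattern", "*")
        return sorted(glob.glob(os.path.join(root, pattern)))

    @classmethod
    def supports(cls, path: str) -> bool:
        # Treat anything without a URL scheme as a local path.
        return "://" not in path


# The abstract stand-in itself cannot be instantiated.
try:
    BaseConnectionManagerStandIn({})
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

with tempfile.TemporaryDirectory() as root:
    for name in ("a.nc", "b.nc", "notes.txt"):
        open(os.path.join(root, name), "w").close()
    manager = LocalGlobManager({"root": root, "pattern": "*.nc"})
    found = [os.path.basename(p) for p in manager.list_files()]

print(found)  # ['a.nc', 'b.nc']
print(LocalGlobManager.supports("s3://bucket/key.nc"))  # False
```

Placing @classmethod above @abstractmethod is the ordering the abc module requires for abstract class methods.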