dctools.utilities.machine_profile

Runtime auto-tuning of Dask parallelism parameters based on the host machine's hardware.

Called once at config-load time by dctools.utilities.args_config.load_args_and_config().

A parameter is filled in only when both of the following hold:

  • Auto-tuning applies to it: either auto_tune is true (or the key is absent) in the root config, or the parameter's YAML value is the literal string "auto"

  • The value is not already an explicit number (integers and floats are always kept)
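
The rule above can be sketched as a small predicate. This is an illustration only, not the module's actual code: the name should_fill and its signature are hypothetical, and the sketch follows the literal reading of the two conditions (how a non-numeric string other than "auto" is treated is not stated here, so that case is not exercised).

```python
from numbers import Real


def should_fill(value, auto_tune=True):
    """Hypothetical helper: True if a parallelism parameter would be auto-filled."""
    # Explicit numbers are always kept. bool is excluded on purpose because
    # Python treats True/False as integers, which is an assumption of this sketch.
    if isinstance(value, Real) and not isinstance(value, bool):
        return False
    # Otherwise fill when auto-tuning is on globally, or when the value
    # itself opts in with the literal string "auto".
    return bool(auto_tune) or value == "auto"


print(should_fill(None, auto_tune=True))     # omitted / null key, auto_tune on
print(should_fill(3, auto_tune=True))        # pinned number
print(should_fill("auto", auto_tune=False))  # per-parameter opt-in
```

Under these assumptions the three calls print True, False, True, which corresponds to an omitted key with auto_tune on, a pinned number, and a per-parameter "auto" with auto_tune off.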

The YAML can therefore be used in three modes:

  1. Fully automatic (recommended) — set auto_tune: true at the root and omit per-source parallelism keys (or set them to null / "auto"):

    auto_tune: true
    sources:
      - dataset: swot
        observation_dataset: true
        # n_parallel_workers, nthreads_per_worker, memory_limit_per_worker
        # are all filled automatically
    
  2. Selective override — keep auto_tune: true but pin specific params:

    sources:
      - dataset: swot
        n_parallel_workers: 3   # fixed; everything else still auto-tuned
    
  3. Fully manual — set auto_tune: false; only params set to the string "auto" are filled, everything else is kept as-is:

    auto_tune: false
    sources:
      - dataset: swot
        n_parallel_workers: 5          # kept as-is
        memory_limit_per_worker: "auto"  # ← filled from hardware
    

Functions

auto_tune_config(config[, data_directory])

Fill auto-tuned parallelism parameters into config in-place.
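
To illustrate what filling "in place" means for a caller, here is a minimal, self-contained sketch. It does not call dctools: fill_sources_in_place is a hypothetical stand-in, and the values in defaults are made up, whereas the real function derives them from the hardware. Only the three parameter names and the auto_tune semantics are taken from the description above.

```python
PARALLELISM_KEYS = (
    "n_parallel_workers",
    "nthreads_per_worker",
    "memory_limit_per_worker",
)


def fill_sources_in_place(config, defaults):
    """Hypothetical stand-in: mutates config["sources"] and returns None."""
    auto_tune = config.get("auto_tune", True)  # an absent key counts as true
    for source in config.get("sources", []):
        for key in PARALLELISM_KEYS:
            value = source.get(key)  # None when the key is omitted or null
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number and (auto_tune or value == "auto"):
                source[key] = defaults[key]


# Made-up values for illustration; they are not what the module computes.
defaults = {
    "n_parallel_workers": 4,
    "nthreads_per_worker": 2,
    "memory_limit_per_worker": "4GB",
}

# Mode 3 from the examples above: auto_tune off, one parameter set to "auto".
config = {
    "auto_tune": False,
    "sources": [
        {
            "dataset": "swot",
            "n_parallel_workers": 5,
            "memory_limit_per_worker": "auto",
        }
    ],
}

fill_sources_in_place(config, defaults)
print(config["sources"][0])
```

Because the dictionary is mutated rather than copied, the caller reads the tuned values from the same config object afterwards. In this sketch n_parallel_workers stays at 5, memory_limit_per_worker is replaced by the placeholder default, and nthreads_per_worker stays absent because auto_tune is false.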