Performance, cluster environments and reproducibility
If you plan to run OGGM on more than a handful of glaciers, you might be interested in using all processors available to you, whether you are working on your laptop or on a cluster: see Parallel computations for how to do this.
For regional or global computations you will need to run OGGM in Cluster environments. Here we provide a couple of guidelines based on our own experience with operational runs.
In Reproducibility with OGGM, we discuss certain aspects of scientific reproducibility with OGGM, and how we try to ensure that our results are reproducible (it’s not easy).
Parallel computations
OGGM is designed to use the available resources as well as possible. For single-node machines with more than one processor (e.g. personal computers), OGGM ships with a multiprocessing approach that is fairly simple to use. For cluster environments with more than one machine, you can use MPI.
Multiprocessing
Most OGGM computations are embarrassingly parallel: they are standalone operations performed on a single glacier entity and are therefore independent of each other (they are called entity tasks, as opposed to the non-parallelizable global tasks).
When given a list of Glacier directories on which to apply a given task, the workflow.execute_entity_task() function will distribute the operations across the available processors using Python’s multiprocessing module. You can control this behavior with the use_multiprocessing config parameter, and the number of processors with mp_processes.
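To illustrate the general pattern, here is a minimal sketch of how independent per-glacier tasks can be mapped onto a pool of worker processes with Python’s multiprocessing module. It is not OGGM’s actual implementation: toy_entity_task, run_entity_task and the glacier identifiers are hypothetical names made up for this example, and the two keyword arguments only mimic the role of the use_multiprocessing and mp_processes parameters.

```python
import multiprocessing as mp


def toy_entity_task(glacier_id):
    """Hypothetical stand-in for an entity task: it depends on one glacier only."""
    return glacier_id, len(glacier_id)


def run_entity_task(task, glacier_ids, use_multiprocessing=True, mp_processes=None):
    """Apply `task` to every glacier, either in parallel or serially.

    Illustrative only: the keyword names mirror the OGGM config parameters,
    but this function is not part of OGGM.
    """
    if use_multiprocessing:
        # Each call is independent of the others, so the list can be split
        # across worker processes. processes=None lets Pool pick the number
        # of workers from the number of CPUs.
        with mp.Pool(processes=mp_processes) as pool:
            # pool.map returns the results in the same order as the inputs.
            return pool.map(task, glacier_ids)
    # Serial fallback: same results, computed one after the other.
    return [task(gid) for gid in glacier_ids]


if __name__ == "__main__":
    # Made-up identifiers, used only to have something to iterate over.
    ids = ["glacier_a", "glacier_bb", "glacier_ccc"]
    print(run_entity_task(toy_entity_task, ids, mp_processes=2))
```

Because each task touches only its own glacier, the serial and parallel paths return the same list. This is what makes it safe to switch multiprocessing on or off without changing the results.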
The default in OGGM is:
In [1]: from oggm import cfg
In [2]: cfg.initialize()
In [3]: cfg.PARAMS['use_multiprocessing'] # whether to use multiprocessing
Out[3]: True
In [4]: cfg.PARAMS['mp_processes'] # number of processors to use