Clusters usage

Note

See our news feed for regular updates

A mandatory prerequisite for running computational tasks on PSMN is to request computing resources. This is done via a job scheduler (or resource scheduler, or batch manager), whose very purpose is to match computing resources in the cluster (CPUs, memory, …) with user resource requests.

The scheduler provides three key functions:

  1. it allocates access to resources (compute nodes) to users for some duration of time so they can perform work,

  2. it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes,

  3. it arbitrates contention for resources by managing a queue of pending jobs.
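
To make the three functions above concrete, here is a toy sketch of a strict first-in, first-out scheduler. It is only an illustration under simplifying assumptions: the `Job` class, the `run_fifo` function and the abstract time steps are hypothetical names invented for this example, and Slurm's actual policy (priorities, fair-share, backfilling) is considerably richer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of compute nodes requested
    duration: int   # requested run time, in abstract time steps


def run_fifo(jobs, total_nodes):
    """Toy strict-FIFO scheduler (NOT Slurm's algorithm).

    Returns a dict mapping each job name to its start time.
    """
    pending = deque(jobs)   # function 3: queue of pending jobs
    running = []            # (end_time, job) pairs currently holding nodes
    free = total_nodes
    clock = 0
    starts = {}
    while pending or running:
        # Release the nodes of jobs whose allocation has ended.
        for end, job in [r for r in running if r[0] <= clock]:
            running.remove((end, job))
            free += job.nodes
        # Functions 1 and 2: allocate nodes and start jobs, in queue order.
        while pending and pending[0].nodes <= free:
            job = pending.popleft()
            free -= job.nodes
            running.append((clock + job.duration, job))
            starts[job.name] = clock
        if running:
            # Jump to the next job completion.
            clock = min(end for end, _ in running)
        elif pending:
            # Nothing is running, yet the head job still does not fit.
            raise ValueError(f"{pending[0].name} requests more nodes than exist")
    return starts


if __name__ == "__main__":
    queue = [Job("A", 2, 3), Job("B", 2, 2), Job("C", 1, 1)]
    print(run_fifo(queue, total_nodes=3))
```

In this toy model job C waits behind job B even though one node is idle from the start, which shows why contention has to be arbitrated by a policy rather than left to chance.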

PSMN uses Slurm, an open-source resource manager and job scheduler. We specifically use v20.11 of Slurm.

For those familiar with GridEngine, the Slurm documentation provides a Rosetta Stone for schedulers, to ease the transition.

Slurm supports a variety of job submission techniques. Whichever you use, request the resources you need (CPUs, memory, duration) as accurately as you can: the scheduler matches these requests against what is available in the cluster.
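
As an illustration of an explicit resource request, the sketch below renders a minimal Slurm batch script from a few parameters. The `render_batch_script` helper, its default values and the `./my_program` command are hypothetical and not PSMN recommendations; only the `#SBATCH` option names are standard `sbatch` options. No partition is set by default, since partition names are site-specific.

```python
def render_batch_script(command, job_name="demo", nodes=1, ntasks=1,
                        cpus_per_task=1, mem="4G", time_limit="01:00:00",
                        partition=None):
    """Return the text of a minimal Slurm batch script.

    All default values are placeholders chosen for this example.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",                  # number of compute nodes
        f"#SBATCH --ntasks={ntasks}",                # number of tasks (e.g. MPI ranks)
        f"#SBATCH --cpus-per-task={cpus_per_task}",  # cores per task
        f"#SBATCH --mem={mem}",                      # memory per node
        f"#SBATCH --time={time_limit}",              # wall-time limit
    ]
    if partition is not None:
        # Partition names depend on the site; none is assumed here.
        lines.append(f"#SBATCH --partition={partition}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "./my_program" is a placeholder for your own executable.
    print(render_batch_script("srun ./my_program", ntasks=4, mem="8G"))
```

The printed text can be saved to a file, for example `job.sh`, and submitted with `sbatch job.sh`.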

PSMN synoptic, for reference:

Fig. 34 PSMN network synoptic (as of 2022)

Fig. 36 A quick view of PSMN clusters (as of 2023)