Clusters usage

Todo

This documentation represents the final expected state, not the current migration state. See our news feed for up-to-date information.

A mandatory prerequisite for running computational tasks on PSMN is to request computing resources. This is done via a job scheduler (or resource scheduler, or batch manager), whose very purpose is to match compute resources in the cluster (CPUs, memory, …) with user resource requests.

The scheduler provides three key functions:

  1. it allocates access to resources (compute nodes) to users for some duration of time so they can perform work,

  2. it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes,

  3. it arbitrates contention for resources by managing a queue of pending jobs.
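These functions can be observed from the command line with standard Slurm client tools. The sketch below is only illustrative: it assumes the Slurm commands are on your PATH (as on a login node) and falls back to a message otherwise. The variable `slurm_found` is a hypothetical helper, not part of Slurm.

```shell
#!/bin/bash
# Minimal sketch: inspect resources and the job queue with read-only Slurm commands.
# Assumption: the Slurm client tools are installed where this runs.

if command -v sinfo >/dev/null 2>&1; then
    # Resources the scheduler can allocate: partitions and node availability.
    sinfo --summarize
    # The queue the scheduler arbitrates: your own pending and running jobs.
    squeue --user "$(id -un)"
    slurm_found=yes
else
    echo "Slurm client commands not found; run this on a cluster login node."
    slurm_found=no
fi
```

Both commands only read scheduler state, so they are safe to run at any time. Starting and monitoring work is done with `sbatch` and `srun`, which need an actual job to submit.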

PSMN now uses Slurm, an open-source resource manager and job scheduler. We specifically use Slurm v20.11.

For those familiar with GridEngine, the Slurm documentation provides a Rosetta Stone of schedulers to ease the transition.

Slurm supports a variety of job submission techniques. By accurately requesting the resources you need, you will be able to get your work done.
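As an illustration of an explicit resource request, here is a minimal batch script sketch. The job name, resource values, and output file name are placeholders chosen for the example, not PSMN recommendations, and no partition is named because partition names are site-specific. The `#SBATCH` lines are read by `sbatch` and are plain comments to bash, so the script also runs outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=hello          # placeholder job name
#SBATCH --ntasks=1                # one task (process)
#SBATCH --cpus-per-task=1         # one CPU core for that task
#SBATCH --mem=1G                  # memory per node
#SBATCH --time=00:10:00           # wall-time limit (HH:MM:SS)
#SBATCH --output=hello_%j.out     # %j is replaced by the job ID

# Slurm exports these variables inside a job; the defaults below are an
# assumption made so the sketch also runs when launched directly with bash.
job_id="${SLURM_JOB_ID:-none}"
ncpus="${SLURM_CPUS_PER_TASK:-1}"

# Report where the job landed and what it was given.
summary="job=${job_id} cpus=${ncpus} host=$(uname -n)"
echo "$summary"
```

Saved under a name of your choice (for example `hello.sh`, a hypothetical file name), it would be submitted with `sbatch hello.sh`. Requesting only what the job needs generally makes it easier for the scheduler to place it.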

Fig. 32 A quick view of PSMN partitions (2022)