Computing resources
===================

Our clusters are grouped into *partitions*, by CPU generation, **available** :term:`RAM` **size** and InfiniBand network:

Big picture
-----------

+-----------+---------------+----------+----------+---------+------------------+----------------------------+
| Partition | CPU family    | nb cores | RAM (GB) | Network | main Scratch     | **Best use case**          |
+===========+===============+==========+==========+=========+==================+============================+
| E5        | E5            | 16       | 62, 124, | 56Gb/s  | /scratch/E5N     | training, sequential,      |
|           |               |          | 252      |         |                  | small parallel             |
+-----------+---------------+----------+----------+---------+------------------+----------------------------+
| E5-GPU    | E5            | 8        | 124      | 56Gb/s  | /scratch/Lake    | sequential, small          |
|           |               |          |          |         |                  | parallel, GPU computing    |
+-----------+---------------+----------+----------+---------+------------------+----------------------------+
| Lake      | Sky Lake      | 32       | 94, 124, | 56Gb/s  | /scratch/Lake    | medium parallel,           |
+           +---------------+          + 190, 380 +         +                  + sequential                 +
|           | Cascade Lake  |          |          |         |                  |                            |
+-----------+---------------+----------+----------+---------+------------------+----------------------------+
| Epyc      | AMD Epyc      | 128      | 510      | 100Gb/s | /scratch/Lake    | large parallel             |
+-----------+---------------+----------+----------+---------+------------------+----------------------------+
| Cascade   | Cascade Lake  | 96       | 380      | 100Gb/s | /scratch/Cascade | large parallel             |
+-----------+---------------+----------+----------+---------+------------------+----------------------------+

See :doc:`partitions_overview` for more hardware details.

**Available** :term:`RAM` **size** may vary a little (not all RAM is available for computing, GB vs GiB, etc.).

Available resources
-------------------

Use the ``sinfo`` [#sinfo]_ command to view the list of partitions (the default one is marked with a '*') and their state (also ``sinfo -l``, ``sinfo -lNe`` and ``sinfo --summarize``):

.. code-block:: bash

    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    E5*          up 8-00:00:00      4   idle c82gluster[1-4]
    Cascade      up 8-00:00:00     77   idle s92node[02-78]

Or the state of a particular partition:

.. code-block:: bash

    $ sinfo -p Epyc
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    Epyc         up 8-00:00:00      1    mix c6525node002
    Epyc         up 8-00:00:00     12  alloc c6525node[001,003-006,008-014]
    Epyc         up 8-00:00:00      1   idle c6525node007

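You can also filter this view by node state, for instance to get an idea of how busy a partition is before submitting. The following commands are only an illustration (``-p``, ``--states``, ``-N`` and ``-l`` are standard ``sinfo`` options; adapt the partition name to your needs):

.. code-block:: bash

    # show only the idle nodes of the Epyc partition
    $ sinfo -p Epyc --states=idle

    # same selection, one line per node, with CPU and memory details
    $ sinfo -p Epyc --states=idle -N -l
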
To see more information (CPU count and organization, :term:`RAM` size [in MiB], state/availability), use one of these:

.. code-block:: bash

    $ sinfo --exact --format="%9P %.8z %.8X %.8Y %.8c %.7m %.5D %N"
    PARTITION    S:C:T  SOCKETS    CORES     CPUS  MEMORY NODES NODELIST
    E5*          2:8:1        2        8       16  128872     4 c82gpgpu[31-34]
    E5*          2:8:1        2        8       16   64328     3 c82gluster[2-4]
    E5-GPU       2:4:1        2        4        8  128829     1 r730gpu20
    Lake        2:16:1        2       16       32  385582     3 c6420node[172-174]
    Cascade     2:48:1        2       48       96  385606    77 s92node[02-78]

    $ sinfo --exact --format="%9P %.8c %.7m %.5D %.14F %N"
    PARTITION     CPUS  MEMORY NODES NODES(A/I/O/T) NODELIST
    E5*             16  128872     4        3/1/0/4 c82gpgpu[31-34]
    E5*             16   64328     3        3/0/0/3 c82gluster[2-4]
    E5-GPU           8  128829     1        0/1/0/1 r730gpu20
    Lake            32  385582     3        1/2/0/3 c6420node[172-174]
    Cascade         96  385606    77     47/26/4/77 s92node[02-78]

    $ sinfo --exact --format="%9P %.8c %.7m %.20C %.5D %25f" --partition E5,E5-GPU
    PARTITION     CPUS  MEMORY        CPUS(A/I/O/T) NODES AVAIL_FEATURES
    E5*             16  256000       248/120/16/384    24 local_scratch
    E5*             16  128828         354/30/0/384    24 (null)
    E5*             16  257852          384/0/0/384    24 (null)
    E5*             32  257843          384/0/0/384    12 (null)
    E5*             16   64328            48/0/0/48     3 (null)
    E5*             16  128872            64/0/0/64     4 (null)
    E5-GPU           8  127000         32/128/0/160    20 gpu

``A/I/O/T`` stands for ``Allocated/Idle/Other/Total``, in CPU terms.

.. code-block:: bash

    $ sinfo -lN | less
    NODELIST    NODES PARTITION  STATE CPUS  S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
    [...]
    c82gluster4     1       E5*   idle   16  2:8:1  64328        0      1   (null) none
    s92node02       1   Cascade   idle   96 2:48:1 385606        0      1   (null) none
    [...]

.. important::

   * HyperThreading [#ht]_ is activated on all Intel nodes, but the logical cores are not available as computing resources (*real cores vs logical cores*).
   * :term:`RAM` size is in MiB, and you cannot reserve more than 94% of it per node.

Basic defaults
--------------

* default partition: E5
* default time: 10 minutes
* default cpu(s): 1 core
* default memory size: 4 GiB / core

Features
--------

Some nodes have *features* [#features]_ (``gpu``, ``local_scratch``, etc.). To request a feature/constraint, you must add the following line to your submission script: ``#SBATCH --constraint=``. Example:

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=my_job_needs_local_scratch
    #SBATCH --time=02:00:00
    #SBATCH --ntasks=8
    #SBATCH --mem-per-cpu=4096M
    #SBATCH --constraint=local_scratch

Only nodes whose features match the job constraints will be used to satisfy the request.

Maximums
--------

Here are some maximums of usable resources **per job**:

* maximum wall-time: 8 days ('8-0:0:0' as 'days-hours:minutes:seconds')
* maximum nodes and/or maximum cores **per job**:

+-----------+-------+-------+-----+
| Partition | nodes | cores | gpu |
+===========+=======+=======+=====+
| E5        | 24    | 384   |     |
+-----------+-------+-------+-----+
| E5-GPU    | 19    | 152   | 18  |
+-----------+-------+-------+-----+
| Lake      | 24    | 768   |     |
+-----------+-------+-------+-----+
| Epyc      | 14    | 1792  |     |
+-----------+-------+-------+-----+
| Cascade   | 76    | 7296  |     |
+-----------+-------+-------+-----+

Anything more **must be requested using** `our contact forms `_.

.. [#sinfo] You can get the complete list of parameters by referring to the ``sinfo`` manual page (``man sinfo``).
.. [#ht] `See HyperThreading `_
.. [#features] See the ``sbatch`` manual page (``man sbatch``, ``-C``, ``--constraint=``).

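Putting these elements together, here is a minimal submission sketch for a large parallel job on the Cascade partition, staying within the per-job limits above. The job name and ``my_mpi_program`` are placeholders; adjust the requested sizes to your own needs:

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=cascade_example      # placeholder name
    #SBATCH --partition=Cascade             # large parallel partition
    #SBATCH --nodes=4                       # well below the 76-node per-job limit
    #SBATCH --ntasks-per-node=96            # one task per physical core
    #SBATCH --mem-per-cpu=3500M             # keeps each node below the ~94% RAM cap
    #SBATCH --time=1-00:00:00               # 1 day; the maximum is 8 days

    # my_mpi_program is a placeholder for your own executable
    srun ./my_mpi_program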