Submitting a job
================

For those familiar with GridEngine, the Slurm documentation provides a
`Rosetta Stone for schedulers `_ to ease the transition.

Slurm commands
--------------

:term:`Slurm` allows requesting resources and submitting jobs in a variety of
ways. The main Slurm commands to submit jobs are:

* srun

  * Requests resources and **runs a command** on the allocated compute node(s)
  * **Blocking**: will not return until the command ends

* sbatch

  * Requests resources and **runs a script** on the allocated compute node(s)
  * **Asynchronous**: will return as soon as the job is submitted

.. TIP::

   **Slurm Basics**

   .. _slurm_basics:

   * **Job**

     A Job is an allocation of resources (CPUs, RAM, time, etc.) reserved for
     the execution of a specific process:

     * The allocation is defined in the submission script as the number of
       Tasks (``--ntasks``) multiplied by the number of CPUs per Task
       (``--cpus-per-task``), and corresponds to the maximum resources that
       can be used in parallel,
     * The submission script, via ``sbatch``, creates one or more Job Steps
       and manages the distribution of Tasks on Compute Nodes.

   * **Tasks**

     A Task is a process to which the resources defined in the script via the
     ``--cpus-per-task`` option are allocated. A Task can use these resources
     like any other process (creating threads or sub-processes, possibly
     themselves multi-threaded). This is the Job's resource allocation unit.
     CPUs not used by a Task are **lost**: they cannot be used by any other
     Task or Step. If the Task creates more processes/threads than allocated
     CPUs, these threads will share the allocation.

   * **Job Steps**

     A Job Step represents a stage, or section, of the processing performed
     by the Job. It executes one or more Tasks via the ``srun`` command. This
     division into Job Steps offers great flexibility in the organization of
     the Job's stages and in the management, and analysis, of the allocated
     resources:

     * Steps can be executed sequentially or in parallel,
     * a Step can launch one or more Tasks, executed sequentially or in
       parallel,
     * Steps are tracked by the ``sstat/sacct`` commands, allowing both
       Step-by-Step progress tracking of a Job during its execution, and
       detailed resource usage statistics for each Step (during and after
       execution).

     Using ``srun`` for a single task, inside a submission script, is not
     mandatory.

   * **Partition**

     A Partition is a logical grouping of Compute Nodes. This grouping makes
     it possible to specialize and optimize each partition for a particular
     type of job. See :doc:`computing_resources` and
     :doc:`partitions_overview` for more details.

.. _job_script:

Job script
----------

To run a job on the system, you need to create a *submission script* (also
called a job script, or batch script). This script is a regular shell script
(bash) with some directives specifying the number of CPUs, memory, etc.,
which will be interpreted by the scheduling system upon submission.

A very simple example:

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test

   hostname -s
   sleep 60s

Writing submission scripts can be tricky; see :doc:`batch_scripts` for more
details. See also our `repository of examples scripts `_.

First job
---------

Submit your job script with:

.. code-block:: bash

   $ sbatch myfirstjob.sh
   Submitted batch job 623

If the job is accepted, :term:`Slurm` returns its ``$JOBID`` (623 in the
example above); otherwise it prints an error message. Without any option
about output, standard output and error default to ``slurm-$JOBID.out``
(``slurm-623.out`` in the example above), written in the submission
directory.
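If you prefer a different output file, the ``--output`` directive accepts
filename patterns such as ``%x`` (job name) and ``%j`` (job ID). Below is a
minimal sketch of a slightly fuller script; the resource values are purely
illustrative, not site recommendations:

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test
   #SBATCH --output=%x-%j.out    # %x = job name, %j = job ID (e.g. test-623.out)
   #SBATCH --ntasks=1            # one Task...
   #SBATCH --cpus-per-task=1     # ...with one CPU
   #SBATCH --time=00:10:00       # walltime limit (HH:MM:SS)

   hostname -s
   sleep 60s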
Once submitted, the job enters the queue in the *PENDING* (PD) state. When
resources become available and the job has sufficient priority, an allocation
is created for it and it moves to the *RUNNING* (R) state. If the job
completes correctly, it goes to the *COMPLETED* state; otherwise, its state
is set to *FAILED*.

.. TIP::

   **You can submit jobs from any login node to any partition. Login nodes
   are only segregated for build (CPU µarch) and scratch access.**

Monitor your jobs
-----------------

You can monitor your job using either its name (``#SBATCH --job-name``) or
its ``$JOBID`` with Slurm's ``squeue`` [#squeue]_ command:

.. code-block:: bash

   $ squeue -j 623
   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
     623        E5     test ltaulell  R  0:04     1 c82gluster2

By default, ``squeue`` shows every pending and running job. You can filter to
your own jobs, using the ``-u $USER`` or ``--me`` option:

.. code-block:: bash

   $ squeue --me
   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
     623        E5     test ltaulell  R  0:04     1 c82gluster2

If needed, you can modify the output of ``squeue`` [#squeue]_. Here's an
example (adding CPUs to the default output):

.. code-block:: bash

   $ squeue --me --format="%.7i %.9P %.8j %.8u %.2t %.10M %.6D %.4C %N"
   JOBID PARTITION     NAME     USER ST  TIME NODES CPUS NODELIST
   38956      Lake     test ltaulell  R  0:41     1    1 c6420node172

Useful bash aliases:

.. code-block:: bash

   alias pending='squeue --me --states=PENDING --sort=S,Q --format="%.10i %.12P %.8j %.8u %.6D %.4C %.20R %Q %.19S"'  # my pending jobs
   alias running='squeue --me --states=RUNNING --format="%.10i %.12P %.8j %.8u %.2t %.10M %.6D %.4C %R %.19e"'  # my running jobs

Analyzing currently running jobs
--------------------------------

The ``sstat`` [#sstat]_ command allows users to easily pull up status
information about their currently running jobs. This includes information
about **CPU usage**, **task information**, **node information**, **resident
set size (RSS)**, and **virtual memory (VM)**. You can invoke the ``sstat``
command as such:

.. code-block:: bash

   $ sstat --jobs=$JOB_ID

By default, ``sstat`` reports far more information than is usually needed. To
narrow the output, use the ``--format`` flag to choose the fields you want
(see the format section in ``man sstat``). Some relevant fields are listed in
the table below:

+-----------+----------------------------------------------------------+
| Field     | Description                                              |
+===========+==========================================================+
| avecpu    | Average CPU time of all tasks in the job.                |
+-----------+----------------------------------------------------------+
| averss    | Average resident set size of all tasks in the job.       |
+-----------+----------------------------------------------------------+
| avevmsize | Average virtual memory size of all tasks in the job.     |
+-----------+----------------------------------------------------------+
| jobid     | The ID of the job.                                       |
+-----------+----------------------------------------------------------+
| maxrss    | Maximum resident set size of all tasks in the job.       |
+-----------+----------------------------------------------------------+
| maxvsize  | Maximum virtual memory size of all tasks in the job.     |
+-----------+----------------------------------------------------------+
| ntasks    | Number of tasks in the job.                              |
+-----------+----------------------------------------------------------+

For example, let's print out a job's ID, average CPU time, maximum RSS, and
number of tasks:

.. code-block:: bash

   $ sstat --jobs=$JOB_ID --format=jobid,avecpu,maxrss,ntasks
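To follow these metrics while the job runs, ``sstat`` can be combined with
the standard ``watch`` utility. A minimal sketch, assuming ``$JOB_ID`` holds
the ID of one of your currently running jobs:

.. code-block:: bash

   # re-run the sstat query every 30 seconds (Ctrl-C to stop)
   $ watch -n 30 "sstat --jobs=$JOB_ID --format=jobid,avecpu,maxrss,ntasks"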
You can obtain more detailed information about a job using Slurm's
``scontrol`` [#scontrol]_ command. This can be very useful for
troubleshooting.

.. code-block:: bash

   $ scontrol show jobid $JOB_ID

   $ scontrol show jobid 38956
   JobId=38956 JobName=test
      UserId=ltaulell(*****) GroupId=psmn(*****) MCS_label=N/A
      Priority=8628 Nice=0 Account=staff QOS=normal
      JobState=RUNNING Reason=None Dependency=(null)
      Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
      RunTime=00:00:08 TimeLimit=8-00:00:00 TimeMin=N/A
      SubmitTime=2022-07-08T12:00:20 EligibleTime=2022-07-08T12:00:20
      AccrueTime=2022-07-08T12:00:20
      StartTime=2022-07-08T12:00:22 EndTime=2022-07-16T12:00:22 Deadline=N/A
      SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-07-08T12:00:22
      Partition=Lake AllocNode:Sid=x5570comp2:446203
      ReqNodeList=(null) ExcNodeList=(null)
      NodeList=c6420node172
      BatchHost=c6420node172
      NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
      TRES=cpu=1,mem=385582M,node=1,billing=1
      Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
      MinCPUsNode=1 MinMemoryNode=385582M MinTmpDiskNode=0
      Features=(null) DelayBoot=00:00:00
      OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
      Command=/home/ltaulell/tests/env.sh
      WorkDir=/home/ltaulell/tests
      StdErr=/home/ltaulell/tests/slurm-38956.out
      StdIn=/dev/null
      StdOut=/home/ltaulell/tests/slurm-38956.out
      Power=
      NtasksPerTRES:0

.. [#squeue] You can get the complete list of parameters by referring to the
   ``squeue`` manual page (``man squeue``).

.. [#scontrol] You can get the complete list of parameters by referring to the
   ``scontrol`` manual page (``man scontrol``).

.. [#sstat] You can get the complete list of parameters by referring to the
   ``sstat`` manual page (``man sstat``).
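.. TIP::

   **After the job ends**

   ``sstat`` only covers running jobs, and ``scontrol`` only knows about jobs
   that are pending, running, or recently finished. For completed jobs,
   ``sacct`` plays a similar role from the accounting database. A minimal
   sketch, not a full reference (see ``man sacct`` for all fields):

   .. code-block:: bash

      # per-step state, exit code, peak memory and elapsed time of a past job
      $ sacct --jobs=$JOB_ID --format=jobid,jobname,state,exitcode,maxrss,elapsed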