Batch System

The batch system of the hydra cluster is provided by SLURM.

Using a batch system means that you submit a job to the batch system which requests certain resources of the compute servers, e.g., how many processors or how much memory should be used. If the requested resources are available, the job is started; otherwise it waits in the queue until the resources become available.

All batch commands have to be executed on the front end of the batch system, i.e., hydra. Hence, you first have to log into hydra to access the compute nodes of the batch system. Direct login to the batch nodes is provided for job maintenance only and limited to 10 min.
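
For example, assuming the front end is reachable under the host name hydra from your workstation, logging in may look like this (replace <username> with your cluster account):

ssh <username>@hydra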

The batch servers comprise different computer systems, e.g., with different CPUs, different amounts of main memory, or with accelerator cards. This leads to different partitions for running jobs (see below).

Submitting a Job

sbatch

Executes all commands contained in a batch script on the compute servers based on the requested resources.

A minimal batch script looks like:

#!/bin/bash

mycommand arg1 arg2

Saving the file, e.g., as mybatchjob, you can submit it to the batch system as

sbatch mybatchjob

The default resources for the job are one CPU core on one compute server with 8 GB of memory and up to 48 hours runtime.

Resources are requested by one of the following options:

Option                  Resource Request
--nodes=m               request m nodes (compute servers)
--ntasks-per-node=m     request m tasks per node
--cpus-per-task=m       request m CPU cores per task
--hint=nomultithread    disable hyperthreading
--exclusive             get exclusive access to compute nodes
--mem=m                 request m MB of memory per node (m=0 equals all memory)
--mem-per-cpu=m         request m MB of memory per CPU
--partition=<name>      request job execution on partition <name>

To start a job with one task on one node with 4 CPU cores and up to 64 GB of memory, add the following arguments during job submission:

sbatch --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --mem=65536 mybatchjob

Parameters describing job resources can also be placed in the script by using the prefix #SBATCH:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=65536
#SBATCH --partition=bdw

mycommand arg1 arg2

This again starts a job with one task on one node with 4 CPU cores and up to 64 GB of memory. In addition, the job is placed in the bdw partition (see below).

You may provide additional options for your job, e.g., a job name or email notification:

Option                  Description
--job-name=<name>       set job name
--mail-type=<type>      send email at BEGIN, END, FAIL, REQUEUE, ALL
--mail-user=<address>   set email address for notifications

Please make sure that you specify a full email address, e.g., user@mis.mpg.de. Otherwise, notifications will not work.
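
For example, a submission that sets a job name and requests notifications at job start and end could look like this (the job name and address are placeholders):

sbatch --job-name=myjob --mail-type=BEGIN,END --mail-user=user@mis.mpg.de mybatchjob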

Partitions

The batch system puts the nodes into several partitions:

Partition    Nodes         Hardware
bdw          bdw01..08     2x20 cores Intel Broadwell, 512 GB RAM
epyc         epyc01..02    2x64 cores AMD Epyc Rome, 512 GB RAM
cuda         cuda01..04    one or two NVIDIA accelerator cards

Please note that the default partition is bdw!
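
To run a job in another partition, pass its name explicitly, e.g., for the epyc nodes:

sbatch --partition=epyc mybatchjob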

Interactive Jobs

You may also request an interactive job, i.e., a command line on one of the batch nodes via:

srun --pty -u bash -i

All of the above parameters for sbatch are also available (and should be used!) for interactive sessions.
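
As a sketch, an interactive session with 4 CPU cores and 16 GB of memory in the epyc partition could be requested as follows (adjust the resources to your needs):

srun --partition=epyc --cpus-per-task=4 --mem=16384 --pty -u bash -i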

Please note that graphical applications are currently not supported in interactive sessions. For this, please use the interactive compute servers.

Array jobs

Jobs with identical parameters, so-called array jobs, may also be submitted. For this, the --array parameter of the sbatch command is available. The parameter expects an array specification, which is either a list of array indices:

sbatch --array 0,1,2,3,4 ...

or a range specifier:

sbatch --array 0-16:4 ...

The step width (:4) is optional and defaults to 1.

All of the above may also be combined:

sbatch --array 0-16:4,32 ...

Within the batch script the individual tasks of the array job may be distinguished by using the SLURM_ARRAY_TASK_ID environment variable.
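
A minimal sketch of an array job script, assuming hypothetical input files input-0.dat, input-4.dat, ... that match the array indices:

#!/bin/bash

#SBATCH --array=0-16:4

# each array task processes the input file matching its index
mycommand input-${SLURM_ARRAY_TASK_ID}.dat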

CPU affinity

Affinity of user programs to CPU cores is set by SLURM as requested by the user with the above options. The following table contains some typical configurations:

Configuration                                Arguments
1 task/node, 2 CPUs (sockets)/task, w/o HT   --ntasks-per-node=1 --cpus-per-task=40 --hint=nomultithread
1 task/node, 2 CPUs (sockets)/task, w/ HT    --ntasks-per-node=1 --cpus-per-task=40 --threads-per-core=1
2 tasks/node, 1 CPU (socket)/task, w/ HT     --ntasks-per-node=2 --cpus-per-task=20 --threads-per-core=1
2 tasks/node, 1 CPU (socket)/task, w/o HT    --ntasks-per-node=2 --cpus-per-task=20 --hint=nomultithread

The values for --cpus-per-task above correspond to nodes in the bdw partition (2x20 cores). For the epyc partition (2x64 cores), scale them accordingly, e.g., set --cpus-per-task to 128 for a single task per node.
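
For a multi-threaded program (e.g., using OpenMP), the number of requested CPU cores per task is typically passed on via the SLURM_CPUS_PER_TASK environment variable. A sketch for one task on a bdw node without hyperthreading (mycommand is a placeholder for your threaded program):

#!/bin/bash

#SBATCH --partition=bdw
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --hint=nomultithread

# let the threaded program use all requested cores
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

mycommand arg1 arg2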

Job Control

squeue

To view the currently allocated/running jobs, the squeue command is available. By default, all jobs are shown. To limit the output to the jobs of a specific user, use the parameter -u:

squeue -u <username>

scancel

The command scancel cancels specific jobs identified by their job id, which is either printed while submitting your job or shown in the output of squeue:

scancel <jobid>

To cancel all jobs of a user, again use the parameter -u:

scancel -u <username>

sinfo

Shows various information about the state of the partitions.

MPI

The default MPI implementation on the compute servers is OpenMPI v4.1. All programs installed via default Linux packages will use this.

However, the recommended MPI implementation is the Intel MPI library, available via our module system:

module load impi

This only applies to programs and libraries compiled by you, i.e., not to the default (pre-packaged) programs.
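
For instance, assuming the impi module provides the usual Intel MPI compiler wrappers, compiling your own MPI program might look like this (my-program.c is a placeholder):

module load impi
mpiicc -O2 -o my-program my-program.c

(Use mpicc instead of mpiicc if you prefer the GNU compilers.)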

An example batch script will look like

#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=20
#SBATCH --mem=65536
#SBATCH --partition=bdw

module purge
module load impi

srun my-program arg1 arg2 ...

which will launch a job with 4 MPI processes (ranks), each using 20 CPU cores, with up to 64 GB of memory per node.

CUDA

The cuda nodes have different accelerator cards installed (see Hardware). To run on any of them, simply choose the cuda partition:

sbatch -p cuda ...

You can also request a specific GPU type for your job with the --gres parameter, which can be v100, titanv, or a100, together with the number of GPUs requested:

sbatch -p cuda --gres gpu:v100:1 ...
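
Put together, a sketch of a GPU batch script requesting a single V100 card (choose the GPU type and other resources to match your program; mycommand is a placeholder) might look like:

#!/bin/bash

#SBATCH --partition=cuda
#SBATCH --gres=gpu:v100:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32768

mycommand arg1 arg2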

Tensorflow

Tensorflow requires specific combinations of CUDA and cuDNN versions. It is therefore recommended to use Python virtual environments for Tensorflow.

For Tensorflow v2.13:

module load cuda/11.8
module load cudnn/8.6
python3 -m venv tf-2.13
tf-2.13/bin/pip install tensorflow==2.13
tf-2.13/bin/pip install nvidia-cudnn-cu11==8.6.0.163

For Tensorflow v2.14:

module load cuda/11.8
module load cudnn/8.7
python3 -m venv tf-2.14
tf-2.14/bin/pip install tensorflow==2.14
tf-2.14/bin/pip install nvidia-cudnn-cu11==8.7.0.84

Note

To list available versions of nvidia-cudnn-cu11 run pip install --use-deprecated=legacy-resolver nvidia-cudnn-cu11==
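
A sketch of a batch script that runs a hypothetical training script my-training.py with the Tensorflow v2.13 environment created above on a cuda node:

#!/bin/bash

#SBATCH --partition=cuda
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32768

module load cuda/11.8
module load cudnn/8.6

# use the python interpreter from the virtual environment
tf-2.13/bin/python my-training.py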

Matlab

To run Matlab programs using the batch system, copy the following lines into your batch script:

source /etc/profile.d/modules.sh
source /etc/profile.d/opt_local_modules.sh

module load matlab

matlab -nosplash -nodesktop -nojvm -r "run('myprogram.m');quit"

where you replace myprogram.m with the Matlab file containing your instructions.

Don’t forget to add additional resource requests for the number of tasks, CPU cores or main memory as described above.
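
A complete sketch, assuming the Matlab program may use up to 8 CPU cores and 32 GB of memory (adjust the resources to your needs):

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32768

source /etc/profile.d/modules.sh
source /etc/profile.d/opt_local_modules.sh

module load matlab

matlab -nosplash -nodesktop -nojvm -r "run('myprogram.m');quit"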