Batch System

The batch system of the hydra cluster is provided by SLURM.

Using a batch system means that you submit a job to the batch system which requests certain resources of the compute servers, e.g., how many processors or how much memory should be used. If the requested resources are available, the job is started; otherwise it waits in the queue until the resources become available.

All batch commands have to be executed on the front end of the batch system, i.e., hydra. Hence, you first have to log into hydra to access the compute nodes of the batch system. Direct login to the batch nodes is provided for job maintenance only and limited to 10 min.
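
For example, assuming the front end is reachable under the host name hydra from your workstation, logging in may look like this (replace <username> with your cluster account):

ssh <username>@hydra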

The batch servers comprise different computer systems, e.g., with different CPUs, different amounts of main memory, or with accelerator cards. This leads to different partitions for running jobs (see below).

Submitting a Job

sbatch

Executes all commands contained in a batch script on the compute servers based on the requested resources.

A minimal batch script looks like:

#!/bin/bash

mycommand arg1 arg2

Saving the file, e.g., as mybatchjob, you can submit it to the batch system as

sbatch mybatchjob

The default resources for the job are one CPU core on one compute server with 8 GB of memory and up to 48 hours runtime.

Resources are requested by one of the following options:

Option                  Resource Request
--nodes=m               request m nodes (compute servers)
--ntasks-per-node=m     request m tasks per node
--cpus-per-task=m       request m CPU cores per task
--hint=nomultithread    disable hyperthreading
--exclusive             get exclusive access to compute nodes
--mem=m                 request m MB of memory per node (m=0 equals all memory)
--mem-per-cpu=m         request m MB of memory per CPU
--partition=<name>      request job execution on partition <name>

To start a job with one task on one node with 4 CPU cores and up to 64 GB of memory, add the following arguments during job submission:

sbatch --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --mem=65536 mybatchjob

Parameters describing job resources can also be placed in the script by using the prefix #SBATCH:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=65536
#SBATCH --partition=bdw

mycommand arg1 arg2

This again starts a job with one task on one node with 4 CPU cores and up to 64 GB of memory. In addition, the job is placed in the bdw partition (see below).

You may provide additional options for your job, e.g., a job name or email notification:

Option                  Description
--job-name=<name>       set job name
--mail-type=<type>      send email at BEGIN, END, FAIL, REQUEUE, ALL
--mail-user=<address>   set email address for notifications

Please make sure that you specify a full email address, e.g., user@mis.mpg.de. Otherwise, notifications will not work.
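
For example, a submission that sets a job name and requests notifications at job start and end could look like this (the job name and address are placeholders):

sbatch --job-name=myjob --mail-type=BEGIN,END --mail-user=user@mis.mpg.de mybatchjob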

Partitions

The batch system puts the nodes into several partitions:

Partition    Nodes         Hardware
bdw          bdw01..08     2x20 cores Intel Broadwell, 512 GB RAM
epyc         epyc01..02    2x64 cores AMD Epyc Rome, 512 GB RAM
cuda         cuda01..04    one or two NVIDIA accelerator cards

Please note that the default partition is bdw!
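
To run a job in another partition, pass its name explicitly, e.g., for the epyc nodes:

sbatch --partition=epyc mybatchjob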

Interactive Jobs

You may also request an interactive job, i.e., a command line on one of the batch nodes via:

srun --pty -u bash -i

All of the above parameters for sbatch are also available (and should be used!) for interactive sessions.
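
As a sketch, an interactive session with 4 CPU cores and 16 GB of memory in the epyc partition could be requested as follows (adjust the resources to your needs):

srun --partition=epyc --cpus-per-task=4 --mem=16384 --pty -u bash -i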

Please note that graphical applications are currently not supported in interactive sessions. For this, please use the interactive compute servers.

Array jobs

Jobs with identical parameters, so-called array jobs, may also be submitted. For this, the --array parameter of the sbatch command is available. The parameter expects an array specification, which is either a list of array indices:

sbatch --array 0,1,2,3,4 ...

or a range specifier:

sbatch --array 0-16:4 ...

The step width (:4) is optional and defaults to 1.

All of the above may also be combined:

sbatch --array 0-16:4,32 ...

Within the batch script the individual tasks of the array job may be distinguished by using the SLURM_ARRAY_TASK_ID environment variable.
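
A minimal sketch of an array job script, assuming hypothetical input files input-0.dat, input-4.dat, ... that match the array indices:

#!/bin/bash

#SBATCH --array=0-16:4

# each array task processes the input file matching its index
mycommand input-${SLURM_ARRAY_TASK_ID}.dat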

CPU affinity

Affinity of user programs to CPU cores is set by SLURM as requested by the user with the above options. The following table contains some typical configurations:

Configuration                                Arguments
1 task/node, 2 CPUs (sockets)/task, w/o HT   --ntasks-per-node=1 --cpus-per-task=40 --hint=nomultithread
1 task/node, 2 CPUs (sockets)/task, w/ HT    --ntasks-per-node=1 --cpus-per-task=40 --threads-per-core=1
2 tasks/node, 1 CPU (socket)/task, w/ HT     --ntasks-per-node=2 --cpus-per-task=20 --threads-per-core=1
2 tasks/node, 1 CPU (socket)/task, w/o HT    --ntasks-per-node=2 --cpus-per-task=20 --hint=nomultithread

The values for --cpus-per-task above correspond to nodes in the bdw partition (2x20 cores). For the epyc partition (2x64 cores), scale them accordingly, e.g., set --cpus-per-task to 128 for a single task per node.
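
For a multi-threaded program (e.g., using OpenMP), the number of requested CPU cores per task is typically passed on via the SLURM_CPUS_PER_TASK environment variable. A sketch for one task on a bdw node without hyperthreading (mycommand is a placeholder for your threaded program):

#!/bin/bash

#SBATCH --partition=bdw
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --hint=nomultithread

# let the threaded program use all requested cores
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

mycommand arg1 arg2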

Job Control

squeue

To view the currently allocated/running jobs, the squeue command is available. By default, all jobs are shown. To limit the output to the jobs of a specific user, use the parameter -u:

squeue -u <username>

scancel

The command scancel cancels specific jobs identified by their job id, which is either printed while submitting your job or shown in the output of squeue:

scancel <jobid>

To cancel all jobs of a user, again use the parameter -u:

scancel -u <username>

sinfo

Shows various information about the state of the partitions.

MPI

The default MPI implementation on the compute servers is OpenMPI v4.1. All programs installed via default Linux packages will use this.

However, the recommended MPI implementation is the Intel MPI library, available via our module system:

module load impi

This only applies to programs and libraries compiled by you, i.e., not to the default (pre-packaged) programs.
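
For instance, assuming the impi module provides the usual Intel MPI compiler wrappers, compiling your own MPI program might look like this (my-program.c is a placeholder):

module load impi
mpiicc -O2 -o my-program my-program.c

(Use mpicc instead of mpiicc if you prefer the GNU compilers.)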

An example batch script will look like

#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=20
#SBATCH --mem=65536
#SBATCH --partition=bdw

module purge
module load impi

srun my-program arg1 arg2 ...

which will launch a job with 4 MPI processes (ranks), each using 20 CPU cores, with up to 64 GB of memory per node.

CUDA

The cuda nodes have different accelerator cards installed (see Hardware). To run on any of them, simply choose the cuda partition:

sbatch -p cuda ...

You can also request a specific GPU type for your job with the --gres parameter, which can be v100, titanv, or a100, together with the number of GPUs requested:

sbatch -p cuda --gres gpu:v100:1 ...
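
Put together, a sketch of a GPU batch script requesting a single V100 card (choose the GPU type and other resources to match your program; mycommand is a placeholder) might look like:

#!/bin/bash

#SBATCH --partition=cuda
#SBATCH --gres=gpu:v100:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32768

mycommand arg1 arg2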

Tensorflow

Tensorflow requires specific combinations of CUDA and cuDNN versions. It is therefore recommended to use Python virtual environments for Tensorflow.

For Tensorflow v2.13:

module load cuda/11.8
module load cudnn/8.6
python3 -m venv tf-2.13
tf-2.13/bin/pip install tensorflow==2.13
tf-2.13/bin/pip install nvidia-cudnn-cu11==8.6.0.163

For Tensorflow v2.14:

module load cuda/11.8
module load cudnn/8.7
python3 -m venv tf-2.14
tf-2.14/bin/pip install tensorflow==2.14
tf-2.14/bin/pip install nvidia-cudnn-cu11==8.7.0.84

Note

To list available versions of nvidia-cudnn-cu11 run pip install --use-deprecated=legacy-resolver nvidia-cudnn-cu11==
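
A sketch of a batch script that runs a hypothetical training script my-training.py with the Tensorflow v2.13 environment created above on a cuda node:

#!/bin/bash

#SBATCH --partition=cuda
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32768

module load cuda/11.8
module load cudnn/8.6

# use the python interpreter from the virtual environment
tf-2.13/bin/python my-training.py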

Matlab

To run Matlab programs using the batch system, copy the following lines into your batch script:

source /etc/profile.d/modules.sh
source /etc/profile.d/opt_local_modules.sh

module load matlab

matlab -nosplash -nodesktop -nojvm -r "run('myprogram.m');quit"

where you replace myprogram.m with the Matlab file containing your instructions.

Don’t forget to add additional resource requests for the number of tasks, CPU cores or main memory as described above.
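
A complete sketch, assuming the Matlab program may use up to 8 CPU cores and 32 GB of memory (adjust the resources to your needs):

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32768

source /etc/profile.d/modules.sh
source /etc/profile.d/opt_local_modules.sh

module load matlab

matlab -nosplash -nodesktop -nojvm -r "run('myprogram.m');quit"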