Cluster Documentation
Introduction
The Advanced Computing Facility (ACF) is located at the Information and Telecommunication Technology Center (ITTC) in Nichols Hall, and provides a 20-fold increase in computing power to support a diverse range of research. The facility houses high performance computing (HPC) resources and, thanks to a $4.6 million renovation grant from the NIH, can support over 24,000 processing cores. A unique feature of the ACF is a sophisticated computer-rack cooling system that shuttles heat from computing equipment into the Nichols Hall boiler room, resulting in an expected 15% reduction in building natural gas use. Additionally, when outdoor temperatures drop below 45 degrees, a "dry-cooler" takes over, reducing electricity consumption by allowing the cooling compressors to be powered down.
The ITTC Research Cluster is located in the ACF, and provides HPC resources to members of the center. The cluster uses the Slurm workload manager, which is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. The cluster is composed of a variety of hardware types, with core counts ranging from 8 to 20 cores per node. In addition, there is specialized hardware including Nvidia graphics cards for GPU computing, Infiniband for low latency/high throughput parallel computing, and large memory systems with up to 512 GB of RAM.
Getting Help
If you have any questions about the ITTC Research Cluster, feel free to email clusterhelp@ittc.ku.edu for assistance.
Job Submission Guide
PBS/Torque and Slurm
A translation of common PBS/Torque commands to their Slurm equivalents can be found here. This provides a quick reference for those who are familiar with PBS/Torque but new to the Slurm scheduler.
Submitting Jobs
To submit jobs to the cluster, you can either write a script and submit it using sbatch:
[username@login1 ~]$ sbatch script.sh
Or, you can submit jobs interactively from the command line using srun:
[username@login1 ~]$ srun echo Hello World!
Job scripts use parameters (denoted by #SBATCH) in the script file to request job resources, while interactive jobs request resources with command-line parameters. When no resources are requested, a default set is automatically allocated for the job.
This default resource set includes:
- The job's name is set to the script file name or, if the job was started with srun, to the first command (in the example above, the name would be 'echo').
- The job is scheduled in the default intel queue.
- The job is allocated 1 core on 1 node with 2GB of memory.
- The job is allocated 1 day to run.
- The job redirects stdout and stderr to the same output file if the job is submitted with sbatch. If srun is used, then both will be printed to the screen.
- The job's output file name takes the form "slurm-jobid.out", and is created in the same directory as the job script.
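As a minimal illustration of these defaults, the sketch below submits a script containing no #SBATCH directives at all, so the scheduler applies the default queue, core, memory, and time limits described above. The script name and hostname command are only placeholders.

#!/bin/bash
# defaults_test.sh: no #SBATCH directives, so the default resources apply
hostname

Submitting it with sbatch defaults_test.sh should produce a log file named slurm-<jobid>.out in the same directory as the script, containing the name of the node the job ran on.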
srun
srun can be used to run any single task on a cluster node, but it is most useful for launching interactive GUI or bash sessions. Here is an srun example run on login1:
[username@login1 ~]$ srun -p intel -N 1 -n 1 -c 4 --mem 4G --pty /bin/bash
[username@n097 ~]$
The options used in this example are all detailed below:
- -p: Specifies the partition, or queue, to create the job in. The cluster partitions currently available are intel, amd, bigm, and gpu. For more information on the cluster queues, see the Cluster Partitions section below.
- -N: Sets the number of requested nodes for the interactive session.
- -n: Specifies the number of tasks or processes to run on each allocated node.
- -c: Sets the number of requested CPUs per task.
- --mem: Specifies the requested memory per node. Memory amounts can be given in kilobytes (K), megabytes (M), and gigabytes (G).
- --pty: Puts the srun session in pseudo-terminal mode. It is recommended to use this option if you are running an interactive shell session.
- /bin/bash: The last option in an srun invocation is the program that srun will execute on the requested node. In this case, bash is specified to start an interactive shell session.
srun is used to submit both interactive and non-interactive jobs. When it is run directly on the command line as shown above, an interactive session is started on a cluster node. When it is used in a job submission script, it starts a non-interactive session.
sbatch
sbatch is used to submit jobs to the cluster using a script file. Below is an example job submission script:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=1GB
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o slurm-%j.out

echo "Job ${SLURM_JOB_ID} ran on ${HOSTNAME}"
Example output:
[username@login1 ~]$ sbatch test_job.sh
[username@login1 ~]$ cat slurm-47491.out
Job 47491 ran on n097
[username@login1 ~]$
This script requests one node with one core and 1GB of memory. -J is used to specify the job name that appears in the job queue, while -o specifies the log file name for the job. %j in the job output file name is replaced with the Slurm job id when the scheduler processes the script. The variable SLURM_JOB_ID used in the example output is an environment variable set by the Slurm scheduler for each job.
To run this example script, copy its contents into a file in your home directory (test_job.sh, for example). Log in to either login1.ittc.ku.edu or login2.ittc.ku.edu with your ITTC credentials, and run the command sbatch test_job.sh. The job output log will be saved in the same directory as the job submission script, and should contain output similar to the example above.
sbatch job scripts can run programs directly, as shown above, but it is also possible to use srun within job submission scripts to run programs. Using srun in a job script allows fine-grained resource control over the parallel tasks it runs. An example is shown below:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 2
#SBATCH -c 1
#SBATCH --mem=2GB
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o slurm-%j.out

srun -n 1 --mem=1G echo "Task 1 ran" &
srun -n 1 --mem=1G echo "Task 2 ran" &
wait
When the sbatch script is submitted, both srun invocations will run at the same time, splitting the resources requested at the top of the script file. This method is useful for launching a small number of related jobs at once from the same script, but does not scale well with a large number of jobs. The Job Array section below goes into more depth on running large numbers of parallel jobs on the cluster.
When using srun within a job submission script, you need to specify what portion of the resources each srun invocation is allocated. If more resources are requested by srun than are made available by the #SBATCH parameters, then some jobs may wait to run, or attempt to share resources with already running jobs. In the example above, two tasks and 2GB of memory are requested. In the srun commands below the resource request, we specify how much memory and how many tasks are allocated to each job.
The sbatch options shown in these example scripts are just the tip of the iceberg in terms of what is available. For the full listing of sbatch parameters, see the official Slurm sbatch documentation.
Here is a brief list of other common options that may be useful (an example combining several of them follows the list):
- -C: Specifies a node constraint. This can be used to request a particular CPU architecture or instruction set.
- -D: Specifies the path to the log file destination directory. This can be an absolute path, or a path relative to the job submission script directory.
- --gres: Used to request GPU resources. See the GPU Jobs section below for more information on running GPU jobs.
- --cores-per-socket: Sets the requested number of cores per CPU socket.
- --mem-per-cpu: Specifies the memory allocated to each CPU in the job. It has the same memory specification syntax as --mem.
- --mail-type: Sets when the user is to be mailed job notifications. NONE, BEGIN, END, FAIL, REQUEUE, TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80, and TIME_LIMIT_50 are all valid options.
- --mail-user: Specifies the user account to email when job notification emails are sent.
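As an illustration of how several of these options fit together, the header below sketches one possible job script. The constraint string, email address, job name, and program are placeholders rather than values taken from this documentation.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --mem-per-cpu=2G
#SBATCH -t 02:00:00
#SBATCH -C "intel"
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@ittc.ku.edu
#SBATCH -J constrained_job
#SBATCH -o slurm-%j.out

./my_program    # placeholder for the actual workload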
Job Arrays
There are two general approaches to submitting a large number of cluster jobs at once. The first is to submit jobs to the scheduler using srun in a loop on the command line. The preferable, and more powerful, approach uses job arrays to submit large blocks of jobs all at once with the sbatch command.
The --array parameter for sbatch allows the scheduler to queue up hundreds to thousands of jobs with the same resource requests. This method is much less taxing on the cluster scheduler, and simplifies the process of submitting a large number of jobs all at once. These arrays usually consist of the same program fed different parameters dictated by the job array indices.
An example job array script is shown below:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=1G
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o logs/%A_%a.out
#SBATCH --array=1-4

echo Job ${SLURM_ARRAY_TASK_ID} used $(awk "NR == ${SLURM_ARRAY_TASK_ID} {print \$0}" ${SLURM_SUBMIT_DIR}/parameters)
Example output:
[username@login1 ~]$ sbatch array_test.sh
[username@login1 ~]$ cd logs/
[username@login1 logs]$ ls
49219_1.out  49219_2.out  49219_3.out  49219_4.out
[username@login1 logs]$ cat *
Job 1 used line 1 parameters
Job 2 used line 2 parameters
Job 3 used line 3 parameters
Job 4 used line 4 parameters
[username@login1 logs]$
Parameters file:
line 1 parameters
line 2 parameters
line 3 parameters
line 4 parameters
In this example, the %A and %a symbols in the job log file path are replaced by the scheduler with the job array id and the job array index, respectively, for each job in the array. The --array option specifies the creation of a job array consisting of four identical jobs with indices ranging from 1 to 4. Each job in the array is created with the same resource request at the top of the file, and runs the same bash command at the bottom of the script file. The echo command prints out the SLURM_ARRAY_TASK_ID (the job array index) environment variable of each job, along with one line from a file called "parameters". The awk command within the echo selects the line in the parameters file whose line number matches the job array index value. This technique can be used to feed specific parameters to different jobs within a job array.
Another way of generating program parameters for job arrays is through arithmetic. For example, if you wanted to define minimum and maximum values for a job to loop through based on its index value, your job script might include something like this:
MAX=$(echo "${SLURM_ARRAY_TASK_ID} * 1000" | bc)
MIN=$(echo "(${SLURM_ARRAY_TASK_ID} - 1) * 1000" | bc)

for (( i=$MIN; i<$MAX; i++ )); do
    # Perform calculations...
done
Cluster Partitions
Cluster partitions, or queues, are sets of nodes in the cluster grouped by their features. Currently, there are four partitions in the ITTC cluster: intel, amd, bigm, and gpu. The intel and amd partitions are made up of nodes that contain exclusively Intel and AMD CPUs, respectively. The bigm queue is made up of nodes with 256 to 500GB of RAM, and the gpu partition contains nodes with Nvidia GPU co-processors. Partitions can be specified in a job script with the -p option:
#SBATCH -p intel
They can also be specified in interactive sessions:
srun -p intel -N 1 -n 1 --pty /bin/bash
Partitions allow for high-level constraints on job hardware, but lack fine-grained control over things like cpu and gpu architecture.
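If you are unsure which nodes belong to a given partition, sinfo can list them. These are standard sinfo invocations rather than commands taken from this documentation:

sinfo              # summary of all partitions and their nodes
sinfo -p intel     # limit the listing to the intel partition
sinfo -N -p gpu    # one line per node in the gpu partition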
Job Constraints
Job constraints allow precise specification of the hardware a job should run on. CPU architectures and instruction sets can be requested, as well as the networking type, node manufacturer, and memory. Specifying hardware constraints is done with the -C option:
#SBATCH -C "intel"
Multiple constraints can also be specified at once:
srun -C "intel&ib" --pty /bin/bash
In this example, the & symbol between the two constraints specifies that both must be fulfilled for the job to run. The | symbol can be used to specify that either one constraint or the other can be fulfilled. Additionally, square brackets can be used to group constraints together. Here is an example combining all three:
#SBATCH -C "[intel&ib]|[amd&eth_10g]"
The available constraints correspond to the feature tags assigned to each node.
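To see which feature tags are currently defined, sinfo can print each node's feature list. This is a small sketch using standard sinfo format options, not a command taken from the original documentation:

sinfo -o "%N %f"    # list node names alongside their feature (constraint) tags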
GPU Jobs
Instead of using hardware constraints, GPUs are specified with Generic Resource (gres) requests. Below is an example of an interactive GPU job request:
srun -p gpu --gres="gpu:k20:2" --pty /bin/bash
This request specifies two Nvidia K20 GPUs in the gpu queue for the interactive session, along with the default job resources. The --gres option allows the GPU model and count to be specified through a colon-delimited list. Below is a job script example:
#SBATCH -p gpu
#SBATCH --gres="gpu:k40:1"
The gpu partition must be specified when requesting GPUs, otherwise the scheduler will reject the job. Whenever a job is started on a GPU node, the environment variable CUDA_VISIBLE_DEVICES is set to contain a comma-delimited list of the GPUs allocated to the current job. Information about these GPUs can be viewed by running nvidia-smi.
Here is example output from the srun example above:
[username@login1 ~]$ srun -p gpu --gres="gpu:k20:2" --pty /bin/bash
[username@g002 ~]$ echo $CUDA_VISIBLE_DEVICES
1,2
[username@g002 ~]$ nvidia-smi
Fri Jan 20 16:23:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 0000:02:00.0     Off |                    0 |
| N/A   30C    P0    47W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          Off  | 0000:03:00.0     Off |                    0 |
| N/A   29C    P0    47W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          Off  | 0000:83:00.0     Off |                    0 |
| N/A   28C    P0    48W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20m          Off  | 0000:84:00.0     Off |                    0 |
| N/A   28C    P0    51W / 225W |      0MiB /  4742MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[username@g002 ~]$
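For batch work, the same --gres request can be placed in a full job script. The sketch below is illustrative; the GPU model, count, and program name are placeholders that should be adjusted to your actual job.

#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --gres="gpu:k20:1"
#SBATCH --mem=4G
#SBATCH -t 01:00:00
#SBATCH -J gpu_test
#SBATCH -o slurm-%j.out

echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES}"
./my_cuda_program    # placeholder for the actual GPU application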
Currently, there are five different GPU models available in the cluster. See the Cluster Hardware page for a complete listing of the GPU nodes and their configurations.
GUI Access
X11 forwarding
Access to a GUI running on the cluster may be accomplished with X11 forwarding. Data from the remote application is sent over ssh to an X server running locally. Each additional ssh connection between the local machine and the cluster must be started with X11 forwarding enabled. To request an interactive shell with X11 forwarding, you can run "srun.x11". The following steps assume that the local machine has an X server running.
- Log in via ssh to login1 or login2. Make sure your local ssh client has X11 forwarding enabled. If you are using ssh on the command line, add the "-X" flag to your ssh command.
- Load the slurm-torque/14.11.8 module on login1 with the module load slurm-torque/14.11.8 command. This will allow you to start an X11 session using srun.
- Start an interactive session with X11 forwarding. Be sure to request the number of cores, amount of memory, and walltime needed to complete your job. Syntax:
srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00
- After starting an interactive session with X11 forwarding, you can now launch graphical programs from the terminal.
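Putting these steps together, a session might look like the sketch below. The resource amounts are only examples, and xterm is a stand-in for whichever graphical program you actually need.

# On your local machine: connect with X11 forwarding enabled
ssh -X username@login1.ittc.ku.edu

# On login1: make srun.x11 available and request an interactive X11 session
module load slurm-torque/14.11.8
srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00

# On the allocated node: launch a graphical program
xterm &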
NoMachine
NoMachine is a remote desktop application that is available for Linux, Windows, and OS X. NoMachine requires that you are connected to the KU or ITTC network; remote users will need to use the KU Anywhere VPN.
A step-by-step guide to setting up NoMachine is available in File:NoMachineTutorial.pdf.
General Cluster Information
Software Environment
All cluster nodes run CentOS version 7 with GCC version 4.8.5. Cluster applications are installed as modules under /nfs/apps/7/arch/generic.
Environment Modules
Cluster software is made available through environment modules. A list of available modules can be viewed by running:
module avail
Modules shown in the list can be loaded with the following command:
module load module_name
In order to persist loaded modules between interactive sessions, you need to add module load commands for the applications you want loaded to your ~/.bash_profile file if you are using bash, or to ~/.cshrc if you are using tcsh or csh.
To view all loaded modules in your current shell session, use the module list command. To unload all currently loaded modules, use the module purge command. For more information on the module command and its options, see the documentation for further detail.
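As a sketch of how modules fit into a job script, the example below loads a hypothetical module before running it; substitute whatever module avail reports for the software you actually need.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=2G
#SBATCH -t 00:30:00

module load example_app/1.0    # hypothetical module name; pick one from 'module avail'
example_app --input data.txt   # placeholder invocation of the loaded software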
Filesystems
Below is a list of filesystems available on the cluster:
Path | Description | Default Quota
---|---|---
/users | Stores private home directories. Avoid running cluster jobs out of this directory. | 5GB
/work | Shared group storage. | 1TB
/scratch | Private working storage to run cluster jobs. | 1TB
/tmp | Local storage on cluster nodes. | N/A
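Because home directories have a small quota and are not intended for running jobs, a common pattern is to stage work in /scratch and copy results back when the job finishes. The paths and program below are placeholders illustrating that pattern, not prescribed locations.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=2G
#SBATCH -t 01:00:00

WORKDIR=/scratch/username/myjob_${SLURM_JOB_ID}    # placeholder scratch location
mkdir -p "${WORKDIR}"
cd "${WORKDIR}"

cp ~/input.dat .            # stage input data from the home directory
./my_program input.dat      # placeholder for the actual workload
cp results.out ~/           # copy results back when finished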
Debugging
The cluster has a number of tools at your disposal for debugging submitted Slurm jobs. The most basic debugging information available is in the log files generated by running your job, which contain the STDERR and STDOUT output from the job. Log files are located within the submit directory with the filename slurm-<job id>.out, such as slurm-49321.out.
You can retrieve detailed job information using the command scontrol show jobid -dd <jobid>. Likewise, if you want to view detailed job information while the job is running, add the --output option to srun in your job batch file. For an unbuffered stream of STDOUT, which is quite useful for debugging, add the -u or --unbuffered option to srun in your job batch file.
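A small sketch of what those srun options might look like inside a batch script; the log file name and program are only examples.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=1G
#SBATCH -t 00:10:00

# Write this step's output to its own file and stream it unbuffered
srun --unbuffered --output=debug-%j.out ./my_program    # placeholder program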
Helpful Commands
The Slurm scheduler has a number of utilities for finding information on the status of your jobs. Below are listed a few of the most useful commands and options for quickly finding this information.
Useful Slurm commands:
- sacct: Lists information on finished and currently running jobs, including job status and exit codes.
- sacct -u <username>: Lists information on currently running and recently finished jobs for the specified user.
- sacct -S <start-date> -s <state>: Lists all jobs since the given start date or time that are in the specified state.
- scancel -u <username> -t <state>: Cancels all of the jobs for the specified user that are in the specified state.
- scontrol hold <jobid>: Holds the specified job by putting it in a 'HOLD' state.
- scontrol release <jobid>: Releases the specified job from the 'HOLD' state.
- scontrol show job <jobid>: Shows detailed queue and resource allocation information for the specified job.
- sinfo: Displays information on all of the cluster partitions, including the nodes available in them.
- sinfo -T: Shows information on cluster node reservations, including reservation period, name, and reserved nodes.
- squeue: Displays the short-form information for all currently running and queued jobs.
- squeue -u <username> -l: Lists the long-form information about currently running jobs for the specified user.
- squeue -u <username> -t <state>: Lists information about a specified user's jobs that are in the given state.
- sview: If X11 forwarding is enabled, this command launches a graphical interface for viewing cluster information.
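For example, checking on your own jobs might combine a few of these commands; the job id below is the one used earlier in this guide and is only illustrative.

squeue -u $USER -l         # long-form listing of your running and queued jobs
sacct -u $USER             # status and exit codes of your recent jobs
scontrol show job 47491    # detailed information for a single job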
Citing the Cluster
If you would like to cite the ITTC research cluster in your work, feel free to use or adapt the following citation:
The authors wish to acknowledge Wesley Mason, Michael Hulet and the rest of the Information and Telecommunication Technology Center (ITTC) staff at The University of Kansas for their support with our high performance computing.
Cluster Hardware
Visit the Cluster Hardware page for a complete listing of all of the nodes in the cluster and their hardware configurations.