Cluster Documentation
Introduction
The Advanced Computing Facility (ACF) is located at the Information and Telecommunication Technology Center (ITTC) in Nichols Hall, and provides a 20-fold increase in computing power to support a diverse range of research. The facility houses high performance computing (HPC) resources and, thanks to a $4.6 million renovation grant from the NIH, can support over 24,000 processing cores. A unique feature of the ACF is a sophisticated computer-rack cooling system that shuttles heat from computing equipment into the Nichols Hall boiler room, resulting in an expected 15% reduction in building natural gas use. Additionally, when outdoor temperatures drop below 45 degrees, a "dry-cooler" takes over, reducing electricity consumption by allowing the cooling compressors to be powered down.
The ITTC Research Cluster is located in the ACF, and provides HPC resources to members of the center. The cluster uses the Slurm workload manager, which is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. The cluster is composed of a variety of hardware types, with core counts ranging from 8 to 20 cores per node. In addition, there is specialized hardware including Nvidia graphics cards for GPU computing, Infiniband for low latency/high throughput parallel computing, and large memory systems with up to 512 GB of RAM.
Getting Help
If you have any questions about the ITTC Research Cluster, feel free to email clusterhelp@ittc.ku.edu for assistance.
Job Submission Guide
PBS/Torque and Slurm
A translation of common PBS/Torque commands to their Slurm equivalents can be found here. This provides a quick reference for those who are familiar with PBS/Torque but new to the Slurm scheduler.
Submitting Jobs
To submit jobs to the cluster, you can either write a script and submit it using sbatch:
[username@login1 ~]$ sbatch script.sh
Or, you can submit jobs interactively from the command line using srun:
[username@login1 ~]$ srun echo Hello World!
Job scripts use parameters (denoted by #SBATCH) in the script file to request job resources, while interactive jobs request resources with command-line parameters. When no resources are requested, a default set is automatically allocated for the job.
This default resource set includes:
- The job's name is set to the script file name or, if the job was started with srun, to the first command (in the example above, the name would be 'echo').
- The job is scheduled in the default intel queue.
- The job is allocated 1 core on 1 node with 2GB of memory.
- The job is allocated 1 day to run.
- The job redirects stdout and stderr to the same output file if the job is submitted with sbatch. If srun is used, then both will be printed to the screen.
- The job's output file name takes the form "slurm-jobid.out", and is created in the same directory as the job script.
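As a minimal illustration of these defaults, the sketch below submits a script containing no #SBATCH directives at all, so the scheduler applies the default queue, core, memory, and time limits described above. The script name and hostname command are only placeholders.

#!/bin/bash
# defaults_test.sh: no #SBATCH directives, so the default resources apply
hostname

Submitting it with sbatch defaults_test.sh should produce a log file named slurm-<jobid>.out in the same directory as the script, containing the name of the node the job ran on.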
srun
srun can be used to run any single task on a cluster node, but it is most useful for launching interactive GUI or bash sessions. Here is an srun example run on login1:
[username@login1 ~]$ srun -p intel -N 1 -n 1 -c 4 --mem 4G --pty /bin/bash
[username@n097 ~]$
The options used in this example are all detailed below:
- -p: Specifies the partition, or queue, to create the job in. The cluster partitions currently available are intel, amd, bigm, and gpu. For more information on the cluster queues, see the Cluster Partitions section below.
- -N: Sets the number of requested nodes for the interactive session.
- -n: Specifies the number of tasks or processes to run on each allocated node.
- -c: Sets the number of requested CPUs per task.
- --mem: Specifies the requested memory per node. Memory amounts can be given in kilobytes (K), megabytes (M), and gigabytes (G).
- --pty: Puts the srun session in pseudo-terminal mode. It is recommended to use this option if you are running an interactive shell session.
- /bin/bash: The last option in an srun invocation is the program that srun will execute on the requested node. In this case, bash is specified to start an interactive shell session.
srun is used to submit both interactive and non-interactive jobs. When it is run directly on the command line as shown above, an interactive session is started on a cluster node. When it is used in a job submission script, it starts a non-interactive session.
sbatch
sbatch is used to submit jobs to the cluster using a script file. Below is an example job submission script:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=1GB
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o slurm-%j.out

echo "Job ${SLURM_JOB_ID} ran on ${HOSTNAME}"
Example output:
[username@login1 ~]$ sbatch test_job.sh
[username@login1 ~]$ cat slurm-47491.out
Job 47491 ran on n097
[username@login1 ~]$
This script requests one node with one core and 1GB of memory. -J is used to specify the job name that appears in the job queue, while -o specifies the log file name for the job. %j in the job output file name is replaced with the Slurm job id when the scheduler processes the script. The variable SLURM_JOB_ID used in the example output is an environment variable set by the Slurm scheduler for each job.
To run this example script, copy its contents into a file in your home directory (test_job.sh, for example). Log in to either login1.ittc.ku.edu or login2.ittc.ku.edu with your ITTC credentials, and run the command sbatch test_job.sh. The job output log will be saved in the same directory as the job submission script, and should contain output similar to the example above.
sbatch job scripts can run programs directly, as shown above, but it is also possible to use srun within job submission scripts to run programs. Using srun in a job script allows fine-grained resource control over the parallel tasks it runs. An example is shown below:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 2
#SBATCH -c 1
#SBATCH --mem=2GB
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o slurm-%j.out

srun -n 1 --mem=1G echo "Task 1 ran" &
srun -n 1 --mem=1G echo "Task 2 ran" &
wait
When the sbatch script is submitted, both srun invocations will run at the same time, splitting the resources requested at the top of the script file. This method is useful for launching a small number of related jobs at once from the same script, but does not scale well with a large number of jobs. The Job Array section below goes into more depth on running large numbers of parallel jobs on the cluster.
When using srun within a job submission script, you need to specify what portion of the resources each srun invocation is allocated. If more resources are requested by srun than are made available by the #SBATCH parameters, then some jobs may wait to run, or attempt to share resources with already running jobs. In the example above, two tasks and 2GB of memory are requested. In the srun commands below the resource request, we specify how much memory and how many tasks are allocated to each job.
The sbatch options shown in these example scripts are just the tip of the iceberg in terms of what is available. For the full listing of sbatch parameters, see the official Slurm sbatch documentation.
Here is a brief list of other common options that may be useful (an example combining several of them follows the list):
- -C: Specifies a node constraint. This can be used to request a particular CPU architecture or instruction set.
- -D: Specifies the path to the log file destination directory. This can be an absolute path, or a path relative to the job submission script directory.
- --gres: Used to request GPU resources. See the GPU Jobs section below for more information on running GPU jobs.
- --cores-per-socket: Sets the requested number of cores per CPU socket.
- --mem-per-cpu: Specifies the memory allocated to each CPU in the job. It has the same memory specification syntax as --mem.
- --mail-type: Sets when the user is to be mailed job notifications. NONE, BEGIN, END, FAIL, REQUEUE, TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80, and TIME_LIMIT_50 are all valid options.
- --mail-user: Specifies the user account to email when job notification emails are sent.
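As an illustration of how several of these options fit together, the header below sketches one possible job script. The constraint string, email address, job name, and program are placeholders rather than values taken from this documentation.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --mem-per-cpu=2G
#SBATCH -t 02:00:00
#SBATCH -C "intel"
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@ittc.ku.edu
#SBATCH -J constrained_job
#SBATCH -o slurm-%j.out

./my_program    # placeholder for the actual workload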
Job Arrays
There are two general approaches to submitting a large number of cluster jobs at once. The first is to submit jobs to the scheduler using srun in a loop on the command line. The preferable, and more powerful, approach uses job arrays to submit large blocks of jobs all at once with the sbatch command.
The --array parameter for sbatch allows the scheduler to queue up hundreds to thousands of jobs with the same resource requests. This method is much less taxing on the cluster scheduler, and simplifies the process of submitting a large number of jobs all at once. These arrays usually consist of the same program fed different parameters dictated by the job array indices.
An example job array script is shown below:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=1G
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o logs/%A_%a.out
#SBATCH --array=1-4

echo Job ${SLURM_ARRAY_TASK_ID} used $(awk "NR == ${SLURM_ARRAY_TASK_ID} {print \$0}" ${SLURM_SUBMIT_DIR}/parameters)
Example output:
[username@login1 ~]$ sbatch array_test.sh
[username@login1 ~]$ cd logs/
[username@login1 logs]$ ls
49219_1.out  49219_2.out  49219_3.out  49219_4.out
[username@login1 logs]$ cat *
Job 1 used line 1 parameters
Job 2 used line 2 parameters
Job 3 used line 3 parameters
Job 4 used line 4 parameters
[username@login1 logs]$
Parameters file:
line 1 parameters
line 2 parameters
line 3 parameters
line 4 parameters
In this example, the %A and %a symbols in the job log file path are replaced by the scheduler with the job array id and the job array index, respectively, for each job in the array. The --array option specifies the creation of a job array consisting of four identical jobs with indices ranging from 1 to 4. Each job in the array is created with the same resource request at the top of the file, and runs the same bash command at the bottom of the script file. The echo command prints out the SLURM_ARRAY_TASK_ID (the job array index) environment variable of each job, along with one line from a file called "parameters". The awk command within the echo selects the line in the parameters file whose line number matches the job array index value. This technique can be used to feed specific parameters to different jobs within a job array.
Another way of generating program parameters for job arrays is through arithmetic. For example, if you wanted to define minimum and maximum values for a job to loop through based on its index value, your job script might include something like this:
MAX=$(echo "${SLURM_ARRAY_TASK_ID} * 1000" | bc)
MIN=$(echo "(${SLURM_ARRAY_TASK_ID} - 1) * 1000" | bc)

for (( i=$MIN; i<$MAX; i++ )); do
    # Perform calculations...
done
Cluster Partitions
Cluster partitions, or queues, are sets of nodes in the cluster grouped by their features. Currently, there are four partitions in the ITTC cluster: intel, amd, bigm, and gpu. The intel and amd partitions are made up of nodes that contain exclusively Intel and AMD CPUs, respectively. The bigm queue is made up of nodes with 256 to 500GB of RAM, and the gpu partition contains nodes with Nvidia GPU co-processors. Partitions can be specified in a job script with the -p option:
#SBATCH -p intel
They can also be specified in interactive sessions:
srun -p intel -N 1 -n 1 --pty /bin/bash
Partitions allow for high-level constraints on job hardware, but lack fine-grained control over things like cpu and gpu architecture.
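If you are unsure which nodes belong to a given partition, sinfo can list them. These are standard sinfo invocations rather than commands taken from this documentation:

sinfo              # summary of all partitions and their nodes
sinfo -p intel     # limit the listing to the intel partition
sinfo -N -p gpu    # one line per node in the gpu partition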
Job Constraints
Job constraints allow precise specification of the hardware a job should run on. CPU architectures and instruction sets can be requested, as well as the networking type, node manufacturer, and memory. Specifying hardware constraints is done with the -C option:
#SBATCH -C "intel"
Multiple constraints can also be specified at once:
srun -C "intel&ib" --pty /bin/bash
In this example, the & symbol between the two constraints specifies that both must be fulfilled for the job to run. The | symbol can be used to specify that either one constraint or the other can be fulfilled. Additionally, square brackets can be used to group constraints together. Here is an example combining all three:
#SBATCH -C "[intel&ib]|[amd&eth_10g]"
The available constraints correspond to the feature tags assigned to each node.
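To see which feature tags are currently defined, sinfo can print each node's feature list. This is a small sketch using standard sinfo format options, not a command taken from the original documentation:

sinfo -o "%N %f"    # list node names alongside their feature (constraint) tags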
GPU Jobs
Instead of using hardware constraints, GPUs are specified with Generic Resource (gres) requests. Below is an example of an interactive GPU job request:
srun -p gpu --gres="gpu:k20:2" --pty /bin/bash
This request specifies two Nvidia K20 GPUs in the gpu queue for the interactive session, along with the default job resources. The --gres option allows the GPU model and count to be specified through a colon-delimited list. Below is a job script example:
#SBATCH -p gpu
#SBATCH --gres="gpu:k40:1"
The gpu partition must be specified when requesting GPUs, otherwise the scheduler will reject the job. Whenever a job is started on a GPU node, the environment variable CUDA_VISIBLE_DEVICES is set to contain a comma-delimited list of the GPUs allocated to the current job. Information about these GPUs can be viewed by running nvidia-smi.
Here is example output from the srun example above:
[username@login1 ~]$ srun -p gpu --gres="gpu:k20:2" --pty /bin/bash
[username@g002 ~]$ echo $CUDA_VISIBLE_DEVICES
1,2
[username@g002 ~]$ nvidia-smi
Fri Jan 20 16:23:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 0000:02:00.0     Off |                    0 |
| N/A   30C    P0    47W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          Off  | 0000:03:00.0     Off |                    0 |
| N/A   29C    P0    47W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          Off  | 0000:83:00.0     Off |                    0 |
| N/A   28C    P0    48W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20m          Off  | 0000:84:00.0     Off |                    0 |
| N/A   28C    P0    51W / 225W |      0MiB /  4742MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[username@g002 ~]$
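For batch work, the same --gres request can be placed in a full job script. The sketch below is illustrative; the GPU model, count, and program name are placeholders that should be adjusted to your actual job.

#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --gres="gpu:k20:1"
#SBATCH --mem=4G
#SBATCH -t 01:00:00
#SBATCH -J gpu_test
#SBATCH -o slurm-%j.out

echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES}"
./my_cuda_program    # placeholder for the actual GPU application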
Currently, there are five different GPU models available in the cluster. See the Cluster Hardware page for a complete listing of the GPU nodes and their configurations.
GUI Access
X11 forwarding
Access to a GUI running on the cluster may be accomplished with X11 forwarding. Data from the remote application is sent over ssh to an X server running locally. Each additional ssh connection between the local machine and the cluster must be started with X11 forwarding enabled. To request an interactive shell with X11 forwarding, you can run "srun.x11". The following steps assume that the local machine has an X server running.
- Log in via ssh to login1 or login2. Make sure your local ssh client has X11 forwarding enabled. If you are using ssh on the command line, add the "-X" flag to your ssh command.
- Load the slurm-torque/14.11.8 module on login1 with the module load slurm-torque/14.11.8 command. This will allow you to start an X11 session using srun.
- Start an interactive session with X11 forwarding. Be sure to request the number of cores, amount of memory, and walltime needed to complete your job. Syntax:
srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00
- After starting an interactive session with X11 forwarding, you can now launch graphical programs from the terminal.
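Putting these steps together, a session might look like the sketch below. The resource amounts are only examples, and xterm is a stand-in for whichever graphical program you actually need.

# On your local machine: connect with X11 forwarding enabled
ssh -X username@login1.ittc.ku.edu

# On login1: make srun.x11 available and request an interactive X11 session
module load slurm-torque/14.11.8
srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00

# On the allocated node: launch a graphical program
xterm &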
NoMachine
NoMachine is a remote desktop application that is available for Linux, Windows, and OS X. NoMachine requires that you are connected to the KU or ITTC network; remote users will need to use the KU Anywhere VPN.
A step-by-step guide to setting up NoMachine is available in File:NoMachineTutorial.pdf.
General Cluster Information
Software Environment
All cluster nodes run CentOS version 7 with GCC version 4.8.5. Cluster applications are installed as modules under /nfs/apps/7/arch/generic.
Environment Modules
Cluster software is made available through environment modules. A list of available modules can be viewed by running:
module avail
Modules shown in the list can be loaded with the following command:
module load module_name
In order to persist loaded modules between interactive sessions, you need to add module load commands for the applications you want loaded to your ~/.bash_profile file if you are using bash, or to ~/.cshrc if you are using tcsh or csh.
To view all loaded modules in your current shell session, use the module list command. To unload all currently loaded modules, use the module purge command. For more information on the module command and its options, see the documentation for further detail.
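As a sketch of how modules fit into a job script, the example below loads a hypothetical module before running it; substitute whatever module avail reports for the software you actually need.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=2G
#SBATCH -t 00:30:00

module load example_app/1.0    # hypothetical module name; pick one from 'module avail'
example_app --input data.txt   # placeholder invocation of the loaded software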
Filesystems
Below is a list of filesystems available on the cluster:
Path | Description | Default Quota
---|---|---
/users | Stores private home directories. Avoid running cluster jobs out of this directory. | 5GB
/work | Shared group storage. | 1TB
/scratch | Private working storage to run cluster jobs. | 1TB
/tmp | Local storage on cluster nodes. | N/A
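Because home directories have a small quota and are not intended for running jobs, a common pattern is to stage work in /scratch and copy results back when the job finishes. The paths and program below are placeholders illustrating that pattern, not prescribed locations.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=2G
#SBATCH -t 01:00:00

WORKDIR=/scratch/username/myjob_${SLURM_JOB_ID}    # placeholder scratch location
mkdir -p "${WORKDIR}"
cd "${WORKDIR}"

cp ~/input.dat .            # stage input data from the home directory
./my_program input.dat      # placeholder for the actual workload
cp results.out ~/           # copy results back when finished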
Debugging
The cluster has a number of tools at your disposal for debugging submitted Slurm jobs. The most basic debugging information available is in the log files generated by running your job, which contain the STDERR and STDOUT output from the job. Log files are located within the submit directory with the filename slurm-<job id>.out, such as slurm-49321.out.
You can retrieve detailed job information using the command scontrol show jobid -dd <jobid>. Likewise, if you want to view detailed job information while the job is running, add the --output option to srun in your job batch file. For an unbuffered stream of STDOUT, which is quite useful for debugging, add the -u or --unbuffered option to srun in your job batch file.
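A small sketch of what those srun options might look like inside a batch script; the log file name and program are only examples.

#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=1G
#SBATCH -t 00:10:00

# Write this step's output to its own file and stream it unbuffered
srun --unbuffered --output=debug-%j.out ./my_program    # placeholder program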
Helpful Commands
The Slurm scheduler has a number of utilities for finding information on the status of your jobs. Below are listed a few of the most useful commands and options for quickly finding this information.
Useful Slurm commands:
- sacct: Lists information on finished and currently running jobs, including job status and exit codes.
- sacct -u <username>: Lists information on currently running and recently finished jobs for the specified user.
- sacct -S <start-date> -s <state>: Lists all jobs since the given start date or time that are in the specified state.
- scancel -u <username> -t <state>: Cancels all of the jobs for the specified user that are in the specified state.
- scontrol hold <jobid>: Holds the specified job by putting it in a 'HOLD' state.
- scontrol release <jobid>: Releases the specified job from the 'HOLD' state.
- scontrol show job <jobid>: Shows detailed queue and resource allocation information for the specified job.
- sinfo: Displays information on all of the cluster partitions, including the nodes available in them.
- sinfo -T: Shows information on cluster node reservations, including reservation period, name, and reserved nodes.
- squeue: Displays the short-form information for all currently running and queued jobs.
- squeue -u <username> -l: Lists the long-form information about currently running jobs for the specified user.
- squeue -u <username> -t <state>: Lists information about a specified user's jobs that are in the given state.
- sview: If X11 forwarding is enabled, this command launches a graphical interface for viewing cluster information.
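For example, checking on your own jobs might combine a few of these commands; the job id below is the one used earlier in this guide and is only illustrative.

squeue -u $USER -l         # long-form listing of your running and queued jobs
sacct -u $USER             # status and exit codes of your recent jobs
scontrol show job 47491    # detailed information for a single job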
Citing the Cluster
If you would like to cite the ITTC research cluster in your work, feel free to use or adapt the following citation:
The authors wish to acknowledge Wesley Mason, Michael Hulet and the rest of the Information and Telecommunication Technology Center (ITTC) staff at The University of Kansas for their support with our high performance computing.
Cluster Hardware
Visit the Cluster Hardware page for a complete listing of all of the nodes in the cluster and their hardware configurations.