Cluster Documentation
Introduction
The Advanced Computing Facility (ACF) is located at the Information and Telecommunication Technology Center (ITTC) in Nichols Hall and provides a 20-fold increase in computing power to support a diverse range of research. The facility houses high performance computing (HPC) resources and, thanks to a $4.6 million renovation grant from the NIH, can support over 24,000 processing cores. A unique feature of the ACF is a sophisticated computer-rack cooling system that shuttles heat from computing equipment into the Nichols Hall boiler room, resulting in an expected 15% reduction in the building's natural gas use. Additionally, when outdoor temperatures drop below 45 degrees, a "dry cooler" kicks in, cutting electricity consumption by allowing the cooling compressors to be powered down.
The ITTC Research Cluster is located in the ACF and provides HPC resources to members of the center. The cluster uses the Slurm workload manager, an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. The cluster is composed of a variety of hardware types, with core counts ranging from 8 to 20 cores per node. In addition, there is specialized hardware including Nvidia graphics cards for GPU computing, InfiniBand for low-latency/high-throughput parallel computing, and large-memory systems with up to 512 GB of RAM.
Job Submission Guide
PBS/Torque and Slurm
A translation of common PBS/Torque commands to their Slurm equivalents can be found here. This provides a quick reference for those who are familiar with PBS/Torque but new to the Slurm scheduler.
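For quick reference, a few of the most common equivalents are listed below; this is not an exhaustive list, so consult the linked translation table for the full set of commands and options.

PBS/Torque          Slurm
qsub job.sh         sbatch job.sh
qstat               squeue
qstat -u <user>     squeue -u <user>
qdel <jobid>        scancel <jobid>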
srun
Srun can be used to run any single task on a cluster node, but it is most useful for launching interactive GUI or bash sessions. Here is an srun example run on login1:
[username@login1 ~]$ srun -p intel -N 1 -n 1 -c 4 --mem 4G --pty /bin/bash
[username@n097 ~]$
The options used in this example are all detailed below:

- -p: Specifies a partition, or queue, to create the job in. The current cluster partitions available are intel, amd, bigm, and gpu.
- -N: Sets the number of requested nodes for the interactive session.
- -n: Specifies the number of tasks or processes to run on each allocated node.
- -c: Sets the number of requested CPUs per task.
- --mem: Specifies the requested memory per node. Memory amounts can be given in kilobytes (K), megabytes (M), or gigabytes (G).
- --pty: Puts the srun session in pseudo-terminal mode. This option is recommended when running an interactive shell session.
- /bin/bash: The last argument in an srun invocation is the program that srun will execute on the requested node. In this case, bash is specified to start an interactive shell session.
Srun is used to submit both interactive and non-interactive jobs. When it is run directly on the command line as shown above, an interactive session is started on a cluster node. When it is used in a job submission script, it starts a non-interactive session.
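For example, srun can also run a single non-interactive command directly from the login node. The sketch below assumes the intel partition and uses hostname as the command; the node name in the output is only illustrative.

[username@login1 ~]$ srun -p intel -N 1 -n 1 --mem 1G hostname
n097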
sbatch
Sbatch is used to submit job scripts to the cluster. Unlike srun, sbatch uses a script file to specify resource requests. Below is an example Slurm job submission script:
#!/bin/bash
#SBATCH -p intel
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=1GB
#SBATCH -t 00:20:00
#SBATCH -J test_job
#SBATCH -o slurm-%A.out

srun echo "Job ${SLURM_JOB_ID} ran on ${HOSTNAME}"
Example output:
[username@login1 ~]$ sbatch test_job.sh
[username@login1 ~]$ cat slurm-47491.out
Job 47491 ran on n097
[username@login1 ~]$
This script requests one node with one core and 1 GB of memory. The srun command at the bottom of the script inherits the resource request from the #SBATCH parameters above it and runs the given command on the allocated node. -J specifies the job name that appears in the job queue, while -o specifies the log file name for the job. %A in the log file name is replaced with the Slurm job ID when the scheduler processes the job.
To run this example script, copy its contents to a file in your home directory. Log in to either login1.ittc.ku.edu or login2.ittc.ku.edu with your ITTC credentials, and run the command sbatch script_name. The job output log will be saved in the same directory as the job submission script and should contain output similar to the example run. The variable ${SLURM_JOB_ID} used in the example job output is an environment variable set by the Slurm scheduler for each job.
Other useful options:

- -C: Specifies a node constraint. This can be used to request GPU nodes; see this example for more information on running GPU jobs.
- -D: Specifies the path to the log file destination directory. This can be an absolute path, or a relative path from the job submission script directory.
- --cores-per-socket: Sets the requested number of cores per CPU socket.
- --mem-per-cpu: Specifies the memory allocated to each CPU in the job. It uses the same memory specification syntax as --mem.
- --mail-type: Sets when the user is emailed job notifications. NONE, BEGIN, END, FAIL, REQUEUE, TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80, and TIME_LIMIT_50 are all valid options.
- --mail-user: Specifies the email address to send job notification emails to.
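As a sketch of how these options fit together with the earlier example, the script below combines several of them. The log directory, notification address, and resource amounts are placeholders chosen for illustration, not values specific to the ITTC cluster; substitute your own.

#!/bin/bash
#SBATCH -p intel                      # partition to submit to
#SBATCH -N 1                          # one node
#SBATCH -n 4                          # four tasks
#SBATCH --mem-per-cpu=2G              # 2 GB per allocated CPU instead of a per-node total
#SBATCH -t 01:00:00                   # one hour of walltime
#SBATCH -J notify_test                # job name shown in the queue
#SBATCH -D /path/to/logs              # placeholder log directory; it must already exist
#SBATCH -o slurm-%A.out               # log file name, %A expands to the job id
#SBATCH --mail-type=END               # email when the job finishes
#SBATCH --mail-user=you@example.com   # placeholder address; use your own

srun echo "Job ${SLURM_JOB_ID} ran on ${HOSTNAME}"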
GUI Access
X11 forwarding
Access to a GUI running on the cluster may be accomplished with X11 forwarding. Data from the remote application is sent over ssh to an X server running locally. Each additional ssh connection between the local machine and the cluster must be started with X11 forwarding enabled. To request an interactive shell with X11 forwarding, you can run "srun.x11". The following steps assume that the local machine has an X server running.
1. Log in via ssh to login1 or login2. Make sure your local ssh client has X11 forwarding enabled. If you are using ssh on the command line, add the "-Y" flag to your ssh command.
2. Start an interactive session with X11 forwarding. Be sure to request enough cores, memory, and walltime to complete your job. Syntax:
srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00
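Putting the two steps together, a typical session might look like the sketch below. The node name and the xclock test program are only illustrative; any X application installed on the cluster can be used to verify that forwarding works.

[user@local ~]$ ssh -Y username@login1.ittc.ku.edu        # -Y enables X11 forwarding
[username@login1 ~]$ srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00
[username@n097 ~]$ xclock &                               # a window should open on the local desktop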
NoMachine
NoMachine is a remote desktop application that is available for Linux, Windows, and OSX. NoMachine requires that you are connected to the KU or ITTC network; remote users will need to use the KU Anywhere VPN.
Here is a step-by-step guide to setting up NoMachine:
- First, you will need to install NoMachine on your computer, which is available here.
- After installing NoMachine, run the client and click 'Continue' on the start-up screen. From the options available, click 'Create a new custom Connection'. Change the Protocol setting from 'NX' to 'SSH' and click 'Continue'. In the 'Host' field, enter login1.ittc.ku.edu or login2.ittc.ku.edu, and in the 'Port' field, enter 22, then continue to the next screen. For the authentication method, select the 'Use the NoMachine login' radio button and continue. The next screen prompts for an alternative server key; hit continue without specifying one. Lastly, select the 'Don't use a proxy' radio button before continuing on the final page.
- After creating your new connection configuration, you will be prompted for a name to save the configuration under. After saving your connection, you should be able to see it in the list of created connections on the main page. Double-click the one you just created to try connecting. If you configured the settings properly, you will be prompted to enter your ITTC credentials. If you are not prompted for your credentials, you may have entered information incorrectly during the connection creation process, or your computer may not be on the correct network. Make sure you are connected to the KU or ITTC network, or to the KU Anywhere VPN if you are connecting to the cluster remotely. If you are unable to solve your connection issue, email [1] for assistance in setting up your remote connection.
- Assuming you are able to connect, the first time you connect you will be asked to verify the host authenticity. Click 'Yes' to continue with the connection. You will then be asked to select a desktop environment to use for the connection; the GNOME desktop is recommended. After selecting the environment, read through the NoMachine welcome screen and continue on to the desktop.