Cluster Documentation


Introduction

The Advanced Computing Facility (ACF) is located at the Information and Telecommunication Technology Center (ITTC) in Nichols Hall, and provides a 20-fold increase in computing power to support a diverse range of research. The facility houses high performance computing (HPC) resources and, thanks to a $4.6 million renovation grant from the NIH, can support over 24,000 processing cores. A unique feature of the ACF is a sophisticated computer-rack cooling system that shuttles heat from the computing equipment into the Nichols Hall boiler room, resulting in an expected 15% reduction in the building's natural gas use. Additionally, when outdoor temperatures drop below 45 degrees Fahrenheit, a "dry cooler" takes over, reducing electricity consumption by allowing the cooling compressors to be powered down.

The ITTC Research Cluster is located in the ACF and provides HPC resources to members of the center. The cluster uses the Slurm workload manager, an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. The cluster is composed of a variety of hardware types, with core counts ranging from 8 to 20 cores per node. In addition, there is specialized hardware, including Nvidia graphics cards for GPU computing, InfiniBand for low-latency/high-throughput parallel computing, and large-memory systems with up to 512 GB of RAM.

Job Submission Guide

PBS/Torque and Slurm

A translation for common PBS/Torque commands to Slurm commands can be found here. This provides a quick guide for those who are familiar with PBS/Torque, but new to the Slurm scheduler.
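
For quick reference, a few of the most common equivalences are listed below. These are generic PBS/Torque-to-Slurm mappings, not site-specific commands:

qsub script.sh      ->  sbatch script.sh
qstat               ->  squeue
qdel <jobid>        ->  scancel <jobid>
qsub -I             ->  srun --pty /bin/bash
$PBS_JOBID          ->  $SLURM_JOB_ID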

srun

srun can be used to run any single task on a cluster node, but it is most useful for launching interactive GUI or bash sessions. Here is an example of running srun on login1:
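
A representative request for an interactive bash session is shown below. The constraint value "intel", the job name, and the email address are placeholders for illustration; --pty /bin/bash starts the interactive shell on the allocated node:

srun -C intel -J interactive --cores-per-socket=4 --mem=8G --mail-type=END --mail-user=username@ittc.ku.edu --pty /bin/bash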



The options used in this example are detailed below:
-C
Specifies a node constraint, which can be used to request particular types of nodes.
-J, --job-name
Specifies the job name that appears in the job queue.
--cores-per-socket
Sets the number of cores requested per CPU socket.
--mem
Specifies the desired memory per node. Memory can be specified in kilobytes (K), megabytes (M), or gigabytes (G).
--mem-per-cpu
Specifies the memory allocated to each CPU in the interactive session. It uses the same memory specification syntax as --mem.
--mail-type
Sets when the user is to be mailed job notifications. NONE, BEGIN, END, FAIL, REQUEUE, TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80, and TIME_LIMIT_50 are all valid options.
--mail-user
Specifies the user to email when job notification emails are sent.


sbatch
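
sbatch submits a batch script to the scheduler instead of running a single command interactively. A minimal script sketch is shown below; the resource requests, email address, and program name are illustrative placeholders, not site defaults:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --mem=4G
#SBATCH --time=8:00:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@ittc.ku.edu

# Commands below run on the allocated node
./my_program

Save the script (for example as myjob.sh) and submit it with "sbatch myjob.sh". Slurm prints the job ID at submission, which can then be used with squeue and scancel.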

GUI Access

X11 forwarding

Access to a GUI running on the cluster may be accomplished with X11 forwarding. Data from the remote application is sent over ssh to an X server running locally. Each additional ssh connection between the local machine and the cluster must be started with X11 forwarding enabled. To request an interactive shell with X11 forwarding, you can run "srun.x11". The following steps assume that the local machine has an X server running.

1. Log in via ssh to login1 or login2. Make sure your local ssh client has X11 forwarding enabled. If you are using ssh on the command line, add the "-Y" flag to your ssh command.
2. Start an interactive session with X11 forwarding. Be sure to request the number of cores, amount of memory, and walltime needed to complete your job. Syntax:

srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00 
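
Putting the two steps together, a typical session might look like the following (the username is a placeholder, and xterm stands in for whatever X application you want to run):

ssh -Y username@login1.ittc.ku.edu
srun.x11 -N 1 -n 2 --mem=4096mb -t 8:00:00
xterm &

Any X application started inside the srun.x11 session is displayed on your local X server.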

NoMachine

NoMachine is a remote desktop application that is available for Linux, Windows, and OS X. NoMachine requires a connection to the KU or ITTC network; remote users will need to use the KU Anywhere VPN.

Here is a step-by-step guide to setting up NoMachine:

  1. First, you will need to install NoMachine on your computer; the client is available here
  2. After installing NoMachine, run the client and click 'Continue' on the start-up screen. From the options available, click 'Create a new custom Connection'. Change the Protocol setting from 'NX' to 'SSH' and click continue. In the 'Host' field, put login1.ittc.ku.edu or login2.ittc.ku.edu, and in the 'Port' field, put 22, then continue to the next screen.
    For the authentication method, select the 'Use the NoMachine login' radio button and continue. The next screen prompts for an alternative server key; hit continue without specifying one. Lastly, select the 'Don't use a proxy' radio button before continuing on the final page.
  3. After creating your new connection configuration, you will be prompted for a name to save the configuration under. After saving your connection, you should be able to see it in the list of created connection(s) on the main page. Double-click on the one you just created to try connecting.
    If you configured the settings properly, you will be prompted to enter your ITTC credentials. If you are not prompted to enter your credentials, you may have entered information incorrectly in the connection creation process, or your computer may not be on the correct network.
    Make sure you are connected to the KU or ITTC network (remote users should connect through the KU Anywhere VPN). If you are unable to solve your connection issue, email [1] for assistance in setting up your remote connection.
  4. Assuming you are able to connect, the first time you connect you will be asked to 'verify the host authenticity'. Click 'Yes' to continue with the connection. You will then be asked to select a desktop environment to use for the connection. The GNOME desktop is recommended.
    After selecting the environment, read through the NoMachine welcome screen and continue on to the desktop.

Application Support

Helpful Commands
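
A few standard Slurm commands that are useful day to day (generic Slurm usage, not site-specific settings):

squeue -u $USER            # list your pending and running jobs
sinfo                      # show partitions and node states
scancel <jobid>            # cancel a job
sacct -j <jobid>           # accounting information for a completed job
scontrol show job <jobid>  # detailed information about a job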

Debugging

Profiling