Filesystems

| Path | Description | Default Space Quota | Inode Quota | Type |
| --- | --- | --- | --- | --- |
| /home | Personal storage assigned to every user. | 100GB | 5 million | NFS |
| /work | Backed up, lower-performance storage for final results. | 500GB | N/A | NFS |
| /users | Stores private home directories. Avoid running cluster jobs out of this directory. | 5GB | N/A | NFS |
| /scratch | Private, high-performance working storage to run cluster jobs. NOT backed up. | 2TB | 19 million | Lustre |
| /tmp | Local storage on cluster nodes (purged at the end of each job). | N/A | N/A | tmpfs |
| /oldscratch | /scratch from the old cluster, mounted read-only for easy transfer of files. | 1TB | 1 million | PanFS |
| /oldwork | /work from the old cluster, mounted read-only for easy transfer of files. | N/A | N/A | PanFS |

Quotas

Quotas are set for most of the filesystems to make sure storage is used fairly.

You can check your quota usage with the myquota command.
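
For example, to check your overall usage, and (assuming /scratch is the Lustre mount from the table above) to query your Lustre usage directly with the standard lfs quota command:

myquota
lfs quota -u $USER /scratch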

Space Quotas

Space Quotas are set to restrict the amount of space a user or group can consume.
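
If you are close to your space quota, the standard du command is a quick way to see which directories are consuming it; for example, summarizing everything under your home directory:

du -sh ~/* | sort -h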

Inode Quotas

Inode Quotas are set to restrict the combined number of files and directories a user has. The metadata servers can only cache so many metadata structures in memory, so we keep inode counts from growing out of control for performance reasons.
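
To see how many inodes a directory tree consumes, counting its files and directories with standard tools is enough (substitute the directory you want to inspect):

find <directory> | wc -l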

Transferring files to/from the cluster

There are several ways of transferring files to/from the cluster.

Visit the OnDemand Files Tab for more information on transferring files to/from the cluster from your web browser.

On Windows, you can use WinSCP for transferring files graphically. Create a new SCP connection with front1.ittc.ku.edu or front2.ittc.ku.edu as the host, and you should be presented with all of your files currently on the cluster. You can drag files from your local machine to the cluster and vice versa.

On Windows, you can use MobaXterm to transfer files to/from the cluster over SSH. Use front1.ittc.ku.edu or front2.ittc.ku.edu as the SSH host.

On Windows, Linux, and macOS, the scp tool is installed as part of the SSH suite of tools. Below are some examples.

Push with scp

Copying a directory named my_directory from your local machine into /home/username on front2.ittc.ku.edu:

username@your-desktop:~$ scp -r my_directory username@front2.ittc.ku.edu:/home/username

Pull with scp

Copying a directory named processed_data from /home/username on front1.ittc.ku.edu to the current working directory on your local machine:

username@your-desktop:~$ scp -r username@front1.ittc.ku.edu:/home/username/processed_data .

You can transfer files using either of the front nodes; it does not matter which. Like the compute nodes, both have the same filesystems mounted.

Overview of IO access patterns

Serial IO

Most applications will have a serial IO access pattern. This looks like a serialized stream of filesystem operations coming from one compute node.

Parallel IO

Parallel IO patterns look like filesystem operations coming in from multiple compute nodes in parallel. Tools like MPI IO can facilitate "striding" IO where different processes read/write alternating parts of a shared file.

Best practices

NFS best practices

Do not run large parallel IO jobs from any of the above NFS filesystems. Our NFS fileservers are not well suited to high-throughput, high-IOPS workloads.

Lustre best practices

Lustre is our high-performance scratch filesystem. Lustre has been used in the majority of the world's largest supercomputers for the last 20 years and is generally considered mature software. However, our Lustre filesystem is NOT backed up, nor does it have any high-availability mechanisms in place. All this to say: do NOT keep your thesis (or anything else irreplaceable) on Lustre.

Our Lustre cluster is set up with two MDSes (metadata servers) and DNEv2 to try to improve metadata performance, but Lustre is still known to perform poorly when working with many small files and directories, such as large source code repositories, Charliecloud containers, or datasets with lots of small files.

We'd recommend:

  1. Keep your source code in $HOME or $WORK
  2. If you can't keep your Charliecloud containers in $HOME or $WORK, convert them to SquashFS files before putting them on Lustre (see the example after this list). This turns lots of stat()'s and open()'s into seek()'s in the eyes of the underlying filesystem, a pattern Lustre is MUCH better suited to handle.
  3. If you have a large dataset with hundreds of thousands or millions of small files, come talk to us or email us at clusterhelp@ittc.ku.edu.
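
For item 2, here is a minimal sketch of the conversion. It assumes the Charliecloud tools and/or squashfs-tools are available in your environment, and my_container is a placeholder for your image directory:

# Charliecloud's converter (directory image -> SquashFS)
ch-convert ./my_container ./my_container.sqfs

# or, with plain squashfs-tools
mksquashfs ./my_container ./my_container.sqfs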

Lustre Striping and Stripe Size

File-per-process patterns

In a file-per-process pattern, it is usually best to set the stripe count to 1 so each file lives on a single OST. This keeps OST contention to a minimum while the many files, spread across different OSTs, still make use of the filesystem's aggregate bandwidth.

lfs setstripe -c 1 <directory>
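
You can verify the layout that new files in the directory will inherit with lfs getstripe:

lfs getstripe <directory>
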
Shared file patterns

When using a shared-file pattern, Lustre's DLM (Distributed Lock Manager) will enforce serialization of potentially conflicting operations. This has the upside of keeping ill-behaved applications from corrupting files, but the downside of complicating performance tuning.

Each stripe making up part of a file counts as a lockable "IO domain", which means only one process can write to a given stripe at a time. It is easy to see that a small stripe count combined with many processes will lead to very bad lock contention. We recommend keeping the stripe count >= the number of processes to scale IO efficiently.
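
For example, for a shared file written by 32 processes, you might set the stripe count to at least 32 (or pass -c -1 to stripe across every available OST); the exact value is workload-dependent:

lfs setstripe -c 32 <directory>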

You might wonder how you're supposed to grow your number of processes past the number of OSTs/stripes while staying within the above constraint. The Lustre developers thought of this, and introduced an "overstriping" feature that allows for multiple stripes per OST. You can overstripe a directory like this, with the -C argument:

lfs setstripe -C 128 <directory>

This increases the degree of parallelism for a given OST and opens up many more lockable IO domains, reducing lock contention.

Increasing the stripe count does worsen performance with small files, though, since clients will need to fetch layout information about more stripes from more OSTs.

Stripe size

You will want to make sure your stripe size aligns with your read/write size, especially if you are using a shared-file pattern. This ensures each process reads or writes exactly one stripe without crossing into another, keeping lock contention low and, consequently, throughput high.
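
For example, if each process reads or writes in 4 MiB chunks, you might set a matching 4 MiB stripe size along with a suitable stripe count (illustrative values, not a site recommendation):

lfs setstripe -S 4M -c 64 <directory>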