Using Local /scratch (TMPDIR) on Compute Nodes

All nodes (compute and development) have their own local storage mounted as /scratch. The /scratch storage is fast, faster than system-wide storage such as home folders and lab storage, which makes it ideal for holding intermediate data files. Using it also lowers the load on the system-wide storage and the local network. Using local /scratch is a win-win for everyone.


Here is how to use /scratch:


Here is a job script that makes use of the job-specific $TMPDIR folder (on local scratch). At the beginning, the script copies the input files from the NFS-mounted /data drive to this local scratch folder. After processing of the input files is complete, the output files are moved from local scratch back to /data.

#!/usr/bin/env bash
#PBS -j oe

## 0. In case TMPDIR is not set (e.g. running on another system)
if [[ -z "$TMPDIR" ]]; then
  if [[ -d /scratch ]]; then TMPDIR=/scratch/$USER; else TMPDIR=/tmp/$USER; fi
  mkdir -p "$TMPDIR"
  export TMPDIR
fi

## 1. Copy input files from global disk to local scratch,
##    and work there so that relative paths resolve to $TMPDIR
cd "$TMPDIR"
cp "/data/$USER/sample.fq" "$TMPDIR/"
cp "/data/$USER/reference.fa" "$TMPDIR/"

## 2. Process input files
/path/to/my_pipeline --cores="$PBS_NUM_PPN" reference.fa sample.fq > output.bam

## 3. Move output files back to global disk
mv output.bam "/data/$USER/"
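
Before step 1 of the script above copies any data, it can be worth confirming that the scratch drive actually has room, since a scratch reservation by itself does not guarantee free disk space. The following is only a sketch: the helper name has_free_gib and the 300 GiB threshold are made up for illustration, and it falls back to /tmp when TMPDIR is not set.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of any cluster tooling): succeeds if the
# file system holding directory $1 has at least $2 GiB available.
has_free_gib() {
  local dir=$1 need_gib=$2 free_kib
  # df -P prints one line per file system; column 4 is the available
  # space, reported in KiB because of -k.
  free_kib=$(df -Pk "$dir" | awk 'NR==2 { print $4 }')
  (( free_kib >= need_gib * 1024 * 1024 ))
}

dir=${TMPDIR:-/tmp}   # assumption: fall back to /tmp outside of a job
if has_free_gib "$dir" 300; then
  echo "OK: at least 300 GiB free on $dir"
else
  echo "WARNING: less than 300 GiB free on $dir" >&2
fi
```

In a real job script you would probably exit with an error instead of printing a warning, so that the job fails early rather than halfway through the pipeline.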

Assume that the total amount of local scratch you need for your input files and your output files and whatever intermediate files my_pipeline needs is 300 GiB, and assume that the process requires up to 4 GiB of RAM to complete. Moreover, let’s say you wish to run in parallel using two cores. Then you should submit this job script as:

$ qsub -l nodes=1:ppn=2 -l gres=scratch:150 -l vmem=4gb

Because the scratch request is counted per core, this will identify a node with 2 cores, 2 * 150 GiB = 300 GiB of scratch, and 4 GiB of RAM available.
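
The arithmetic above can be sketched in the shell. This assumes, as the example implies, that the gres=scratch value is multiplied by the number of cores requested. The helper name scratch_per_core is hypothetical, and the snippet only prints the qsub command rather than submitting anything.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn a total scratch need (in GiB) into the
# per-core value to pass as -l gres=scratch:<n>.
scratch_per_core() {
  local total_gib=$1 cores=$2
  # Round up so that cores * per_core is never less than the total.
  echo $(( (total_gib + cores - 1) / cores ))
}

cores=2
per_core=$(scratch_per_core 300 "$cores")

# Print the resulting resource request (nothing is submitted here).
echo "qsub -l nodes=1:ppn=${cores} -l gres=scratch:${per_core} -l vmem=4gb"
```

Rounding up matters when the total does not divide evenly: 301 GiB on two cores needs a request of 151 per core, not 150.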

Technical details

To clarify, the gres:scratch resource is just a pool of tokens available per node that are handed out to jobs and collected again when those jobs are done. The number of tokens available for a given node depends on how big its /scratch/ drive is. What is not automatically accounted for is the actual free disk space available on /scratch/. In other words, it is possible for a node’s /scratch to become full even though there are gres:scratch tokens available for that node. When /scratch becomes full, any attempt to write to the drive will generate an error such as:

write error: No space left on device

Because of this, it is very important that we all clean up after ourselves if we make use of any /scratch/ space outside of the job-specific $TMPDIR folder.
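
One way to make such cleanup automatic is an EXIT trap in the script. The sketch below is illustrative only: the directory naming, the /tmp fallback, and the file intermediate.dat are assumptions, not a site convention.

```shell
#!/usr/bin/env bash

# Assumption: extra working space is created directly under /scratch,
# outside the job-specific $TMPDIR; fall back to /tmp elsewhere.
if [[ -d /scratch && -w /scratch ]]; then base=/scratch; else base=/tmp; fi
user=${USER:-$(id -un)}
workdir=$(mktemp -d "$base/$user.XXXXXX")

# Remove the directory when the script exits, whether the commands
# before it succeeded or failed.
cleanup() { rm -rf -- "$workdir"; }
trap cleanup EXIT

# ... write intermediate files under "$workdir" ...
touch "$workdir/intermediate.dat"
```

The trap cannot run if the process is killed outright with SIGKILL, so it is still worth checking for leftover files from time to time.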