HPC | Computer Science and Engineering | University of Arkansas Home Page



Tutorial

Using the Star of Arkansas

Using the Cluster

Using the Star cluster involves several steps, each described in its own section below.

Logging in

Start your favorite SSH client. (If you use Windows and don't already have an SSH client, try PuTTY or the UArk recommended SSH client.) The host to log in to is stargate.uark.edu.

Choosing your MPI implementation

Star has several MPI implementations available. The default MPI will be fine for most people, but the others exist for those who wish to experiment and optimize. To change the MPI implementation used, run the command mpi-selector-menu.

The command prints a menu; choosing an entry switches the necessary paths to that MPI implementation. The following example switches from qlogicmpi_intel-2.1 to mvapich_intel-1.0.0.

Note: If you switch MPI implementations, you must log out and log back in before the changes take effect.

user@stargate:~# mpi-selector-menu
Current system default: <none>
Current user default: qlogicmpi_intel-2.1

"u" and "s" modifiers can be added to numeric and "U"
commands to specify "user" or "system-wide".

1. mvapich_gcc-1.0.0
2. mvapich_intel-1.0.0
3. openmpi_gcc-1.2.5
4. openmpi_intel-1.2.5
5. qlogicmpi_gnu-2.1
6. qlogicmpi_intel-2.1
U. Unset default
Q. Quit

Selection (1-6[us], U[us], Q): 2
Operator on the per-user or system-wide default (u/s)? u
Defaults already exist; overwrite them? (y/N) y

Current system default: <none>
Current user default: mvapich_intel-1.0.0

"u" and "s" modifiers can be added to numeric and "U"
commands to specify "user" or "system-wide".

1. mvapich_gcc-1.0.0
2. mvapich_intel-1.0.0
3. openmpi_gcc-1.2.5
4. openmpi_intel-1.2.5
5. qlogicmpi_gnu-2.1
6. qlogicmpi_intel-2.1
U. Unset default
Q. Quit

Selection (1-6[us], U[us], Q): q
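
The entries in the menu appear to follow a library_compiler-version naming pattern. As a minimal sketch under that assumption, the hypothetical helper below (describe_mpi is not part of mpi-selector-menu) splits an entry into its parts, which can help when deciding which entry matches the compiler you built with.

```shell
# Hypothetical helper: split a menu entry such as "mvapich_intel-1.0.0"
# into its library, compiler, and version parts.
# Assumes the library_compiler-version pattern seen in the menu above.
describe_mpi() {
    name=$1
    library=${name%%_*}     # text before the first underscore
    rest=${name#*_}         # text after the first underscore
    compiler=${rest%%-*}    # text before the first hyphen
    version=${rest#*-}      # text after the first hyphen
    echo "library=$library compiler=$compiler version=$version"
}

describe_mpi mvapich_intel-1.0.0
describe_mpi openmpi_gcc-1.2.5
```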

Creating a job script

Once the application is compiled and ready for testing or execution, you must write a job script that tells the scheduler what to do. The job script must

  1. Request a number of cores
  2. Tell the scheduler which job queue to put the job into

Following is an example of a job script that runs the application MPI-test with 32 cores.
1. #PBS -N MPI-test.job
2. #PBS -q parshort
3. #PBS -o MPI-test.output.$PBS_JOBID
4. #PBS -j oe
5. #PBS -l nodes=4:ppn=8
6. #PBS -l walltime=10:00
7.
8. cd $PBS_O_WORKDIR
9.
10. mpirun -np 32 -machinefile $PBS_NODEFILE ./MPI-test
Job script analysis, line-by-line:
  1. Names the job MPI-test.job
  2. Puts the job into the parshort queue. (Other options are parlong and serial.)
  3. Puts all output into the file MPI-test.output.$PBS_JOBID. Note: This file contains output that is normally printed to the screen.
  4. Puts error output and standard output into the same file (MPI-test.output.$PBS_JOBID)
  5. Requests 4 compute nodes with 8 cores per node for a total of 32 cores
  6. Sets the maximum runtime of the job to 10 minutes. If the job runs longer than 10 minutes, it will be killed. (The full format is walltime=HH:MM:SS; the shorter value 10:00 is read as MM:SS.)
  7. Blank line
  8. cd to the directory that the script was submitted from
  9. Blank line
  10. Executes the program MPI-test by calling mpirun with 32 cores, and using the machine file set by PBS. The file $PBS_NODEFILE contains all the hosts that the job has been allocated.
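
The core count passed to mpirun on line 10 has to equal nodes times ppn from line 5. The following is a minimal sketch, not part of the cluster's tooling, of a helper that derives the count so the two values cannot drift apart. The function name make_job_script and its argument order are hypothetical; the directives simply mirror the example script above.

```shell
# Hypothetical helper: print a job script for a given program and layout.
# Usage: make_job_script PROGRAM QUEUE NODES PPN WALLTIME
make_job_script() {
    prog=$1 queue=$2 nodes=$3 ppn=$4 walltime=$5
    # Total MPI processes = nodes * cores per node
    cores=$((nodes * ppn))
    # Escaped dollar signs are written literally so PBS expands them later
    cat <<EOF
#PBS -N ${prog}.job
#PBS -q ${queue}
#PBS -o ${prog}.output.\$PBS_JOBID
#PBS -j oe
#PBS -l nodes=${nodes}:ppn=${ppn}
#PBS -l walltime=${walltime}

cd \$PBS_O_WORKDIR

mpirun -np ${cores} -machinefile \$PBS_NODEFILE ./${prog}
EOF
}

# Reproduces the 32-core example: 4 nodes, 8 cores each, 10-minute limit.
make_job_script MPI-test parshort 4 8 00:10:00
```

Redirecting the output to a file (for example MPI-test-script.pbs) gives a script equivalent to the one analyzed above.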
There are several example job scripts available. Click on the link below for each different script.

Submitting Your Job Script

Once the job script is written, it can be submitted to the scheduler with the msub command. If the script is called "MPI-test-script.pbs", run msub MPI-test-script.pbs.
Following is an example of a successful use of "msub".

[user@stargate ~]$ msub MPI-test-script.pbs

1095

The number "1095" tells you which job number the script has been submitted as. That number will be used when checking on the job's execution.
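
Because later commands need the job number, it can be convenient to capture it when submitting. The sketch below assumes msub prints a purely numeric ID on a line of its own, as in the transcript above; extract_job_id is a hypothetical helper name, and the msub output is replayed with printf so the example runs anywhere.

```shell
# Hypothetical helper: read msub output on stdin and print the job number.
# Assumes the ID is purely numeric and sits on a line of its own.
extract_job_id() {
    # Drop stray whitespace, then keep the first line made up only of digits
    tr -d ' \t\r' | grep -E '^[0-9]+$' | head -n 1
}

# On the cluster this would be:
#   jobid=$(msub MPI-test-script.pbs | extract_job_id)
# Here the output shown above is replayed instead.
jobid=$(printf '\n1095\n\n' | extract_job_id)
echo "Submitted as job $jobid"
```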

Checking your job's execution

A job that has been submitted to the scheduler may not execute immediately. To check the status of jobs, use the commands showq or checkjob. The showq command will print general information about the state of the job queue.

[user@stargate ~]$ showq

active jobs------------------------
JOBID USERNAME STATE PROCS REMAINING STARTTIME

1095 user Running 10 00:02:23 Fri May 9 16:24:47
1096 user Running 64 00:09:49 Fri May 9 16:32:13
1097 user Running 64 00:09:51 Fri May 9 16:32:15
1098 user Running 64 00:09:51 Fri May 9 16:32:15
1099 user Running 64 00:09:52 Fri May 9 16:32:16
1100 user Running 64 00:09:53 Fri May 9 16:32:17
1101 user Running 64 00:09:56 Fri May 9 16:32:20
1102 user Running 64 00:09:58 Fri May 9 16:32:22

8 active jobs 458 of 512 processors in use by local jobs (89.45%)
58 of 64 nodes active (90.62%)

eligible jobs----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME

1103 user Idle 64 00:10:00 Fri May 9 16:32:21
1104 user Idle 64 00:10:00 Fri May 9 16:32:22
1105 user Idle 64 00:10:00 Fri May 9 16:32:23

3 eligible jobs

blocked jobs-----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME

0 blocked jobs

Total jobs: 11
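
When the queue is long, a per-state summary for a single user can be easier to read than the full listing. The following is a minimal sketch that assumes the column layout shown above (job ID, user name, state); count_jobs_by_state is a hypothetical helper, and a few rows from the listing are replayed so the example runs without the scheduler.

```shell
# Hypothetical helper: read showq output on stdin and print how many jobs
# the given user has in each state, one "STATE COUNT" pair per line.
count_jobs_by_state() {
    user=$1
    # Job rows start with a numeric job ID, then the user name, then the state
    awk -v u="$user" '$1 ~ /^[0-9]+$/ && $2 == u { n[$3]++ }
        END { for (s in n) print s, n[s] }' | sort
}

# On the cluster this would be:  showq | count_jobs_by_state "$USER"
# Replaying a few rows from the listing above:
count_jobs_by_state user <<'EOF'
JOBID USERNAME STATE PROCS REMAINING STARTTIME
1095 user Running 10 00:02:23 Fri May 9 16:24:47
1096 user Running 64 00:09:49 Fri May 9 16:32:13
1103 user Idle 64 00:10:00 Fri May 9 16:32:21
8 active jobs 458 of 512 processors in use by local jobs (89.45%)
EOF
```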

The output shows that 8 jobs are executing and 3 jobs are waiting to execute. A blocked job is one that, for one reason or another, is not eligible to run. Note that every job submitted to the scheduler is initially blocked. Once the scheduler has examined the job to determine its requirements, the job usually becomes eligible. If a job is blocked for more than 30 minutes, please contact a system administrator.

For detailed information about a particular job, use the checkjob command. Following is an example of the checkjob command:

[user@stargate ~]$ checkjob 1103
job 1103

AName: parshort.test
State: Idle
Creds: user:user group:user account:useraccount class:parshort
WallTime: 00:00:00 of 00:10:00
SubmitTime: Fri May 9 17:32:21
(Time Queued Total: 00:06:08 Eligible: 00:05:33)

Total Requested Tasks: 64

Req[0] TaskCount: 64 Partition: ALL
Memory >= 0 Disk >= 0 Swap >= 0
Opsys: --- Arch: --- Features: ---

Reserved Nodes: (00:03:44 -> 00:13:44 Duration: 00:10:00)
[compute-2-6:8][compute-2-7:8][compute-2-8:8][compute-2-9:8]
[compute-2-10:8][compute-2-11:8][compute-2-12:8][compute-2-13:8]


IWD: $HOME
Executable: /opt/moab/spool/moab.job.esV8xn

Partition List: ALL,StarPBS
Flags: RESTARTABLE,GLOBALQUEUE
Attr: checkpoint
StartPriority: 5
Reservation '1103' (00:03:44 -> 00:13:44 Duration: 00:10:00)
compute-2-6 available: 8 tasks supported
compute-2-7 available: 8 tasks supported
compute-7-40 available: 8 tasks supported
NOTE: job cannot run in partition StarPBS (idle procs do not
meet requirements : 24 of 64 procs found)
idle procs: 64 feasible procs: 24

Node Rejection Summary: [Class: 5][State: 149]

The output shows that job 1103 is idle because there aren't enough available nodes to run it: only 24 of the 64 requested processors were found.
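
If a script only needs the state of a job, it can be picked out of the checkjob report. This is a minimal sketch assuming the "State:" line format shown above; job_state is a hypothetical helper name, and part of the report is replayed so the example runs without the scheduler.

```shell
# Hypothetical helper: read checkjob output on stdin and print the job state.
job_state() {
    # The state is the second field on the line that begins with "State:"
    awk '$1 == "State:" { print $2; exit }'
}

# On the cluster this would be:  checkjob 1103 | job_state
# Replaying the start of the report above:
job_state <<'EOF'
job 1103

AName: parshort.test
State: Idle
Creds: user:user group:user account:useraccount class:parshort
EOF
```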

Canceling your job

If a job that has been submitted to Moab needs to be cancelled, use the canceljob command. Following is an example with job 1137.

[user@stargate ~]$ canceljob 1137

job '1137' cancelled

Cleaning up after your job

Many applications create temporary files to store intermediate results. Removing temporary files is a necessary part of cluster use. This can be accomplished in the job script, or it can be done after the job has finished.
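
One way to do the cleanup inside the job script is a shell trap, which runs even when the application fails. The following is a minimal sketch, not a site-provided tool: run_with_scratch and write_temp are hypothetical names, and mktemp -d is assumed to create the directory in a suitable location (on the cluster you may prefer a directory under your own scratch space).

```shell
# Hypothetical job-script fragment: create a private scratch directory and
# remove it when the work ends, whether it succeeds or fails.
run_with_scratch() {
    (
        scratch=$(mktemp -d) || exit 1
        # The EXIT trap fires when this subshell ends, for any reason
        trap 'rm -rf "$scratch"' EXIT
        # Run the given command with the scratch path as its last argument
        "$@" "$scratch"
        exit $?
    )
}

# Stand-in "application" that writes an intermediate file and reports
# which directory it used.
write_temp() { echo "intermediate result" > "$1/partial.dat"; echo "$1"; }

used=$(run_with_scratch write_temp)
[ -d "$used" ] || echo "scratch directory removed"
```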

Copying data to a safe place

Once your job has finished executing, it is time to copy your data to a safe place. If the files you want to save are on Lustre, run

mv /fasttmp/<username>/.../<file> /home/<username>/

For most people, data should be kept in two places: your home directory and your local computer. Although backups of the home directory are made, backups can fail, so you should always keep a second copy of your most important files.

To copy data to your local machine, you can use an SCP client (the counterpart of SSH). If you use Windows and don't have an SCP client, try WinSCP. This program will allow you to log in to stargate.uark.edu and copy files from there to your local computer.
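
For results that would be costly to regenerate, you may want to confirm the copy before the original is deleted. The sketch below is one possible approach, not a site requirement: safe_move is a hypothetical helper, and throwaway directories stand in for the scratch and home filesystems so the example runs anywhere.

```shell
# Hypothetical helper: copy a file, verify the copy byte for byte, and only
# then delete the original. Usage: safe_move SOURCE DEST_DIR
safe_move() {
    src=$1 dest_dir=$2
    dest="$dest_dir/$(basename "$src")"
    cp -p "$src" "$dest" || return 1
    # cmp -s is silent and succeeds only if the two files are identical
    if cmp -s "$src" "$dest"; then
        rm -f "$src"
    else
        echo "copy of $src does not match, original kept" >&2
        return 1
    fi
}

# Demonstration with throwaway directories standing in for the filesystems.
scratch_dir=$(mktemp -d); home_dir=$(mktemp -d)
echo "final result" > "$scratch_dir/result.dat"
safe_move "$scratch_dir/result.dat" "$home_dir" && ls "$home_dir"
rm -rf "$scratch_dir" "$home_dir"
```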

Logging off

Once all your jobs have run and you no longer need the cluster, log out with the logout command. This completes the cluster use life-cycle.

NEW: Using the Serial Queue

Serial jobs remain a significant part of HPC cluster workloads. However, attempting to run serial and parallel jobs on the same set of hardware is likely to cause problems with job scheduling. If serial jobs are run as soon as possible, parallel jobs usually suffer from increased wait time. If strict job ordering is used, serial jobs may be forced to wait even when resources are available. To reduce these and other problems, we maintain two types of queues: serial and parallel.

To better support serial jobs on the cluster, Red Diamond has been reinstalled and is dedicated to the serial queue. This increases the number of CPU cores available to the serial queue from 32 to 76.

Usage:

Using the serial queue is almost identical to using a parallel queue. The most significant change is that of using a fast filesystem. The /fasttmp filesystem is only available to jobs using a parallel queue. The serial queue has a similar filesystem at /serialtmp that is only available to jobs using the serial queue. Both /fasttmp and /serialtmp are available from the login node.

To reiterate, /serialtmp is only available to jobs using the serial queue. /fasttmp is only available to jobs using a parallel queue.

Serial Queue Information:

MPI jobs in the Serial Queue

To run an MPI job in the serial queue, you must compile the application with OpenMPI. See the section Choosing your MPI implementation, and select option 3 or 4 in mpi-selector-menu. Remember, if you change the MPI implementation, you must log out and log back in before the changes take effect. Once an application is compiled with OpenMPI, it can run in the serial queue over Ethernet.

Summary: Using the serial queue is almost identical to using a parallel queue. The biggest difference is the switch from /fasttmp to /serialtmp. Most job scripts requesting the serial queue can remain unchanged. The only change required should be replacing /fasttmp with /serialtmp.
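
The path replacement described in the summary can be applied mechanically. This is a minimal sketch with a hypothetical filter name, to_serialtmp; it only rewrites the filesystem path and leaves everything else in the script untouched.

```shell
# Hypothetical filter: rewrite /fasttmp paths to /serialtmp in a job script
# read from stdin and print the result on stdout.
to_serialtmp() {
    sed 's|/fasttmp|/serialtmp|g'
}

# On the cluster this might be:  to_serialtmp < old-job.pbs > serial-job.pbs
# Small demonstration on two lines of a script:
to_serialtmp <<'EOF'
#PBS -q serial
cd /fasttmp/user/run1
EOF
```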