Users should note that, currently, the only low-level software installed on the cluster to support parallelism is MPI. Thus, it is anticipated that, at least initially, virtually all of the parallel jobs run on the Myrinet nodes will be MPI-based.
Principally because unbalanced loads are much more of a concern for "truly" parallel tasks than for trivially parallelizable ones (such as a sequence of jobs having different input parameters), USERS ARE REQUESTED TO RUN MYRINET JOBS ONLY VIA PBS (i.e. BATCH). PLEASE DO NOT LAUNCH INTERACTIVE JOBS ON THE MYRINET NODES.
At least initially there will be no job quotas or other restrictions on how users use the batch system. Management reserves the right to impose such limits if and when contention for resources becomes severe.
AGAIN, HOWEVER, DO NOT RUN INTERACTIVELY ON THE MYRINET NODES, EVEN THOSE THAT USED TO BE GIG NODES. Such behaviour will lead to PBS confusion, overloading of nodes, and nasty e-mail messages from management.
Similarly, at least until the WestGrid UBC/TRIUMF cluster comes back on-line and stabilizes, parallel users should avoid the temptation of completely saturating the machine with their parallel jobs. In particular, note that at least for the time being, parallel jobs should still be restricted to 32 or fewer processors.
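As an illustration of the 32-processor guideline, a user could run a small check before submitting. This is only a sketch: check_procs and MAX_PROCS are hypothetical names, and nothing like this is installed on the cluster or enforced by PBS.

```shell
# Hypothetical pre-submission check; MAX_PROCS mirrors the 32-processor
# guideline and is not enforced by PBS itself.
MAX_PROCS=32

check_procs() {
    # $1: number of processors the job will request
    if [ "$1" -le "$MAX_PROCS" ]; then
        echo "ok: $1 processors"
    else
        echo "too many: $1 > $MAX_PROCS processors" >&2
        return 1
    fi
}

# Example use before submitting a job:
#   check_procs 16 && qsub cpi.pbs
```

The function returns a non-zero status when the request exceeds the limit, so it can gate a qsub call as in the commented example.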
As always, the cardinal rule of cluster usage is to be AWARE and CONSIDERATE of other users.
Users should also be aware that since the Myrinet upgrade on February 5, 2004, there are now two versions of Myrinet card in the cluster. Nodes 001 through 050 have 'C' cards, while 051 through 054 have a previous generation 'B' card, which has somewhat less peak performance than the 'C' card. However, in a multi-grid benchmark that management has run on
After downloading the above archive to a convenient location within your home directory on the cluster, unpack it using the tar command:
head% tar zxf cpi-mpi-myr-intel.tar.gz
This will create a directory cpi-mpi-myr-intel that should have contents as follows:
head% cd cpi-mpi-myr-intel
head% ls
Makefile Makefile.commented cpi.c cpi.pbs
You can browse the files listed above here:
Note that Makefile and Makefile.commented are functionally identical; the latter has additional comments explaining the structure of the makefile.
To compile and link the test program, simply type make or make cpi:
head% make
/opt/gmpi.intel/bin/mpicc -I/usr/local/intel/include -O3 -tpp7 -c cpi.c
/opt/gmpi.intel/bin/mpicc -O3 -tpp7 -O3 -tpp7 -L/usr/local/intel/lib cpi.o -o cpi
As mentioned in the commented version of the makefile, /opt/gmpi.intel/bin/mpicc is a script that essentially functions as a front-end to the Intel C/C++ compiler, icc, and which, among other things, ensures linkage with the proper version of the MPI library during the load phase.
Now that the executable cpi has been created, you can submit a batch job that runs it over Myrinet on several processors with the PBS qsub command. We'll do this via the batch script file cpi.pbs, which you can use as a template for your own submissions (note that cpi.pbs is a bash script; you can equally well use a tcsh script should you so wish):
head% qsub cpi.pbs
181.head
The output "181.head" from the qsub command indicates that the batch job has been assigned the job identifier 181; the job ID assigned to your submission will almost certainly differ. (You'll need the job ID should you want to cancel a job once it's been submitted to the queue; see below.)
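The contents of cpi.pbs are not reproduced in this text. As a rough guide to what a PBS batch script for an MPI job commonly looks like, the following writes a hypothetical script, cpi-sketch.pbs, to the current directory. It is a sketch under stated assumptions, not the cluster's actual template: the nodes=4 request, the plain mpirun launcher name, and its -np and -machinefile options are all assumptions that should be checked against the real cpi.pbs and the local MPI installation.

```shell
# Write a hypothetical batch script; the real cpi.pbs may differ.
cat > cpi-sketch.pbs <<'EOF'
#!/bin/bash
# Job name; PBS uses it as the prefix of the output file names.
#PBS -N cpi
# Assumed resource request; the cluster may require a Myrinet-specific
# node property here.
#PBS -l nodes=4

# PBS starts jobs in the home directory; return to the submission directory.
cd "$PBS_O_WORKDIR"

# PBS lists one line per allocated processor in the node file.
NPROCS=$(wc -l < "$PBS_NODEFILE")

# Assumed launcher name and options; check the cluster's MPI installation.
mpirun -np $NPROCS -machinefile "$PBS_NODEFILE" ./cpi
EOF
```

PBS_O_WORKDIR and PBS_NODEFILE are set by PBS when the job starts, so the script only makes sense when run through qsub rather than interactively.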
Provided that there are enough free Myrinet nodes available, the batch job will run quickly (you can check its status using the qstat command), and upon termination, will leave an additional two files, suffixed by the job ID, in the submission directory.
head% ls
Makefile Makefile.commented cpi* cpi.c cpi.e181 cpi.o cpi.o181 cpi.pbs
The new files contain the standard output and standard error from the batch job and can be browsed here:
As a final note, should you wish to stop a batch job's execution, or remove it from the execution queue, use the qdel command, supplying the job ID as its single argument. For example, the above job could have been terminated via
head% qdel 181
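When submissions are scripted, it can be convenient to extract the job ID from the qsub reply and to work out the names of the two output files in advance. The helpers below are a minimal sketch: job_id_from_reply and job_output_files are hypothetical names, not PBS commands, and they assume the "ID.server" reply format and the name.oID / name.eID file naming seen in the example above.

```shell
# Hypothetical helpers, not part of PBS.

# Reduce a qsub reply such as "181.head" to the numeric job ID
# by stripping everything from the first "." onward.
job_id_from_reply() {
    printf '%s\n' "${1%%.*}"
}

# Print the stdout and stderr file names for job name $1 and job ID $2.
job_output_files() {
    printf '%s.o%s\n%s.e%s\n' "$1" "$2" "$1" "$2"
}

# Example use on the cluster (assumes qsub and qdel are on your PATH):
#   reply=$(qsub cpi.pbs)
#   id=$(job_id_from_reply "$reply")
#   job_output_files cpi "$id"
#   qdel "$id"
```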