xCAT HOWTO
This document was most recently modified:
02/04/2002
The most recent version of this document is available at:
http://x-cat.org/docs/xcat-HOWTO.html
Author:
Matt Bohnsack
bohnsack@bohnsack.com  http://bohnsack.com/
and others

Make certain you're using the most recent version of this document before beginning a cluster implementation, or even before you begin reading. Be aware that software versions referenced in this document may be out of date. Always check for newer software versions and be aware of the stability of these newer versions.

Send additions and corrections to the author, so the document can continue to be improved.

Table of Contents
  1. Introduction
  2. Reading Related Documentation
  3. Understanding the Architecture of the Example Cluster
  4. Getting the Required Software
  5. Installing the OS on the Management Node
  6. Upgrading RedHat Software
  7. Installing Custom Kernel
  8. Installing xCAT
  9. Configuring xCAT
  10. Configuring Networking
  11. Doing the Compute Node Preinstall (stage1)
  12. Configuring the Terminal Servers
  13. Completing Management Node Setup
  14. Collecting MAC Addresses (stage2)
  15. Configuring ASMA/RSA/SPN (stage3)
  16. Installing Compute Nodes
  17. Installing / Configuring Myrinet Software
  18. Installing Portland Group Compilers
  19. Installing MPICH MPI Libraries
  20. Installing LAM MPI Libraries
  21. Installing PBS Resource Manager
  22. Installing Maui Scheduler
  23. Deploying PBS on the Cluster
  24. Adding Users and Setting Up User Environment
  25. Verifying Cluster Operation With Pseudo Jobs and PBS
  26. Running a Simple MPI Job Interactively via PBS
  27. Running a Simple MPI Job in Batch Mode via PBS
  28. ChangeLog
  29. TODO
  30. Thanks

1. Introduction
xCAT is for use by IBM and IBM Linux cluster customers only. xCAT is copyright © 2000, 2001, 2002 IBM corporation. All rights reserved. Use and modify all you like, but do not redistribute. No warranty is expressed or implied. IBM assumes no liability or responsibility. For more information about what xCAT is, read the xCAT Overview.

This document describes how to implement a Beowulf style Linux cluster on IBM xSeries hardware using xCAT and related third party software. It covers installing xCAT version 1.1RC7.5 on RedHat 6.2, 7.0, 7.1, or 7.2. Most of the examples cover installation on ia32 machines, but some coverage is given to ia64 machines (actually not yet, but soon). Specific configuration examples from a somewhat common 32 node cluster configuration are included. This document covers very little of the hardware connectivity, cabling, etc. that is required to implement a cluster.

You will need to adjust the configuration examples shown in this document to suit your particular cluster and architecture, but the examples should give a good general idea of what needs to be done. Please don't use this document verbatim as an implementation guide; rather, use it as inspiration for your own implementation. Use the man pages, source, and other documentation that is available to figure out why certain design/configuration choices are made and how you can make different choices.

Additional documentation including hardware installation and configuration is available as a RedBook at http://publib-b.boulder.ibm.com/Redbooks.nsf/9445fa5b416f6e32852569ae006bb65f/e0384f6e6982b28986256a0f005b7ba4?OpenDocument&Highlight=0,hpc. This RedBook corresponds to the version of xCAT that is publicly available at ftp://www.redbooks.ibm.com/redbooks/SG246041. The RedBook is out of date with respect to much of xCAT 1.1, but it is still a wonderful guide that, in places, goes into much more depth than this document does. If you're serious about implementing a cluster and learning how things work, you should read the RedBook in addition to this document.

Back to TOC

2. Reading Related Documentation
There's quite a bit of related documentation available in various stages of completion. You should read it. It's all accessible at http://x-cat.org/docs/

Back to TOC

3. Understanding the Architecture of the Example Cluster
This document bases most of its examples on a 32 node cluster that uses serial terminal servers for out-of-band console access, an APC Master Switch and IBM's Service Processor Network for remote hardware management, ethernet, and Myrinet. The following three diagrams describe some of the detail of this example cluster:

3.1 Components / Rack Layout      
Here you see how the hardware is positioned in the rack. Starting from the bottom and moving towards the top, we have:
  • The Myrinet switch: Used for high-speed, low-latency inter-node communication. Your cluster may not have Myrinet, if you aren't running parallel jobs that do heavy message passing, or if it doesn't fit in your budget.

  • Nodes 1-16: The first 16 compute nodes. Note that every 8th node has an ASMA adapter installed. You may have RSA adapters instead of ASMAs. These cards enable the Service Processor Network to function and remote hardware management to be performed.

  • Monitor/Keyboard: You know what this is.

  • Terminal servers: The terminal servers enable serial consoles from all of the compute nodes to be accessible from the management node. You will find this feature very useful during system setup and for administration after setup.

  • APC master switch: This enables remote power control of devices that are not part of the Service Processor Network: terminal servers, the Myrinet switch, ASMA adapters, etc.

  • The management node: The management node is where we install the rest of the nodes from, manage the cluster, etc.

  • Nodes 17-32: The rest of the compute nodes, again with Management Processor cards in every 8th node.

  • Ethernet switch: Finally, at the top, we have the ethernet switch.
 
Ethernet Switch
node32
... nodes 27 - 31
node26
node25 ASMA
node24
... nodes 19 - 23
node18
node17 ASMA
Management Node
apc1 APC Master Switch
ts2 Terminal Servers
ts1
Monitor / Keyboard    
node16
... nodes 11 - 15
node10
node09 ASMA
node08
... nodes 03 - 07
node02
node01 ASMA    (ASMA = has ASMA card)
Myrinet Switch

3.2 Networks      
Here you see the networks that are used in this document's examples, along with the devices attached to each. Important things to note are:
  • The external network is the organization's main network. In this example, only the management node has connectivity to the external network.

  • The ethernet switch hosts both the cluster and management network on separate VLANs.

  • The cluster network connects the management node to the compute nodes. We use a private class B network that has no connectivity to the external network. This is often the easiest way to do things and a good thing to do if you think your cluster might grow to more than 254 nodes. You may have a requirement to place the compute nodes on a network that is part of your external network.

  • The management network is a separate network used to connect all devices associated with cluster management... terminal servers, ASMA cards, etc. to the management node.

  • Parallel jobs use the message passing network for interprocess communication. Our example uses a separate private class B network over Myrinet. If you are not using Myrinet, this network could be the same as the cluster network. i.e. You could do any required message passing over the cluster network.
 

Cluster Network (172.16.0.0/16):
  eth0 on management node (1Gb/s)
  eth0 on compute nodes (100Mb/s)

Management Network (172.17.0.0/16):
  eth1 on management node (100Mb/s)
  terminal servers' ethernet interfaces
  ASMA adapters' ethernet interfaces
  APC MasterSwitch's ethernet interface
  Myrinet switch's management port

External Network (10.0.0.0/8):
  eth2 on management node (100Mb/s)

Message Passing Network (172.18.0.0/16):
  myri0 on compute nodes (2Gb/s)

3.3 Connections      
The management network doesn't match up with the rest of the documentation in this diagram, but it may be helpful in understanding how the networks and equipment are connected.


3.4 Another Connections Diagram      

3.5 Other Architecture Notes      
Other notes about this architecture (and areas where yours may differ and you may need to make adjustments to this document's examples):
  • The compute nodes have no access to the external network.
  • The compute nodes get DNS, DHCP, and NIS services from the management node.
  • NIS is used to distribute username/passwd information to the compute nodes and the management node is the NIS master.
  • The management node is the only node with access to the management network.
  • PBS and Maui are used to schedule/run jobs on the cluster.
  • Users can only access compute nodes when the scheduler has allocated nodes to them and then only with ssh.
  • Jobs will use MPICH or LAM for message passing.

Back to TOC

4. Getting the Required Software
This section will list all the CDs, floppies, and software you'll need to install a cluster, and where to get them.

For now, see the original sources cited throughout this document; a consolidated list is still to come.

Back to TOC

5. Installing the OS on the Management Node
The first step in building an xCAT cluster is installing Linux on the management node. This is, roughly, how to do just that:

5.1 Create and Configure RAID Devices if Necessary      
If you are using a ServeRAID card in the management node, use the ServeRAID flash/config CD to update the ServeRAID firmware to v4.80 and define your RAID volumes. If you have other nodes with hardware RAID, you might as well update and configure them now as well. You can get this CD from http://www.pc.ibm.com/qtechinfo/MIGR-495PES.html.

5.2 NIS Notes      
If you plan on interacting with an external NIS server, check if it supports MD5 passwords and shadow passwords. If it doesn't support these modern features, don't turn them on during the install of the management node. I'm not absolutely certain on this point, but it's bitten me hard in the past, so be careful.

5.3 Partition Notes      
A good generic drive partitioning scheme for the management node follows; the same layout expressed in kickstart syntax appears after the list. YMMV:
/boot (200 MB)
/install (4 GB)
/usr/local/ (2.5 GB)
/var (1 GB per every 128 nodes)
/ (the rest of the disk)
SWAP (1 GB)
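
For reference, here is the same layout expressed as RedHat kickstart 'part' directives. This is a sketch only; the management node in this document is installed interactively, and the exact sizes and which partition grows are assumptions you should adjust:

part /boot --size 200
part /install --size 4096
part /usr/local --size 2560
part /var --size 1024
part swap --size 1024
part / --size 1024 --grow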

5.4 User Notes      
It's a good idea to create a normal user other than root during the install. I usually make an 'ibm' user.

5.5 DMA Notes      
Some IDE CDROM drives in x340s have a DMA problem that can cause large data errors to crop up in your install and later CD copying. If you have a CDROM drive that has this error, or if you don't want to risk the frustrating experience of finding out whether you have a bad drive, you need to use this workaround:

Pass ide=nodma to the installer (i.e., install: text ide=nodma), and then after the install is complete, add append="ide=nodma" to /etc/lilo.conf, run /sbin/lilo, and reboot to a system with IDE DMA disabled.
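
As a sketch of where the append line goes in /etc/lilo.conf (the kernel image label and root device shown are taken from the example in Section 7.3; adjust them to match your own system):

# BEGIN example fragment of /etc/lilo.conf with IDE DMA disabled
image=/boot/vmlinuz-2.2.19-4hpc
        label=2.2.19-4hpc
        append="ide=nodma"
        read-only
        root=/dev/sda9
# END example fragment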

5.6 Install RedHat      
  • RedHat 7.0, 7.1, or 7.2
    Select custom installation. When asked for packages to install choose everything. As an added component, under Kernel options choose to additionally install the SMPX kernel.

  • RedHat 6.2
    Install RedHat 6.2 with ServeRAID support using:
    • A RedHat 6.2 with updated ServeRAID CD (modified version of the RedHat 6.2 installation CD). You can get a copy of this CD at ftp://cartman.sl.dfw.ibm.com/OS/linux/ks62sr4.iso (IBM internal site).

      or

    • The standard issue RedHat 6.2 CD without the latest IBM ServeRAID support. You must create a support diskette with the software found at the following URL: http://www.pc.ibm.com/qtechinfo/MIGR-495PES.html. At the boot: prompt, type expert. When asked for disk support, insert your floppy and select the second ServeRAID choice. If you do not see two ServeRAID options in the scroll list, the device driver has not been loaded and you will need to restart your configuration.

    Select custom installation. When asked for packages to install choose everything. As an added component, under Kernel options choose to additionally install the SMPX kernel.
5.7 Bring Up the Newly Installed System      
Reboot and login as root.

5.8 Turn Off Services We Don't Want (General)      
You probably want to turn off some of the network services that are turned on by default during installation for security and other reasons...

To view installed services:

> /sbin/chkconfig --list | grep ':on'

To turn off a service:

> /sbin/chkconfig <service> off

With Redhat 6.2, you'll also have to comment out the services you don't want run in /etc/inetd.conf and then restart inetd.

5.9 Turn Off Services We Don't Want (Specific)      
The following are examples of exactly what services to turn off for a system that works with our example architecture and will have nothing running that isn't necessary:
  • RedHat 6.2:
    /sbin/chkconfig --level 0123456 lpd off
    /sbin/chkconfig --level 0123456 linuxconf off
    /sbin/chkconfig --level 0123456 kudzu off
    /sbin/chkconfig --level 0123456 pcmcia off
    /sbin/chkconfig --level 0123456 isdn off
    /sbin/chkconfig --level 0123456 apmd off
    /sbin/chkconfig --level 0123456 autofs off
    /sbin/chkconfig --level 0123456 httpd off
    /sbin/chkconfig --level 0123456 reconfig off

    Also edit /etc/inetd.conf commenting out the following lines:

    #ftp stream tcp nowait root /usr/sbin/tcpd in.ftpd -l -a
    #telnet stream tcp nowait root /usr/sbin/tcpd in.telnet
    #shell stream tcp nowait root /usr/sbin/tcpd in.rshd
    #login stream tcp nowait root /usr/sbin/tcpd in.rlogind
    #talk dgram udp wait nobody.tty /usr/sbin/tcpd in.talkd
    #ntalk dgram udp wait nobody.tty /usr/sbin/tcpd in.ntalkd
    #finger stream tcp nowait nobody /usr/sbin/tcpd in.fingerd
    #linuxconf stream tcp wait root /bin/linuxconf linuxconf --http

    Then restart inetd:

    > /sbin/service inet restart

  • RedHat 7.1:
    /sbin/chkconfig --level 0123456 autofs off
    /sbin/chkconfig --level 0123456 linuxconf off
    /sbin/chkconfig --level 0123456 reconfig off
    /sbin/chkconfig --level 0123456 isdn off
    /sbin/chkconfig --level 0123456 pppoe off
    /sbin/chkconfig --level 0123456 iptables off
    /sbin/chkconfig --level 0123456 ipchains off
    /sbin/chkconfig --level 0123456 apmd off
    /sbin/chkconfig --level 0123456 pcmcia off
    /sbin/chkconfig --level 0123456 rawdevices off
    /sbin/chkconfig --level 0123456 lpd off
    /sbin/chkconfig --level 0123456 kudzu off
    /sbin/chkconfig --level 0123456 pxe off

  • RedHat 7.0, 7.2:
    You get the point.
5.10 Erase LAM Package      
You probably want to remove the RedHat LAM package. It can easily get in the way of the MPI software we install later on, because it's an old version and installs itself in /usr/bin:

> rpm --erase lam

Back to TOC

6. Upgrading RedHat Software
RedHat, like all software, has bugs. You should upgrade RedHat with all the available fixes to have the most stable and secure system possible (with the obvious caution that some of the updates might have undesired behaviours).

6.1 Create a Place To Put the Updates and Go There      
We'll use this directory (/install/post/updates/rhxx) later on during the installation of the compute nodes as well.

Substitute rh62, rh70, rh71, or rh72 for rhxx depending on what version of RedHat you are using.

> mkdir -p /install/post/updates/rhxx
> cd /install/post/updates/rhxx

6.2 Get the Updates      
Go to http://www.redhat.com/download/mirror.html and select a mirror site that has the "Updates" section of RedHat's FTP site.

Download all the rpms from updates/x.x/en/os/i386/ and updates/x.x/en/os/noarch/, to the directory you created above, where x.x is the RedHat release you are using. You will also want to download any glibc packages that are available in updates/x.x/en/os/i686/, so you have an optimized C library. If you're using the RedHat kernel, grab it from updates/x.x/en/os/i686/ as well.
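
As a sketch of the download step using wget (the mirror host and path are placeholders; substitute your chosen mirror, the real path on that mirror, and your RedHat release for x.x):

> cd /install/post/updates/rhxx
> wget "ftp://your.mirror.example/pub/redhat/updates/x.x/en/os/i386/*.rpm"
> wget "ftp://your.mirror.example/pub/redhat/updates/x.x/en/os/noarch/*.rpm"
> wget "ftp://your.mirror.example/pub/redhat/updates/x.x/en/os/i686/glibc*.rpm"

If you plan to use the RedHat kernel, fetch the i686 kernel packages the same way.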

6.3 Install the Updates      
Note the $(ls | egrep -v '^(kernel-)') stuff. We don't install RedHat's kernel updates, unless there's a very good reason.
  • Redhat 7.0 - 7.2:
    > cd /install/post/updates/rhxx
    > rpm -Fvh $(ls | egrep -v '^(kernel-)')

  • Redhat 6.2:
    Only if you're using RedHat 6.2...
    > cd /install/post/updates/rh62
    > rpm -Uivh --force --nodeps db3*
    > rpm -Fvh rpm*
    > rpm --rebuilddb
    > rpm -Fvh $(ls | egrep -v '^(kernel-)')

Back to TOC

7. Installing Custom Kernel
These instructions assume you want to use one of the xCAT custom kernels. Unless you have a good reason not to, the xCAT custom kernel is recommended. It includes things like large file support, Fibre channel HBA drivers, high-performance ethernet drivers, perfctr support, patches for xSeries hardware, etc.

7.1 Download Custom Kernel      
Available at the link that is provided on the download page (http://www.sense.net/xcat/).

7.2 Install Kernel      
RedHat 7.2
Upgrade to 2.4.9 from redhat or 2.4.16 at a minimum from kernel.org.

RedHat 7.1, 7.2
> cd / ; tar xzvf kernel-2.4.5-2hpc.tgz

RedHat 6.2, 7.0?
> cd / ; tar xzvf kernel-2.2.19-4hpc.tgz


7.3 Edit /etc/lilo.conf      
# BEGIN example of /etc/lilo.conf for Redhat 6.2 after editing
boot=/dev/sda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
linear
default=2.2.19-4hpc

image=/boot/vmlinuz-2.2.19-4hpc
        label=2.2.19-4hpc
        read-only
        root=/dev/sda9

image=/boot/vmlinuz-2.2.14-5.0smp
        label=linux
        initrd=/boot/initrd-2.2.14-5.0smp.img
        read-only
        root=/dev/sda9

image=/boot/vmlinuz-2.2.14-5.0
        label=linux-up
        initrd=/boot/initrd-2.2.14-5.0.img
        read-only
        root=/dev/sda9
# END example of /etc/lilo.conf
7.4 Run lilo to Install and Reboot      
> /sbin/lilo; reboot

Back to TOC

8. Installing xCAT
Installing xCAT on the management node is very straightforward:

8.1 Download the Latest Version of xCAT      
http://x-cat.org/download/

8.2 Install xCAT Into /usr/local/      
> cd /usr/local
> tar xzvf /where/you/put/it/xcat-dist-1.1RC7.5.tgz

8.3 Adjust Environment and $PATH      
Here, we copy some sample environment files. This puts the xCAT directories into $PATH. You may wish to look at these files to see what they are doing. Most installations shouldn't need to make any changes.

> cp /usr/local/xcat/samples/xcat.sh /etc/profile.d/
> cp /usr/local/xcat/samples/xcat.csh /etc/profile.d/

8.4 Add xCAT Man Pages to $MANPATH      
Add the following line to /etc/man.config:

MANPATH /usr/local/xcat/man

Back to TOC

9. Configuring xCAT
This section describes some of the xCAT configuration necessary for the 32 node example cluster. If your cluster differs from this example, you'll have to make changes. xCAT configuration files are located in /usr/local/xcat/etc. You must setup these configuration files before proceeding.

9.1 Copy the Sample Config Files to Their Required Location      
> mkdir /usr/local/xcat/etc
> cp /usr/local/xcat/samples/etc/* /usr/local/xcat/etc

9.2 Create Your Own Custom Configuration      
Edit /usr/local/xcat/etc/* to suit your cluster. Please read the man pages 'man site.tab', etc., to learn more about the format of these configuration files. There is a bit more detail on some of these files in some of the later sections. The following are examples that will work with our example 32 node cluster...

9.3 site.tab      
/usr/local/xcat/etc/site.tab
# site.tab controls most of xCAT's global settings.
# man site.tab for information on what each field means.
rsh		/usr/bin/ssh
rcp		/usr/bin/scp
gkhfile		/usr/local/xcat/etc/gkh
tftpdir		/tftpboot
tftpxcatroot	xcat
domain		mydomain.com
nameservers	172.16.100.1
nets		172.16.0.0:255.255.0.0,172.17.0.0:255.255.0.0,172.18.0.0:255.255.0.0
dnsdir		/var/named
dnsallowq	172.16.0.0:255.255.0.0,172.17.0.0:255.255.0.0,172.18.0.0:255.255.0.0
domainaliasip	172.16.100.1	
mxhosts		mydomain.com,man-c.mydomain.com
mailhosts	man-c
master		man-c
pbshome		/var/spool/pbs
pbsprefix	/usr/local/pbs
pbsserver	man-c
scheduler	maui
xcatprefix	/usr/local/xcat
keyboard	us
timezone	US/Central
offutc		-6
mapperhost	NA
serialmac	0
snmpc		public
timeservers	man-c
logdays		7
installdir	/install
clustername	WOPR
dhcpver		2
dhcpconf	/etc/dhcpd.conf
clusternet	172.16.0.0
dynamic		172.30.0.1,255.255.0.0,172.30.1.1,172.30.254.254
dynamictype	ia32
usernodes	man-c
usermaster	man-c
nisdomain	mydomain.com
nismaster	man-c
nisslaves	NA
homelinks	NA
chagemin	0
chagemax	60
chagewarn	10
chageinactive	0
mpcliroot	/usr/local/xcat/lib/mpcli
9.4 nodelist.tab      
/usr/local/xcat/etc/nodelist.tab
# nodelist.tab contains a list of nodes and defines groups that
# can be used in commands.  man nodelist.tab for more information.
node01	all,rack1,compute,myri,spn1
node02	all,rack1,compute,myri,spn1
node03	all,rack1,compute,myri,spn1
node04	all,rack1,compute,myri,spn1
node05	all,rack1,compute,myri,spn1
node06	all,rack1,compute,myri,spn1
node07	all,rack1,compute,myri,spn1
node08	all,rack1,compute,myri,spn1
node09	all,rack1,compute,myri,spn2
node10	all,rack1,compute,myri,spn2
node11	all,rack1,compute,myri,spn2
node12	all,rack1,compute,myri,spn2
node13	all,rack1,compute,myri,spn2
node14	all,rack1,compute,myri,spn2
node15	all,rack1,compute,myri,spn2
node16	all,rack1,compute,myri,spn2
node17	all,rack1,compute,myri,spn3
node18	all,rack1,compute,myri,spn3
node19	all,rack1,compute,myri,spn3
node20	all,rack1,compute,myri,spn3
node21	all,rack1,compute,myri,spn3
node22	all,rack1,compute,myri,spn3
node23	all,rack1,compute,myri,spn3
node24	all,rack1,compute,myri,spn3
node25	all,rack1,compute,myri,spn4
node26	all,rack1,compute,myri,spn4
node27	all,rack1,compute,myri,spn4
node28	all,rack1,compute,myri,spn4
node29	all,rack1,compute,myri,spn4
node30	all,rack1,compute,myri,spn4
node31	all,rack1,compute,myri,spn4
node32	all,rack1,compute,myri,spn4
asma1	asma
asma2	asma
asma3	asma
asma4	asma
ts1	ts
ts2	ts
9.5 mpa.tab      
/usr/local/xcat/etc/mpa.tab
# mpa.tab defines what type of service processor adapters
# the cluster has and how to use their functionality.
# Our example cluster uses only ASMAs with telnet and http.
# man mpa.tab for more information
#
#service processor adapter management
#
#type      = asma,rsa
#name      = internal name (must be unique)
#            internal name should = node name
#            if rsa/asma is primary management
#            processor
#number    = internal number (must be unique and > 10000)
#command   = telnet,mpcli
#reset     = http(ASMA only),mpcli,NA
#
#mpa	type,name,number,command,reset,rvid
asma1	asma,asma1,10001,telnet,http,telnet
asma2	asma,asma2,10002,telnet,http,telnet
asma3	asma,asma3,10003,telnet,http,telnet
asma4	asma,asma4,10004,telnet,http,telnet
9.6 mp.tab      
/usr/local/xcat/etc/mp.tab
# mp.tab defines how the Service processor network is setup.
# node07 is accessed via the name 'node07' on the ASMA 'asma1', etc.
# man asma.tab for more information until the man page for mp.tab is ready
node01	asma1,node01
node02	asma1,node02
node03	asma1,node03
node04	asma1,node04
node05	asma1,node05
node06	asma1,node06
node07	asma1,node07
node08	asma1,node08
node09	asma2,node09
node10	asma2,node10
node11	asma2,node11
node12	asma2,node12
node13	asma2,node13
node14	asma2,node14
node15	asma2,node15
node16	asma2,node16
node17	asma3,node17
node18	asma3,node18
node19	asma3,node19
node20	asma3,node20
node21	asma3,node21
node22	asma3,node22
node23	asma3,node23
node24	asma3,node24
node25	asma4,node25
node26	asma4,node26
node27	asma4,node27
node28	asma4,node28
node29	asma4,node29
node30	asma4,node30
node31	asma4,node31
node32	asma4,node32
9.7 apc.tab      
/usr/local/xcat/etc/apc.tab
# apc.tab  defines  the  relationship  between nodes and APC
# MasterSwitches and their assigned outlets.  In our example,
# the power for asma1 is plugged into the 1st outlet of the
# APC MasterSwitch, etc.
asma1	apc1,1
asma2	apc1,2
asma3	apc1,3
asma4	apc1,4
ts1	apc1,5
ts2	apc1,6
9.8 conserver.cf      
/usr/local/xcat/etc/conserver.cf
# conserver.cf defines how serial consoles are accessed.  Our example
# uses the ELS terminal servers and node01 is connected to port 1 
# on ts1, node02 is connected to port 2 on ts1, node17 is connected to
# port 1 on ts2, etc.
# man conserver.cf for more information
#
# The character '&' in logfile names are substituted with the console
# name.  Any logfile name that doesn't begin with a '/' has LOGDIR
# prepended to it.  So, most consoles will just have a '&' as the logfile
# name, which causes /var/log/consoles/ to be used.
#
LOGDIR=/var/log/consoles
#
# list of consoles we serve
#    name : tty[@host] : baud[parity] : logfile : mark-interval[m|h|d]
#    name : !host : port : logfile : mark-interval[m|h|d]
#    name : |command : : logfile : mark-interval[m|h|d]
#
node01:!ts1:3001:&:
node02:!ts1:3002:&:
node03:!ts1:3003:&:
node04:!ts1:3004:&:
node05:!ts1:3005:&:
node06:!ts1:3006:&:
node07:!ts1:3007:&:
node08:!ts1:3008:&:
node09:!ts1:3009:&:
node10:!ts1:3010:&:
node11:!ts1:3011:&:
node12:!ts1:3012:&:
node13:!ts1:3013:&:
node14:!ts1:3014:&:
node15:!ts1:3015:&:
node16:!ts1:3016:&:
node17:!ts2:3001:&:
node18:!ts2:3002:&:
node19:!ts2:3003:&:
node20:!ts2:3004:&:
node21:!ts2:3005:&:
node22:!ts2:3006:&:
node23:!ts2:3007:&:
node24:!ts2:3008:&:
node25:!ts2:3009:&:
node26:!ts2:3010:&:
node27:!ts2:3011:&:
node28:!ts2:3012:&:
node29:!ts2:3013:&:
node30:!ts2:3014:&:
node31:!ts2:3015:&:
node32:!ts2:3016:&:
%%   
#
# list of clients we allow
# {trusted|allowed|rejected} : machines
#
trusted: 127.0.0.1
9.9 conserver.tab      
/usr/local/xcat/etc/conserver.tab
# conserver.tab  defines  the relationship between nodes and
# conserver servers.  Our example uses only one conserver on
# the localhost.  man conserver.tab for more information.
node01	localhost,node01
node02	localhost,node02
node03	localhost,node03
node04	localhost,node04
node05	localhost,node05
node06	localhost,node06
node07	localhost,node07
node08	localhost,node08
node09	localhost,node09
node10	localhost,node10
node11	localhost,node11
node12	localhost,node12
node13	localhost,node13
node14	localhost,node14
node15	localhost,node15
node16	localhost,node16
node17	localhost,node17
node18	localhost,node18
node19	localhost,node19
node20	localhost,node20
node21	localhost,node21
node22	localhost,node22
node23	localhost,node23
node24	localhost,node24
node25	localhost,node25
node26	localhost,node26
node27	localhost,node27
node28	localhost,node28
node29	localhost,node29
node30	localhost,node30
node31	localhost,node31
node32	localhost,node32
9.10 nodehm.tab      
/usr/local/xcat/etc/nodehm.tab
# nodehm.tab  defines  the  relationship  between  nodes and
# hardware management methods.  man nodehm.tab for more info.
#
#node hardware management
#
#power     = mp,apc,apcp,NA
#reset     = mp,apc,apcp,NA
#cad       = mp,NA
#vitals    = mp,NA
#inv       = mp,NA
#cons      = conserver,tty,rtel,NA
#bioscons  = mp,NA
#eventlogs = mp,NA
#getmacs   = rcons,cisco3500
#netboot   = pxe,eb,ks62,elilo,NA
#eth0      = eepro100,pcnet32,e100
#gcons     = vnc,NA
#
#node power,reset,cad,vitals,inv,cons,bioscons,eventlogs,getmacs,netboot,eth0,gcons
#
node01	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node02	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node03	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node04	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node05	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node06	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node07	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node08	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node09	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node10	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node11	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node12	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node13	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node14	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node15	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node16	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node17	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node18	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node19	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node20	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node21	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node22	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node23	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node24	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node25	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node26	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node27	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node28	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node29	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node30	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node31	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
node32	mp,mp,mp,mp,mp,conserver,mp,mp,rcons,pxe,eepro100,vnc
asma1	apc,apc,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
asma2	apc,apc,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
asma3	apc,apc,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
asma4	apc,apc,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
ts1	apc,apc,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
ts2	apc,apc,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA


9.11 noderes.tab      
/usr/local/xcat/etc/noderes.tab
# noderes.tab defines the resources for each node.
# If your cluster doesn't use GM or PBS, or you want users to be
# able to access compute nodes even if they aren't running a job
# on them, you'll need to modify the GM, PBS, and ACCESS fields.
# For changes to this file to take effect, you must do a 'mkks',
# 'nodeset' and reinstall the node.
# man noderes.tab for more information.  
#
#TFTP         = Where is my TFTP server? 
#               Used by makedhcp to setup /etc/dhcpd.conf
#               Used by mkks to setup update flag location
#NFS_INSTALL  = Where do I get my files?
#INSTALL_DIR  = From what directory?
#SERIAL       = Serial console port (0, 1, or NA)
#USENIS       = Use NIS to authenticate (Y or N)
#INSTALL_ROLL = Am I also an installation server? (Y or N)
#ACCT         = Turn on BSD accounting
#GM           = Load GM module (Y or N)
#PBS          = Enable PBS (Y or N)
#ACCESS       = access.conf support
#INSTALL_NIC  = eth0, eth1, ... or NA
#
#node/group	TFTP,NFS_INSTALL,INSTALL_DIR,SERIAL,USENIS,INSTALL_ROLL,\
#               ACCT,GM,PBS,ACCESS,INSTALL_NIC
#
# the entries below can be accomplished with a single line...
# all		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node01		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node02		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node03		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node04		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node05		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node06		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node07		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node08		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node09		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node10		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node11		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node12		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node13		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node14		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node15		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node16		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node17		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node18		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node19		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node20		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node21		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node22		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node23		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node24		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node25		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node26		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node27		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node28		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node29		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node30		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node31		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
node32		man-c,man-c,/install,0,Y,N,N,Y,Y,Y,NA
9.12 nodetype.tab      
/usr/local/xcat/etc/nodetype.tab
# nodetype.tab maps nodes to types of installs.
# Our example uses only one type, but you might have a few
# different types.. a subset of nodes with GigE, storage nodes,
# etc.  man nodetype.tab for more information.
node01	compute71
node02	compute71
node03	compute71
node04	compute71
node05	compute71
node06	compute71
node07	compute71
node08	compute71
node09	compute71
node10	compute71
node11	compute71
node12	compute71
node13	compute71
node14	compute71
node15	compute71
node16	compute71
node17	compute71
node18	compute71
node19	compute71
node20	compute71
node21	compute71
node22	compute71
node23	compute71
node24	compute71
node25	compute71
node26	compute71
node27	compute71
node28	compute71
node29	compute71
node30	compute71
node31	compute71
node32	compute71
9.13 passwd.tab      
/usr/local/xcat/etc/passwd.tab
# passwd.tab defines some passwords that will be used in the cluster
# man passwd.tab for more information.
cisco		cisco
rootpw		netfinity
asmauser	USERID
asmapass	PASSW0RD

Back to TOC

10. Configuring Networking
This section describes network setup on the management node:

10.1 /etc/hosts      
Create your /etc/hosts file. Make sure all devices are entered... terminal servers, switches, hardware management devices, etc.
The following is a sample /etc/hosts for the example cluster:
#  Localhost
127.0.0.1		localhost.localdomain localhost
# 
########## Management Node ###################
#
# cluster interface (eth0) GigE
172.16.100.1  man-c.mydomain.com        man-c
#
# management interface (eth1)
172.17.100.1  man-m.mydomain.com        man-m
#
# external interface (eth2)
10.0.0.1      man.mydomain.com          man
#
########## Management Equipment ##############
#
# ASMA adapters
172.17.1.1    asma1.mydomain.com        asma1
172.17.1.2    asma2.mydomain.com        asma2
172.17.1.3    asma3.mydomain.com        asma3
172.17.1.4    asma4.mydomain.com        asma4
#
# Terminal Servers
172.17.2.1    ts1.mydomain.com          ts1
172.17.2.2    ts2.mydomain.com          ts2
#
# APC Master Switch
172.17.3.1    apc1.mydomain.com         apc1
#
# Myrinet Switch's management port
172.17.4.1    myrinet.mydomain.com      myrinet
#
########## Compute Nodes #####################
#
#
172.16.1.1    node01.mydomain.com       node01
172.18.1.1    node01-myri0.mydomain.com node01-myri0
172.16.1.2    node02.mydomain.com       node02
172.18.1.2    node02-myri0.mydomain.com node02-myri0
172.16.1.3    node03.mydomain.com       node03
172.18.1.3    node03-myri0.mydomain.com node03-myri0
172.16.1.4    node04.mydomain.com       node04
172.18.1.4    node04-myri0.mydomain.com node04-myri0
172.16.1.5    node05.mydomain.com       node05
172.18.1.5    node05-myri0.mydomain.com node05-myri0
172.16.1.6    node06.mydomain.com       node06
172.18.1.6    node06-myri0.mydomain.com node06-myri0
172.16.1.7    node07.mydomain.com       node07
172.18.1.7    node07-myri0.mydomain.com node07-myri0
172.16.1.8    node08.mydomain.com       node08
172.18.1.8    node08-myri0.mydomain.com node08-myri0
172.16.1.9    node09.mydomain.com       node09
172.18.1.9    node09-myri0.mydomain.com node09-myri0
172.16.1.10   node10.mydomain.com       node10
172.18.1.10   node10-myri0.mydomain.com node10-myri0
172.16.1.11   node11.mydomain.com       node11
172.18.1.11   node11-myri0.mydomain.com node11-myri0
172.16.1.12   node12.mydomain.com       node12
172.18.1.12   node12-myri0.mydomain.com node12-myri0
172.16.1.13   node13.mydomain.com       node13
172.18.1.13   node13-myri0.mydomain.com node13-myri0
172.16.1.14   node14.mydomain.com       node14
172.18.1.14   node14-myri0.mydomain.com node14-myri0
172.16.1.15   node15.mydomain.com       node15
172.18.1.15   node15-myri0.mydomain.com node15-myri0
172.16.1.16   node16.mydomain.com       node16
172.18.1.16   node16-myri0.mydomain.com node16-myri0
172.16.1.17   node17.mydomain.com       node17
172.18.1.17   node17-myri0.mydomain.com node17-myri0
172.16.1.18   node18.mydomain.com       node18
172.18.1.18   node18-myri0.mydomain.com node18-myri0
172.16.1.19   node19.mydomain.com       node19
172.18.1.19   node19-myri0.mydomain.com node19-myri0
172.16.1.20   node20.mydomain.com       node20
172.18.1.20   node20-myri0.mydomain.com node20-myri0
172.16.1.21   node21.mydomain.com       node21
172.18.1.21   node21-myri0.mydomain.com node21-myri0
172.16.1.22   node22.mydomain.com       node22
172.18.1.22   node22-myri0.mydomain.com node22-myri0
172.16.1.23   node23.mydomain.com       node23
172.18.1.23   node23-myri0.mydomain.com node23-myri0
172.16.1.24   node24.mydomain.com       node24
172.18.1.24   node24-myri0.mydomain.com node24-myri0
172.16.1.25   node25.mydomain.com       node25
172.18.1.25   node25-myri0.mydomain.com node25-myri0
172.16.1.26   node26.mydomain.com       node26
172.18.1.26   node26-myri0.mydomain.com node26-myri0
172.16.1.27   node27.mydomain.com       node27
172.18.1.27   node27-myri0.mydomain.com node27-myri0
172.16.1.28   node28.mydomain.com       node28
172.18.1.28   node28-myri0.mydomain.com node28-myri0
172.16.1.29   node29.mydomain.com       node29
172.18.1.29   node29-myri0.mydomain.com node29-myri0
172.16.1.30   node30.mydomain.com       node30
172.18.1.30   node30-myri0.mydomain.com node30-myri0
172.16.1.31   node31.mydomain.com       node31
172.18.1.31   node31-myri0.mydomain.com node31-myri0
172.16.1.32   node32.mydomain.com       node32
172.18.1.32   node32-myri0.mydomain.com node32-myri0
10.2 Configure Network Devices      
Edit /etc/modules.conf, /etc/sysconfig/network-scripts/*, and /etc/sysconfig/network, to create a network configuration that reflects the cluster's design. The following samples work with the example cluster:
/etc/modules.conf
alias eth0 e1000
alias eth1 pcnet32
alias eth2 e100
/etc/sysconfig/network
NETWORKING=yes
HOSTNAME="man-c"
GATEWAY="10.0.0.254"
GATEWAYDEV="eth2"
FORWARD_IPV4="no"
/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="none"
IPADDR="172.16.100.1"
NETMASK="255.255.0.0"
ONBOOT="yes"
/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
BOOTPROTO="none"
IPADDR="172.17.100.1"
NETMASK="255.255.0.0"
ONBOOT="yes"
/etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE="eth2"
BOOTPROTO="none"
IPADDR="10.0.0.1"
NETMASK="255.0.0.0"
ONBOOT="yes"
10.3 Setup DNS      
Setup the resolver:

> makedns

If the management node's IP address is listed first in site.tab's nameservers field (as it is in our example), this will generate zone files from the data in /etc/hosts and start a DNS server as a master.

If the management node's IP address is listed in the nameservers field, but not first, then it will become a slave DNS server and will do a zone transfer from the master (the first IP address listed). If the management node's IP address is not in site.tab, then it's possible for the first IP address not to be in the cluster. Some installations use their own DNS server external to the cluster and set up all the cluster names there. In this case, you just put the IP address of this external DNS server as the #1 address in the nameservers field and the IP of the management node as #2; #2 is then a slave and will do a zone transfer, and you'll need to update either the .kstmp files or /install/post/sync (this is what I do) to copy a resolv.conf to the clients that tells them to use only the slave (#2) for DNS.
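
As a sketch of the resolv.conf you might push to the clients in that case (the domain and address are this document's example values; the nameserver listed is the management node acting as slave):

search mydomain.com
nameserver 172.16.100.1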

10.4 Verify Name Resolution      
Verify that forward and reverse name resolution work like in the following examples:

> host node01
> host 172.16.1.1

Do not continue until forward and reverse name resolution are working.

10.5 Setup VLANs and Configure Ethernet Switches      
If you have separate subnets for the management and compute networks, like in our example, you need to setup VLANs on the ethernet switches. Experience has shown that many strange problems are solved with the introduction of VLANS. Use VLANs to separate the ports associated with the management, and cluster subnets. A set of somewhat random notes for configuring VLANs on different switches and setting up "spanning-tree portfast" on ciscos is available here.
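
As a purely hypothetical sketch of the per-port settings on a Cisco IOS switch (interface names, VLAN numbers, and exact commands vary by model and IOS version; see the notes referenced above):

! put a compute-node port on the cluster VLAN and enable portfast
interface FastEthernet0/1
 switchport access vlan 2
 spanning-tree portfast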

10.6 Verify Management Node Network Setup      
You might have to wait to do full verification until some of the management equipment is set up in the steps that follow. You need to check that (an example set of pings appears after this list):
  • You can ping all of the network interfaces
  • You can ping other devices on all of the subnets (cluster, management, external, etc.)
  • You can ping and route through your gateway
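
For example, using host names from this document's example /etc/hosts (adjust the names and the gateway address to your own setup, and skip devices that aren't configured yet):

> ping -c 1 man-c
> ping -c 1 man-m
> ping -c 1 man
> ping -c 1 ts1
> ping -c 1 apc1
> ping -c 1 10.0.0.254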

Back to TOC

11. Doing the Compute Node Preinstall (stage1)
'stage1' is an automated procedure for updating and configuring system BIOSes. This section describes an easy way to upgrade the x330 compute nodes' firmware and set their BIOS settings. It is x330 specific. You're on your own with non-x330s... just make certain that you enable PXE boot by setting the boot order to be CDROM, Floppy, Network, HD and that you disable virus detection or any other interactive BIOS features.

11.1 Create a CD/Floppy to Configure the BIOS and SPN in Each x330      
If you have a lot of compute nodes, it's a good idea to make multiple sets. The flash/configuration process can take up to 10 minutes per machine, so you don't want to wait around for 10 minutes for each node.

For 1 GHz machines (model 8654):
Create a CD from the ISO image located at:
/usr/local/xcat/stage/stage1/x330.iso

Create a floppy from the floppy image located at:
/usr/local/xcat/stage/stage1/x330.dd
This can be done under Linux by placing a blank DOS formatted floppy into the floppy drive and:
> dd if=x330.dd of=/dev/fd0 bs=1024 conv=sync; sync

For machines > 1 GHz (model 8674):
Use x330-8674-1.02.dd which will be in the next version of xCAT. You don't need a CD.

11.2 Paranoia      
Things just seem to work better when you do this... Remove AC Power from all compute nodes. Disconnect all the SPN serial networking.
After waiting a minute or so, restore AC Power to compute nodes and apply power to ASMA cards.

11.3 Flash the Compute Nodes      
Insert the floppy (and the CD if you're flashing a model 8654) into each compute node, power on the node and let the disks do their magic. There is no need for any manual intervention during this process... ignore any BIOS errors etc, they'll go away in 15 seconds or so. When the flash/config is finished, the CD will eject and the display should have an obvious "I'm done" message.

Reconnect the SPN serial network after all the machines have been flashed.
You shouldn't make any compute node BIOS modifications after this procedure.

11.4 Serial Port Notes      
In the past, it was necessary to make a serial port hardware change inside of the x330. This is no longer necessary. All the bugs are fixed in the BIOS, so please use ttyS0/COM1/COMa.

Back to TOC

12. Configuring the Terminal Servers
This section describes setting up ELS and ESP terminal servers and conserver. Your cluster will probably have either ELSes or ESPs so you can skip the instructions for the terminal server type that is not a part of your cluster. Terminal servers enable out-of-band administration and access to the compute nodes... e.g. watching a compute node's console remotely before the compute node can be assigned an IP address or after the network config gets messed up, etc.

12.1 Learn About Conserver      
Check out conserver's website.

12.2 Setup ELS Terminal Servers      
This section describes how to configure the Equinox ELS terminal server. If you're using the ESP terminal servers instead of the ELSes, you'll want to skip this section and skip ahead to 12.3 and follow the ESP instructions.
    12.2.1 conserver.cf Setup      
    Modify /usr/local/xcat/etc/conserver.cf
    This has already been covered in the configuring xCAT section, but this explains it...

    Each node gets a line like:

    nodeXXX:!tsx:yyyy:&:

    where x = ELS Unit number and yyyy = ELS port + 3000 e.g. node1:!ts1:3001:&: means access node1 via telnet to ts1 on port 3001. 'node1' should be connected to ts1's first serial port.

    12.2.2 Set ELS's IP Address      
    For each ELS unit in your cluster...

    Reset the ELS to factory defaults. You usually have to push the reset button. If the button is green, just push it. If the button is white, you need to hold it down until the link light stops blinking. All the new units have green buttons.

    Connect the DB-9 adaptor (Equinox part #210062) to the management node's first serial port (COM1) and connect a serial cable from the ELS to the DB-9 adapter. You can test that the serial connection is good with:

    > cu -l /dev/ttyS0 -s 9600

    Hit Return to connect and you should see:

    Username>

    Unplug the serial cable to have cu hangup and then reconnect it for the next step:
    > setupelsip <ELS_HOSTNAME>

    Test for success:
    > ping <ELS_HOSTNAME>

    12.2.3 Final ELS Setup      
    After assigning the ELS' IP address over the serial link, use

    > setupels <ELS_HOSTNAME>

    to finish the setup for each ELS in your cluster. This sets up the terminal server's serial settings. After the serial settings are set, you can not use setupelsip again, because the serial ports have been set for reverse use. A reset of the unit will have to be performed again, if you need to change the IP address.
12.3 Setup ESP Terminal Servers      
This section describes how to configure the Equinox ESP terminal server. If you're using ELS terminal servers, as most of the examples in this document do, you should skip this section and use the ELS section instead.

    12.3.1 conserver.cf Setup      
    Modify /usr/local/xcat/etc/conserver.cf

    Each node gets a line like:

    nodeXXX:/dev/ttyQxxyy:9600p:&:

    where xx = ESP Unit number and yy = ESP port (in hex) e.g. ttyQ01e0

    12.3.2 Build ESP Driver      
    Install the RPM (must be 3.03 or later!)

    12.3.3 ESP Startup Configuration      
    Type /usr/local/xcat/sbin/updaterclocal (you can run this multiple times without creating problems). You need to run this because the ESP RPM puts evil code in the rc.local file that forces the ESP driver to load very last, so any other service that needs the ESP in order to start (e.g. conserver) will fail.

    > cp /usr/local/xcat/rc.d/espx /etc/rc.d/init.d/
    > chkconfig espx on

    12.3.4 ESP Driver Configuration      
    Note the MAC address of each ESP and manually create the /etc/eqnx/esp.conf file. All the esp utility does is create this file; you can do it yourself and save a lot of time. There is no need to set up DHCP for the ESPs this way.

    > service espx stop
    > rmmod espx
    > service espx start
12.4 Setup Conserver      
We use the prebuilt conserver in /usr/local/xcat/sbin/conserver, but it is possible to build and install conserver from /usr/local/xcat/src/.

Make certain you have a valid /usr/local/xcat/etc/conserver.tab and /usr/local/xcat/etc/conserver.cf. conserver.tab was covered in the configuring xCAT section, but you might want to read its man page again and make certain you have it correct now.

Copy a conserver startup file to init.d
> cp /usr/local/xcat/rc.d/conserver /etc/rc.d/init.d/


12.5 Start Conserver and Make It Start at Boot      
> service conserver start
> chkconfig conserver on

Back to TOC

13. Completing Management Node Setup
Here, we setup the final services necessary for a functioning management node.

13.1 Copy xCAT init Files      
This will enable some services to start at boot time and change the behavior of some existing services.

> cd /usr/local/xcat/rc.d
> cp acct atftpd conserver maui watchlogd /etc/rc.d/init.d/
> cp pbs* portmap snmptrapd syslog /etc/rc.d/init.d/

There are other init files in /usr/local/xcat/rc.d that you may wish to use, depending on your installation.

13.2 Copy the 'post' Files for RedHat 6.2 and RedHat 7.x      
Copy some install files from the xCAT distribution to the post directory that is used during unattended installs:

> cd /usr/local/xcat/post
> find . | cpio -dump /install/post

13.3 Setup syslog      
Here we enable remote logging...

> cp /usr/local/xcat/samples/syslog.conf /etc
> touch /var/log/pipemessages
> service syslog restart

On RH7.x based installs, you might want to edit /etc/sysconfig/syslog, changing SYSLOGD_OPTIONS to add the -r switch, instead of copying the modified rc.d/syslog file.
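
For example, in /etc/sysconfig/syslog (-m 0 is RedHat's usual default; -r is what enables reception of remote log messages):

SYSLOGD_OPTIONS="-m 0 -r"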

13.4 Setup snmptrapd      
snmptrapd receives messages from the SPN.

> chkconfig snmptrapd on
> service snmptrapd start

13.5 Setup watchlogd      
watchlogd monitors syslog and sends email alerts to a user defined list of admins in the event of a hardware error. This functionality (hardware error alerts) requires a SPN.

> chkconfig watchlogd on
> service watchlogd start

You also must setup an alias in /etc/aliases called alerts. This alias is a comma delimited list of admins that should receive these hardware alerts as email.
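
For example (the addresses are placeholders), add a line like the following to /etc/aliases and then rebuild the alias database:

alerts: root, admin1@mydomain.com, admin2@mydomain.com

> newaliases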

13.6 Install SSH RPMs for RedHat 6.2      
This step isn't required for RedHat 7.x

> cd /usr/local/xcat/post/rpm62
> rpm -ivh --force --nodeps openssh*.rpm

13.7 Setup NFS and NFS Exports      
Make /etc/exports look something like the following:

/install node*(ro,no_root_squash)
/tftpboot node*(ro,no_root_squash)
/usr/local node*(ro,no_root_squash)
/home node*(rw,no_root_squash)

Turn on NFS:

> chkconfig nfs on
> service nfs start
> exportfs -ar # (to source)
> exportfs # (to verify)

13.8 Setup NTP      
RedHat 7.x:
> chkconfig ntpd on
> service ntpd start

RedHat 6.2:
> chkconfig xntpd on
> service xntpd start

13.9 Setup TFTP      
The following works with RH6.2 and 7.x:

> rpm -ivh /usr/local/xcat/post/rpm71/atftp-0.4-1.i386.rpm

> chkconfig atftpd on
> service atftpd start

Test that tftp is working by monitoring /var/log/messages and:

> tftp localhost
> get bogus_file
> quit

In the log output you should see tftp try to service the 'get' request.

13.10 Initial DHCP Setup      
    13.10.1 Collect the MAC Addresses of Cluster Equipment      
    Place the MAC addresses of cluster equipment that needs to DHCP for an IP address into /usr/local/xcat/etc/<MANAGEMENT_NET>.tab. See the man page for macnet.tab.

    If you have APC master switches, put their MAC addresses into this file.

    13.10.2 Make the Initial dhcpd.conf Config File      
    > gendhcp --new

    13.10.3 Edit dhcpd.conf      
    Check for anything out of the ordinary
    > vi /etc/dhcpd.conf

    13.10.4 Important DHCP Note      
    You probably don't want DHCP running on the network interface that is connected to the rest of the network. Except for in special circumstances, you'll want to remove the network section from dhcpd.conf that corresponds to the external network and then explicitly list the interfaces you want dhcpd to listen on in /etc/rc.d/init.d/dhcpd (leaving out the external interface).

    # BEGIN portion of /etc/rc.d/init.d/dhcpd
    daemon /usr/sbin/dhcpd eth1 eth2
    # END portion of /etc/rc.d/init.d/dhcpd

    On RedHat 7.2, you edit /etc/sysconfig/dhcpd instead of modifying /etc/rc.d/init.d/dhcpd, with something like:

    DHCPDARGS="eth1 eth2"
13.11 Setup NIS      
If you're using the management node as a NIS server for your cluster:
    13.11.1 Verify That xCAT Is Configured for NIS      
    Check stuff in site.tab.

    13.11.2 Setup Management Node as an NIS Server      
    > gennis

13.12 Copy the Custom Kernel to a Place Where Installs Can Access It      
Copy the custom kernel tarball you installed in step 7 to /install/post/kernel/.
Edit the KERNELVER variable where appropriate in /usr/local/xcat/ksxx/*.kstmp.
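
As a sketch of the copy step, assuming you installed the 2.4.5-2hpc tarball from Section 7 (substitute whatever kernel version you actually installed, and keep the KERNELVER value in the templates in sync with it):

> mkdir -p /install/post/kernel
> cp /where/you/put/it/kernel-2.4.5-2hpc.tgz /install/post/kernel/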

13.13 Copy the RedHat Install CD(s)      
  • RedHat 7.x
    Do the right thing substituting 1 or 2 for x for RedHat 7.1 or 7.2.
    1. Mount Install CD #1
      > mount /mnt/cdrom

    2. Copy the files from CD #1
      > cd /mnt/cdrom
      > find . | cpio -dump /install/rh7x

    3. Unmount Install CD #1
      > cd / ; umount /mnt/cdrom ; eject

    4. Mount Install CD #2
      > mount /mnt/cdrom

    5. Copy the files from CD #2
      > cd /mnt/cdrom
      > find RedHat | cpio -dump /install/rh7x

    6. Patch the files
      > /usr/local/xcat/build/rh7x/applypatch


  • RedHat 6.2
    1. Mount the Install CD
      > mount /mnt/cdrom

    2. Copy the files
      > cd /mnt/cdrom
      > find . | cpio -dump /install/rh62

    3. Patch the files
      Accept reverse patch errors (just hit enter)
      The following should be typed as one line
      > patch -p0 -d /install/rh62 < /usr/local/xcat/build/rh62/ks62.patch

13.14 Generate root's SSH Keypair      
The following command creates an SSH keypair for root with an empty passphrase, sets up root's ssh configuration, and copies the keypair and config to /install/post/.ssh so that all installed nodes will have the same root keypair/config.

> gensshkeys root

13.15 Clean Up the Unneeded .tab Files      
In /usr/local/xcat/etc/, move unneeded .tab files somewhere out of the way e.g. rtel.tab, tty.tab, etc.
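
For example (the 'unused' directory name is just a choice; anywhere out of the way works):

> mkdir /usr/local/xcat/etc/unused
> mv /usr/local/xcat/etc/rtel.tab /usr/local/xcat/etc/tty.tab /usr/local/xcat/etc/unused/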

Back to TOC

14. Collecting MAC Addresses (stage2)
In this section, we collect the MAC addresses of the compute nodes and create entries in dhcpd.conf for them.

14.1 Setup      
> cd /usr/local/xcat/stage
> ./mkstage

14.2 Prepare to Monitor stage2 Progress      
> wcons -t 8 compute (or a subset like rack01)
> tail -f /var/log/messages (you should always be watching messages)

14.3 Reboot Compute Nodes      
You'll have to do this manually.

14.4 Collect the MACs      
Once you see that all the compute nodes are spitting their MAC addresses out of their serial consoles...
> getmacs compute

14.5 Kill the wcons Windows      
> wkill

14.6 Populate dhcpd.conf      
> makedhcp compute

At this point, a dhcpd will be running. So you might want to again make certain that it is only listening on the interfaces that you want it to be.

Back to TOC

15. Configuring ASMA/RSA/SPN (stage3)
Stage3 is a mostly automated procedure for configuring the Service Processor Network on IBM xSeries machines. This section describes how to perform stage3 with ASMA or RSA adapters. If your cluster doesn't have a Service Processor Network, you can skip this section.

15.1 Read the xCAT xSeries Management Processor HOWTO      
Learn more about the Service Processor Network at: http://x-cat.org/docs/mp-HOWTO.html

15.2 Perform the manual steps      
  • For ASMA adapters:
    1. Download the ASMA config utility:
      http://www-1.ibm.com/support/ ->
      Server downloads: xSeries, Netfinity and NUMA-Q ->
      Get fixes: Download device drivers, BIOS, and updates for preloaded software ->
      Device drivers by server ->
      IBM e(logo)server xSeries 330, 340, 350, 370 ->
      Systems Management - Advanced Systems Management Processor Firmware Update - x330 (v1.05 at the time of this writing)

    2. Create the ASMA config floppy:
      Using DOS (be certain that you use 'command' and not 'cmd' under NT or Win2k), run the .exe and follow the prompts.

    3. Configure ASMA cards:
      For each node that contains an ASMA card, you need to take the following steps, using the SPN setup floppy disk. Once you have booted this floppy disk, select Configuration Settings -> Systems Management Adapter and apply the following changes under Network Settings:
      • Enable network interface
      • Set local IP address for ASMA network interface
      • Set subnet mask
      • Set gateway

  • For RSA adapters:
    1. Record the adapters' MAC addresses:
      Place the MAC addresses in <management-network>.tab (172.17.0.0.tab in our example).

    2. Modify dhcpd.conf and boot the RSA adapters:
      > gendhcp --new
      This puts static entries into dhcpd.conf from <management-network>.tab. Remove the power from the RSA adapters and then reapply it. The adapters should boot with the correct IP addresses.
15.3 Check That the Management Processors Are on the Network      
> pping asma

This does a parallel ping to all the nodes that are defined as being a member of the group 'asma' in nodelist.tab. If you're using RSA adapters, you may have a group 'rsa' instead of 'asma'. If any of the adapters show up as 'noping', you'll have to investigate why they are not connecting to the network before you continue.

15.4 Program the Management Processors      
> mpasetup asma

This command sets up things like alerts, SNMP information, etc. on each management adapter in the 'asma' group over IP. Again, you might want to use an 'rsa' group if you have RSA adapters.

15.5 Verify the Management Processors Were Programmed Correctly      
> mpacheck asma

At this point all of the Management Processors should be correctly programmed. Next, we'll configure the Service Processor devices that are in each compute node...

15.6 Nodeset      
The following command makes the nodes PXE boot the stage3 image. (it alters the files in /tftpboot/pxelinux.cfg/)
> nodeset compute stage3

15.7 Prepare to Monitor the stage3 Progress      
> wcons -t 8 compute (or a subset like rack01)
> tail -f /var/log/messages (you should always be watching messages)

15.8 Reboot Compute Nodes      
You'll have to do this manually.

15.9 Watch the Show      
You should see the SPN procedure move forward smoothly in all the compute nodes' wcons windows.

15.10 Test Out Some SPN Commands      
Read the man pages for rvitals, rinv, rpower, etc., and then try out some of these commands on your cluster.
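
A few illustrative invocations follow; the exact node ranges and arguments here are assumptions and vary between xCAT versions, so check the man pages before relying on them:

> rpower compute stat (query the power state of the 'compute' group)
> rinv node01 all (hardware inventory for a single node)
> rvitals node01 all (temperatures, fan speeds, voltages, etc.)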

15.11 mpncheck      
> mpncheck compute

Back to TOC

16. Installing Compute Nodes
This section describes how to install the OS on the compute nodes over the network, using the kickstart files generated below and the PXE boot configuration that nodeset manages.

16.1 Edit/Generate Kickstart Scripts      
Modify the kickstart template file if needed, substituting your version of RedHat for xx... (/usr/local/xcat/ksxx/computexx.kstmp)

Generate real kickstart scripts from the templates:
> cd /usr/local/xcat/ksxx; ./mkks

16.2 Nodeset      
The following command makes the nodes PXE boot the RedHat kickstart image. (it alters the files in /tftpboot/pxelinux.cfg/)
> nodeset compute install

16.3 Prepare to Monitor the Installation Progress      
> wcons -t 8 compute (or a subset like rack01)
> tail -f /var/log/messages (you should always be watching messages)

16.4 Reboot the Compute Nodes      
You might want to do only a subset of 'compute'
> rpower compute boot

16.5 A Better Way to Install      
Instead of the three steps nodeset, wcons, and rpower, it's much easier to use the single command:

> winstall -t 8 compute

This command accomplishes the above three commands in one step.

When going through the install procedure, you'll probably want to install onto only a single machine until you're fairly certain that the install is working well, and then do installs on the whole 'compute' group.

16.6 Installs with No Terminal Servers      
LaLa.

16.7 Verify that the Compute Nodes Installed Correctly      
LaLa.

16.8 Update the SSH Global Known Hosts File      
> makesshgkh compute (or, again, a subset of 'compute')

16.9 Test SSH and psh      
> psh compute date | sort
The output here is a good way to see if SSH/gkh is set up correctly on all of the compute nodes (a requirement for most cluster tasks). If a node doesn't appear here correctly, you must go back and troubleshoot the individual node, make certain the install happened correctly, rerun makesshgkh, and finally test again with psh. You really must get psh working correctly before continuing.
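
On a healthy cluster the output looks something like the following sketch (host names and timestamps are illustrative); a node that hangs, prompts for a password, or is missing from the list needs attention:

node01: Tue Feb  5 10:02:31 MST 2002
node02: Tue Feb  5 10:02:31 MST 2002
node03: Tue Feb  5 10:02:32 MST 2002
...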

Back to TOC

17. Installing/Configuring Myrinet Software
The following section only applies to clusters that use Myrinet. It gives an example of creating a GM rpm on RedHat 6.2 and installing this driver on the compute nodes. With RedHat 7.1, the procedure will be slightly different (the kernel version will be different). You may wish to do the rpm building part of this section before you do the previous step to avoid having to install the compute nodes multiple times.

17.1 Make Certain xCAT Configuration Is Ready for Myrinet      
Set up host names on a separate subnet for each compute node's Myrinet interface (node01-myri0, node02-myri0, etc.) in /etc/hosts, DNS, etc.
Verify forward and reverse name resolution for these host names.
Add all the hosts that have Myrinet cards to the 'myri' group in nodelist.tab.
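
As a sketch, the /etc/hosts entries might look like the following (the 172.18.0.0 subnet and the domain name are assumptions; use whatever Myrinet subnet and domain you chose), and the 'host' command can be used to spot-check resolution in both directions:

172.18.1.1   node01-myri0.cluster.example.com   node01-myri0
172.18.1.2   node02-myri0.cluster.example.com   node02-myri0
...
> host node01-myri0 (forward lookup)
> host 172.18.1.1 (reverse lookup)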

17.2 Get the Latest GM Source from Myricom      
ftp://ftp.myri.com/pub/GM/gm-1.5_Linux.tar.gz

17.3 Copy the Source to a Build Location      
> cp gm-1.5_Linux.tar.gz /tmp

17.4 Build the GM rpm      
> cd /tmp ; /usr/local/xcat/build/gm/gmmaker 1.5_Linux
The result should be a GM rpm in /usr/src/redhat/RPMS/i686

17.5 Copy the rpm to a Place That Is Accessible by the Kickstart Install Process      
> cp /usr/src/redhat/RPMS/i686/gm-1.5_Linux-2.2.19-4hpc.i686.rpm /install/post/kernel

17.6 Make Kickstart Files Aware of This GM Version      
Edit your .kstmp, setting the GMVER variable to the appropriate value... ( GMVER=1.5_Linux in our example )

17.7 Regenerate Kickstart Files      
> cd /usr/local/xcat/ksxx
> ./mkks

17.8 Do New Compute Node Installs      
Refer to the previous section.

17.9 Look at the Pretty Lights      
Check the Myrinet switch. If the compute nodes come up with the GM driver loaded and all cabling is correct, you should see lights on all the connected ports. Do you see the light?
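
A quick way to confirm the driver actually loaded on every node is a sketch like the following (it assumes psh is already working, as verified in the previous section):

> psh myri lsmod | grep -w gm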

17.10 Generate Myrinet Routing Maps      
> makegmroutes myri

17.11 Verify Connectivity Between All Myrinet Hosts      
> psh myri /usr/local/xcat/sbin/gmroutecheck myri
No output means success.

17.12 Install the GM rpm on the Management Node      
> rpm -ivh /usr/src/redhat/RPMS/i686/gm-1.5_Linux-2.2.19-4hpc.i686.rpm
If the management node doesn't have a Myrinet card, you'll want to keep GM from loading at boot...
> chkconfig gm off

Back to TOC

18. Installing Portland Group Compilers
This section is out of date. You will benefit from reading the FAQ at: http://www.pgroup.com/faq.htm

Connect to the Portland Group's ftp server: ftp://ftp.pgroup.com/.

  • If a temporary installation:

    Get x86/linux86-HPF-CC.tar.gz
    Make a temporary directory and move the .tar file there.
    temp> tar -xzvf linux86-HPF-CC.tar.gz
    Read the INSTALL file.
    ./install
    Enter accept
    Enter 5 (PGI workstation or PGI server)
    Install Directory? /usr/local/pgi
    Create eval Licence? Yes
    Would you like the installation to be read only? Yes

    There are probably errors in this documentation... check and correct.
  • For a permanent IBM customer install:

    Get ftp.pgroup.com/x86_ibm/pgi-3.2.i386.rpm
    > rpm -ivh pgi-3.2.i386.rpm
    > cd /usr/src/redhat/BUILD/pgi-3.2
    Edit Install, modifying INSTALL_DIR to use /usr/local/pgi.
    > ./install
    accept
    Create a file with all of the nodes in your cluster when asked by the install script.
    > lmutil lmhostid (If you have a license key)
    Now go to https://www.pgroup.com/License .
    Enter Username: (Provided by PGI)
    Enter Password:
    Click Generate License Key
    Scroll down to the bottom and click the <Issue Main Keys> button. In the FLEXlm hostid field, enter the characters from the lmutil command above.

    Copy the output and replace the hostname in the generated license.
    Look at the path names in this file. Change /usr/pgi/... to /usr/local/pgi/...
    Place the contents in $PGI/license.dat

    Edit /etc/profile, adding the following (this assumes PGI points at the /usr/local/pgi install directory):
    export PGI=/usr/local/pgi
    export LM_LICENSE_FILE=${PGI}/license.dat


    Edit /usr/local/pgi/linux86/bin/lmgrd.rc, modifying the PGI environment variable to look like this:
    PGI=${PGI:-/usr/local/pgi}
    Start the license manager and have it start at boot:
    > cp /usr/local/pgi/linux86/bin/lmgrd.rc /etc/rc.d/init.d
    > chkconfig --add lmgrd.rc
    > chkconfig --level 345 lmgrd.rc on
    > service lmgrd.rc start

Back to TOC

19. Installing MPICH MPI Libraries
MPI is a standard library used for message passing in parallel applications. This section documents how to install the MPICH MPI implementations that are used over ethernet and Myrinet.

  • MPICH
    Use MPICH if you want to run MPI over ethernet. Skip this if you only want to run MPI over Myrinet (see MPICH-GM below).

    1. Learn about MPICH:
      MPICH's homepage is at: http://www-unix.mcs.anl.gov/mpi/mpich/

    2. Download MPICH:
      ftp://ftp.mcs.anl.gov/pub/mpi/mpich.tar.gz

    3. Build MPICH:
      > cd /usr/local/xcat/build/mpi
      > cp /where/ever/you/put/it/mpich.tar.gz .
      > mv mpich.tar.gz mpich-1.2.3.tar.gz
      > ./mpimaker (This will show you the possible arguments. You may want to use different ones.)
      > ./mpimaker 1.2.3 smp gnu ssh
      You can 'tail -f mpich-1.2.3/make.log' to view the build's progress.
      When done, you should have stuff in /usr/local/mpich/1.2.3/ip/smp/gnu/ssh

    4. Adjust environment:
      Add the following to your ~/.bashrc. You can put it in /etc/profile if you are only using this MPI lib:
      export MPICH="/usr/local/mpich/1.2.3/ip/smp/gnu/ssh"
      export MPICH_PATH="${MPICH}/bin"
      export MPICH_LIB="${MPICH}/lib"
      export PATH="${MPICH_PATH}:${PATH}"
      export LD_LIBRARY_PATH="${MPICH_LIB}:${LD_LIBRARY_PATH}"

    5. Test the environment:
      After re-sourcing the environment changes that you've made, it's a good idea to validate that everything is correct. A simple, but not complete, way to do this is:

      > which mpicc

      If you're set up for MPICH as in the above example, the output of this command should be:

      /usr/local/mpich/1.2.3/ip/smp/gnu/ssh/bin/mpicc


  • MPICH-GM
    MPICH-GM is a special version of MPICH that communicates over Myrinet's low-level GM layer.
    Use MPICH-GM if you want to run MPI over Myrinet. Skip this if you don't have Myrinet.

    1. Learn about MPICH-GM:
      Some information is available in Myricom's Software FAQ

    2. Download MPICH-GM:
      ftp://ftp.myri.com/pub/MPICH-GM/mpich-1.2.1..7.tar.gz

    3. Build MPICH-GM:
      > cd /usr/local/xcat/build/mpi
      > cp /where/ever/you/put/it/mpich-1.2.1..7.tar.gz .
      > ./mpimaker (This will show you the possible arguments. You may want to use different ones.)
      > ./mpimaker 1.2.1..7:1.5_Linux-2.2.19-4hpc smp gnu ssh
      You can 'tail -f mpich-1.2.1..7/make.log' to view the build's progress.
      When done, you should have stuff in /usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh

    4. Adjust environment:
      Add the following to your ~/.bashrc. You can put it in /etc/profile if you are only using this MPI lib:
      export MPICH="/usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh"
      export MPICH_PATH="${MPICH}/bin"
      export MPICH_LIB="${MPICH}/lib"
      export PATH="${MPICH_PATH}:${PATH}"
      export LD_LIBRARY_PATH="${MPICH_LIB}:${LD_LIBRARY_PATH}"

    5. Test the environment:
      After re-sourcing the environment changes that you've made, it's a good idea to validate that everything is correct. A simple, but not complete, way to do this is:

      > which mpicc

      If you're set up for MPICH-GM as in the above example, the output of this command should be:

      /usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh/bin/mpicc

Back to TOC

20. Installing LAM MPI Libraries
LAM is a lesser-used alternative to MPICH for message passing with MPI. It is reportedly faster than MPICH over TCP/IP. The stable version of LAM runs only over TCP/IP (no GM). This section documents how to install LAM. Skip this section if you don't need to use LAM.
  1. Learn about LAM:
    LAM's homepage is at http://www.lam-mpi.org/

  2. Download LAM:
    http://www.lam-mpi.org/download/files/lam-6.5.6.tar.gz

  3. Build LAM:
    (You may wish to use /usr/local/xcat/build/lam/lammaker instead of these instructions. Run it without any arguments for usage.)
    > cd /usr/src
    > cp /where/ever/you/put/it/lam-6.5.6.tar.gz .
    > tar -xzvf lam-6.5.6.tar.gz
    > cd lam-6.5.6
    For the Portland Group compilers:
    > ./configure --prefix=/usr/local/lam-6.5.6/ip/pgi/ssh --with-rpi=usysv \
    --with-rsh='ssh -x' --with-fc=pgf90 \
    --with-cc=pgcc --with-cxx=pgCC
    > make
    > make install
    For the GNU compilers:
    > ./configure --prefix=/usr/local/lam-6.5.6/ip/gnu/ssh --with-rpi=usysv \
    --with-rsh='ssh -x'
    > make
    > make install

  4. Adjust environment:
    Add the following to your ~/.bashrc. You can put it in /etc/profile if you are only using this MPI lib:
    export LAM="/usr/local/lam-6.5.6/ip/gnu/ssh"
    export LAM_PATH="${LAM}/bin"
    export LAM_LIB="${LAM}/lib"
    export PATH="${LAM_PATH}:${PATH}"
    export LD_LIBRARY_PATH="${LAM_LIB}:${LD_LIBRARY_PATH}"

  5. Test the environment:
    After re-sourcing the environment changes that you've made, it's a good idea to validate that everything is correct. A simple, but not complete, way to do this is:

    > which mpicc

    If you're set up for LAM as in the gcc example from above, the output of this command should be:

    /usr/local/lam-6.5.6/ip/gnu/ssh/bin/mpicc

Back to TOC

21. Installing PBS Resource Manager
PBS is a free tool that enables you to run batch jobs on a cluster. Here's how it can be set up quickly to work with our example:

21.1 Learn About PBS      
Check out the PBS homepage at http://www.openpbs.org/. You need to go through some rigmarole to get a username and password. After you get a username and password, download and read the manual.

21.2 Download the PBS Source      
Make certain you have a username and password. You can't access the source without them.
Download the source from: http://www.openpbs.org/UserArea/Download/OpenPBS_2_3_12.tar.gz

21.3 Build PBS      
> cd /usr/local/xcat/build/pbs
> cp /where/ever/you/put/it/OpenPBS_2_3_12.tar.gz .
> ./pbsmaker OpenPBS_2_3_12.tar.gz scp

You should now have stuff in /usr/local/pbs and environment files in /etc/profile.d/.
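
A quick sanity check is sketched below; the exact names of the files that pbsmaker drops into /etc/profile.d/ are an assumption, so adjust the grep as needed. After logging out and back in (or sourcing the new profile.d scripts), qsub should resolve to the new install:

> ls /usr/local/pbs
> ls /etc/profile.d/ | grep -i pbs
> which qsub (should point somewhere under /usr/local/pbs)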

Back to TOC

22. Installing Maui Scheduler
Maui is an open-source scheduler that offers advanced scheduling algorithms and integrates with PBS. Here's how it can be set up quickly to work with our example:

22.1 Learn About Maui      
The homepage is at: http://www.supercluster.org/.
It would be a good idea to read the docs.

22.2 Download Maui Source      
Download the source from: http://supercluster.org/downloads/maui/maui-3.0.7p2.tar.gz

22.3 Build Maui      
> cd /where/ever/you/put/the/tarball
> tar -xzvf maui-3.0.7p2.tar.gz
> cd maui-3.0.7
> ./configure
     Maui Installation Directory? /usr/local/maui
     Maui Home Directory? /usr/local/maui
     Compiler? gcc
     Checksum SEED? 123
     Correct? Y
     Do you want to use PBS? [Y|N] default (Y)
     PBS Target Directory: /usr/local/pbs
> make
> make install

You should now have stuff in /usr/local/maui.

> mkdir /var/log/maui

22.4 Edit Maui Configuration      
Edit /usr/local/maui/maui.cfg, changing the 'LOGFILE' and 'RMPOLLINTERVAL' directives to read:
LOGFILE /var/log/maui/maui.log
RMPOLLINTERVAL 00:00:30

Back to TOC

23. Deploying PBS on the Cluster
If you're running PBS and Maui and you've installed them in the above two steps, you'll want to follow the instructions in this section to finish their setup and deploy them on the compute nodes.

23.1 Deploy      
> genpbs compute
(Where 'compute' is a nodelist.tab group that includes all your compute nodes)

23.2 Verify      
> showq
An example of part of the expected output on a 32 node, dual CPU cluster follows:
0 Active Jobs       0 of  64 Processors Active (0.00%)
                    0 of  32 Nodes Active      (0.00%)
If you don't see this kind of output, something is wrong with your PBS setup. You should fix it before you continue.
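
If the numbers look wrong, a couple of standard OpenPBS commands can help narrow things down (a sketch; run on the management node):

> pbsnodes -a (every compute node should be listed and, when idle, in state 'free')
> qstat -q (the default queue, 'dque' in the example output later in this document, should exist and be enabled)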

Back to TOC

24. Adding Users and Setting Up User Environment
There are a number of things necessary to set up for each user before they can run jobs within the framework of the example architecture. Some of the things covered in this section have been covered previously. Use your judgment on whether and where to apply the following:

24.1 Setup MPI and Other Environment in /etc/skel/      
If you don't set the MPI environment globally in /etc/profile and /etc/csh.login, and you haven't done this already, you'll need to add something like the following to /etc/skel/.bashrc (and the csh equivalent to /etc/skel/.cshrc), so that the addclusteruser command below will automatically pick up this environment. This example is for MPICH-GM; use whatever MPI library your users plan on using. A sketch of the csh equivalent follows the bash version:

export MPICH="/usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh"
export MPICH_PATH="${MPICH}/bin"
export MPICH_LIB="${MPICH}/lib"
export PATH="${MPICH_PATH}:${PATH}"
export LD_LIBRARY_PATH="${MPICH_LIB}:${LD_LIBRARY_PATH}"
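
The matching csh syntax for /etc/skel/.cshrc would look something like this sketch (adjust the MPICH path to the library your users will use):

setenv MPICH /usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh
setenv MPICH_PATH ${MPICH}/bin
setenv MPICH_LIB ${MPICH}/lib
setenv PATH ${MPICH_PATH}:${PATH}
if ( $?LD_LIBRARY_PATH ) then
    setenv LD_LIBRARY_PATH ${MPICH_LIB}:${LD_LIBRARY_PATH}
else
    setenv LD_LIBRARY_PATH ${MPICH_LIB}
endif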

24.2 Add User      
> addclusteruser username

This command automates a lot of user setup. More detail on what it does will be added after I get a chance to play with it a little. I wish there were a man page, but the source tells all.


Back to TOC

25. Verifying Cluster Operation With Pseudo Jobs and PBS
At this point, the cluster is almost ready to go. This section outlines a number of tests that will show that the infrastructure is in place for jobs to be successfully run on the cluster:

25.1 Check that the Compute Nodes Know About Your Users      
> ssh node01 (as root)
> su - ibm (or whatever user you're testing)
> touch ~/test_of_read_write_home_dir (to test that the home directory is writable)

If you can't su to this user, something is wrong. If you're using NIS, there's an NIS problem... run 'ypwhich; ypcat passwd' to test. If you're not using NIS, you probably haven't added this user to the compute node's /etc/passwd, etc. If you can't touch a file in the user's home directory, you don't have a writable home directory... fix it.

Make the above test work before continuing.

25.2 Test ssh to Compute Node as Regular User      
> su - ibm (or whatever user you're testing (on the management node))
> ssh node01

If you're using access control, like we are in the example xCAT configuration (ACCESS = Y in noderes.tab for node01), you should get a permission denied error. This is the correct behavior: users can only ssh to a node after PBS has allocated it to them. If you're not using access control, you should be able to ssh to node01 as a regular user.

25.3 Test Interactive PBS Job to a Single Node      
This will validate that PBS is working and that your user's ssh key-pair is set up correctly.

Request a job:
> su - ibm (or whatever user you're testing (on the management node))
> qsub -l nodes=1,walltime=10:00 -I

This submits an interactive job to PBS, asking for 1 node for 10 minutes. After a bit, PBS should put you on one of the compute nodes. This should look something like:
qsub: waiting for job 1.man-c to start
qsub: job 1.man-c ready

----------------------------------------
Begin PBS Prologue Tue Oct 30 16:44:56 MST 2001
Job ID:		1.man-c
Username:	ibm
Group:		ibm
Nodes:		node32
End PBS Prologue Tue Oct 30 16:44:56 MST 2001
----------------------------------------
[ibm@node32 ibm]$
If it doesn't, your PBS setup is broken. Fix it before continuing. If you get a permission denied type of error, your user's ssh key-pair or ssh configuration isn't setup correctly. Fix it before continuing.

When you can successfully get to the compute node via PBS as a regular user, try to ssh back to the head node. You should be able to without supplying a password. If you can't, something's broken. Fix it before continuing.

25.4 Test Interactive PBS Job to Multiple Nodes      
This will validate that PBS is working, that your user's ssh key-pair is set up correctly, and that jobs will work across more than one compute node.

Request a job:
> su - ibm (or whatever user you're testing (on the management node))
> qsub -l nodes=2,walltime=10:00 -I

This submits an interactive job to PBS, asking for 2 nodes for 10 minutes. After a bit, PBS should put you on one of the compute nodes and give you a list of the compute nodes that you have access to. This should look something like:
qsub: waiting for job 2.man-c to start
qsub: job 2.man-c ready

----------------------------------------
Begin PBS Prologue Tue Oct 30 16:47:36 MST 2001
Job ID:		2.man-c
Username:	ibm
Group:		ibm
Nodes:		node31 node32
End PBS Prologue Tue Oct 30 16:47:36 MST 2001
----------------------------------------
[ibm@node32 ibm]$ 
Test using ssh between the compute nodes you have access to and the head node. You should be able to ssh to and from all these nodes without supplying a password. If you can't, something's broken. Fix it before continuing.
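
One quick way to exercise this from inside the interactive job is the following sketch ($PBS_NODEFILE is set by PBS on the node it drops you on); each allocated node should print its hostname without any password prompt:

> for n in $(sort -u $PBS_NODEFILE); do ssh -x $n hostname; done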

Back to TOC

26. Running a Simple MPI Job Interactively via PBS
This section outlines how to build and run a simple MPI job:
  1. Be ready:
    Make certain that everything from section 25 works.

  2. Build a simple test MPI program:
    Here we build the simple MPI program cpi with a few different MPI libraries. cpi is a C program that calculates the value of pi using numerical integration. You only need to build with the libraries that you are interested in running. cpi is not a particularly good test in terms of completeness or performance, but it does serve as a good first step for validating MPI and parallel operation.

    Create a place to build:
    > su - ibm (or whatever user you're using to test)
    > mkdir ~/cpi

    Copy the cpi source to this directory:
    > cp /where/you/put/it/mpich-1.2.2.3.tar.gz ~/cpi/
    > cd ~/cpi
    > tar -xzvf mpich-1.2.2.3.tar.gz
    > cp mpich-1.2.2.3/examples/basic/cpi.c ~/cpi

    Build the program:
    These are just examples. Exact path names, etc. may vary with your setup.

    • MPICH-IP
      1. Verify environment:
        > su - ibm (or whatever user you're using to test)
        Make certain that your environment (section 24) is set up correctly for MPICH-IP, i.e. 'which mpicc' should result in /usr/local/mpich/1.2.2.3/ip/smp/gnu/ssh/bin/mpicc, etc.
      2. Build:
        > cd ~/cpi
        > mkdir mpich-ip; cd mpich-ip
        > cp ~/cpi/cpi.c ~/cpi/mpich-ip
        > mpicc -o cpi cpi.c

    • MPICH-GM
      1. Verify environment:
        > su - ibm (or whatever user you're using to test)
        Make certain that your environment (section 24) is set up correctly for MPICH-GM, i.e. 'which mpicc' should result in something like /usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh/bin/mpicc, etc.
      2. Build:
        > cd ~/cpi
        > mkdir mpich-gm; cd mpich-gm
        > cp ~/cpi/cpi.c ~/cpi/mpich-gm
        > mpicc -o cpi cpi.c

    • LAM-IP
      1. Verify environment:
        > su - ibm (or whatever user you're using to test)
        Make certain that your environment (section 24) is set up correctly for LAM-IP, i.e. 'which mpicc' should result in /usr/local/lam/blah-blah/mpicc, etc.
      2. Build:
        > cd ~/cpi
        > mkdir lam-ip; cd lam-ip
        > cp ~/cpi/cpi.c ~/cpi/lam-ip
        > mpicc -o cpi cpi.c

  3. Run simple MPI jobs interactively:
    These are just examples. Exact path names, etc. may vary with your setup.

    • MPICH-IP
      1. Verify environment:
        > su - ibm (or whatever user you're using to test)
        Make certain that your environment (section 24) is set up correctly for MPICH-IP, i.e. 'which mpicc' should result in /usr/local/mpich/1.2.2.3/ip/smp/gnu/ssh/bin/mpicc, etc.
      2. Request resources and run the job:
        Your session should look something like the following...
        [ibm@man-c ibm]$ cd ~/cpi/mpich-ip
        [ibm@man-c mpich-ip]$ qsub -l nodes=4,walltime=10:00:00 -I
        qsub: waiting for job 3.man-c to start
        qsub: job 3.man-c ready
        
        ----------------------------------------
        Begin PBS Prologue Tue Oct 30 19:35:17 MST 2001
        Job ID:         3.man-c
        Username:       ibm
        Group:          ibm
        Nodes:          node32 node31 node30 node29
        End PBS Prologue Tue Oct 30 19:35:17 MST 2001
        ----------------------------------------
        [ibm@node32 ibm]$ cd $PBS_O_WORKDIR
        [ibm@node32 mpich-ip]$ which mpirun
        /usr/local/mpich/1.2.2.3/ip/smp/gnu/ssh/bin/mpirun
        [ibm@node32 mpich-ip]$ NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
        [ibm@node32 mpich-ip]$ mpirun -machinefile $PBS_NODEFILE  -np $NP cpi
        Process 0 of 4 on node1
        pi is approximately 3.1415926544231239, Error is 0.0000000008333307
        wall clock time = 0.002015
        Process 3 of 4 on node4
        Process 1 of 4 on node2
        Process 2 of 4 on node3
        [ibm@node32 mpich-ip]$ logout
        qsub: job 3.man-c completed
        [ibm@man-c mpich-ip]$
        
    • MPICH-GM
      1. Verify environment:
        > su - ibm (or whatever user you're using to test)
        Make certain that your environment (section 24) is set up correctly for MPICH-GM, i.e. 'which mpicc' should result in something like /usr/local/mpich/1.2.1..7/gm-1.5_Linux-2.2.19-4hpc/smp/gnu/ssh/bin/mpicc, etc.
      2. Request resources and run the job:
        Your session should look something like the following...
        [ibm@man-c ibm]$ cd ~/cpi/mpich-gm
        [ibm@man-c mpich-gm]$ qsub -l nodes=4:ppn=2,walltime=10:00:00 -I
        qsub: waiting for job 4.man-c to start
        qsub: job 4.man-c ready
        
        ----------------------------------------
        Begin PBS Prologue Tue Oct 30 17:59:06 MST 2001
        Job ID:         3.man-c
        Username:       ibm
        Group:          ibm
        Nodes:          node32 node31 node30 node29
        End PBS Prologue Tue Oct 30 17:59:06 MST 2001
        ----------------------------------------
        [matt@node1 matt]$ test -d ~/.gmpi || mkdir ~/.gmpi
        [matt@node1 matt]$ GMCONF=~/.gmpi/conf.$PBS_JOBID
        [matt@node1 matt]$ /usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE \
        >$GMCONF
        [matt@node1 matt]$ NP=$(head -1 $GMCONF)
        [matt@node1 matt]$ cd $PBS_O_WORKDIR
        [matt@node1 mpich-gm]$ RECVMODE="polling"
        [matt@node1 mpich-gm]$ mpirun.ch_gm --gm-f $GMCONF --gm-recv $RECVMODE \
        --gm-use-shmem -np $NP PBS_JOBID=$PBS_JOBID cpi
        
        Process 4 of 8 on node32
        Process 1 of 8 on node31
        Process 6 of 8 on node30
        Process 7 of 8 on node29
        Process 5 of 8 on node31
        Process 2 of 8 on node30
        Process 0 of 8 on node32
        pi is approximately 3.1415926544231247, Error is 0.0000000008333316
        wall clock time = 0.000805
        Process 3 of 8 on node29
        [matt@node1 mpich-gm]$ logout
        qsub: job 4.man-c completed
        [matt@man-c mpich-gm]
        
    • LAM-IP
      1. Verify environment:
        > su - ibm (or whatever user you're using to test)
        Make certain that your environment (section 24) is set up correctly for LAM-IP, i.e. 'which mpicc' should result in /usr/local/lam/6.5.4/ip/gnu/ssh/bin/mpicc, etc.
      2. Request resources and run the job:
        Your session should look something like the following...
        [ibm@man-c ibm]$ cd ~/cpi/lam-ip
        [ibm@man-c lam-ip]$ qsub -l nodes=4:ppn=2,walltime=10:00:00 -I
        qsub: waiting for job 4.man-c to start
        qsub: job 4.man-c ready
        
        ----------------------------------------
        Begin PBS Prologue Tue Oct 30 20:19:06 MST 2001
        Job ID:         4.man-c
        Username:       ibm
        Group:          ibm
        Nodes:          node32 node31 node30 node29
        End PBS Prologue Tue Oct 30 20:19:06 MST 2001
        ----------------------------------------
        [ibm@node32 ibm]$ which lamboot
        /usr/local/lam/6.5.4/ip/gnu/ssh/bin/lamboot
        [ibm@node32 ibm]$ lamboot -v $PBS_NODEFILE
        
        LAM 6.5.4/MPI 2 C++/ROMIO - University of Notre Dame
        
        Executing hboot on n0 (node32 - 2 CPUs)...
        Executing hboot on n1 (node2 - 2 CPUs)...
        Executing hboot on n2 (node3 - 2 CPUs)...
        Executing hboot on n3 (node4 - 2 CPUs)...
        [ibm@node32 ibm]$ which mpirun
        /usr/local/lam/6.5.4/ip/gnu/ssh/bin/mpirun
        [ibm@node32 ibm]$ cd $PBS_O_WORKDIR
        [ibm@node32 lam-ip]$ mpirun C cpi
        Process 0 of 8 on node32
        Process 1 of 8 on node32
        Process 2 of 8 on node2
        Process 3 of 8 on node2
        Process 4 of 8 on node3
        Process 6 of 8 on node4
        Process 7 of 8 on node4
        pi is approximately 3.1415926544231247, Error is 0.0000000008333316
        wall clock time = 0.000807
        Process 5 of 8 on node3
        [ibm@node32 lam-ip]$ lamclean
        [ibm@node32 lam-ip]$ logout
        qsub: job 4.man-c completed
        [ibm@man-c lam-ip]$
        


Back to TOC

27. Running a Simple MPI Job in Batch Mode via PBS
Now we're ready to run cpi in batch mode via PBS:
  1. Grok /usr/local/xcat/samples/pbs/ in fullness:
    Before running batch jobs via PBS is a good time to scan the sample PBS batch files that are available in the xCAT distribution in /usr/local/xcat/samples/pbs. It's also a good idea to once again examine the PBS documentation and search the web for PBS examples that use the MPI libraries you are using.

  2. Again make certain your environment is correct:
    Understand that the user's environment ($PATH, etc.) has to be correct for the MPI library you are using.

  3. Run the sample MPI program via PBS:
    • MPICH-IP
      1. Create your PBS file:

        [ibm@man-c ibm]$ cd ~/cpi/mpich-ip

        cpi-mpichip.pbs should look something like the following...
        #!/bin/bash
        #PBS -l nodes=4:ppn=2,walltime=00:30:00
        #PBS -N cpi
        PROG=cpi
        
        #How many proc do I have?
        NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
        
        #messing with this can increase performance or break your code
        export P4_SOCKBUFSIZE=0x40000
        #export P4_GLOBMEMSIZE=33554432
        export P4_GLOBMEMSIZE=16777296
        #export P4_GLOBMEMSIZE=8388648
        #export P4_GLOBMEMSIZE=4194304
        
        #cd into the directory where I typed qsub
        cd $PBS_O_WORKDIR
        
        #run it
        mpirun -machinefile $PBS_NODEFILE -np $NP $PROG
        
      2. Submit your job and watch its progress.
        An example session follows:
        [ibm@man-c mpich-ip]$ pwd
        /home/ibm/cpi/mpich-ip
        [ibm@man-c mpich-ip]$ qsub cpi-mpichip.pbs
        [ibm@man-c mpich-ip]$ showq 
        ACTIVE JOBS--------------------
        JOBNAME USERNAME    STATE  PROC   REMAINING            STARTTIME
        
        5.man-c      ibm  Running     8     0:15:00  Wed Oct 31 12:06:33
        
        1 Active Job        8 of   64 Processors Active (12.50%)
                            4 of   32 Nodes Active      (12.50%)
      3. Observe the output
        [ibm@man-c mpich-ip]$ ls -lrt
        -rw-------    1 ibm     ibm         1031 Oct 31 12:06 cpi.o5
        -rw-------    1 ibm     ibm            0 Oct 31 12:06 cpi.e5
        You'll note some errors in this output. They don't seem to be fatal and only appear when using multiple CPUs per node (SMP/shared memory).
        # cpi.o5
        ----------------------------------------
        Begin PBS Prologue Thu Nov  1 14:29:45 MST 2001
        Job ID:         5.man-c
        Username:       ibm
        Group:          ibm
        Nodes:          node32 node31 node30 node29
        End PBS Prologue Thu Nov  1 14:29:45 MST 2001
        ----------------------------------------
        Process 0 of 8 on node32
        pi is approximately 3.1415926544231247, Error is 0.0000000008333316
        wall clock time = 0.002537
        Process 2 of 8 on node31
        Process 4 of 8 on node30
        Process 6 of 8 on node29
        Process 3 of 8 on node31
        p3_14667:  p4_error: OOPS: semop lock failed
        : 983043
        Process 5 of 8 on node30
        p5_14865:  p4_error: OOPS: semop lock failed
        : 983043
        Process 7 of 8 on node29
        p7_14148:  p4_error: OOPS: semop lock failed
        : 983043
        Process 1 of 8 on node32
        p1_3787: (0.829781) net_recv failed for fd = 5
        p1_3787:  p4_error: net_recv read, errno = : 9
        ----------------------------------------
        Begin PBS Epilogue Thu Nov  1 14:29:52 MST 2001
        Job ID:         5.man-c
        Username:       ibm
        Group:          ibm
        Job Name:       cpi
        Session:        3490
        Limits:         neednodes=4:ppn=2,walltime=00:30:00
        Resources:      cput=00:00:00,mem=10844kb,vmem=39444kb,walltime=00:00:02
        Queue:          dque
        Account:
        Nodes:          node32 node31 node30 node29
        
        Killing leftovers...
        node1:  killing node32 3792
        node3:  killing node30 14865
        
        End PBS Epilogue Thu Nov  1 14:29:53 MST 2001
        ----------------------------------------
        


    • MPICH-GM
      1. Create your PBS file:
        [ibm@man-c ibm]$ cd ~/cpi/mpich-gm

        It (cpi-mpichgm.pbs) should look something like the following...
        #!/bin/bash
        #PBS -l nodes=4:ppn=2,walltime=00:15:00
        #PBS -N cpi
        
        #prog name
        PROG=cpi
        PROGARGS=""
        
        #Make .gmpi directory for the gm conf files if it does not exist
        test -d ~/.gmpi || mkdir ~/.gmpi
        
        #Define unique gm conf filename
        GMCONF=~/.gmpi/conf.$PBS_JOBID
        
        #Make gm conf file from pbs nodefile
        if /usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE >$GMCONF
        then
                :
        #       echo "GM Nodefile:"
        #       echo
        #       cat $GMCONF
        else
                echo "pbsnodefile2gmconf failed to create gm conf file!"
                exit
        fi
        
        #How many proc do I have?
        NP=$(head -1 $GMCONF)
        
        #cd into the directory where I typed qsub
        cd $PBS_O_WORKDIR
        
        #Set receive mode, default: polling
        #RECVMODE="blocking"
        #RECVMODE="hybrid"
        RECVMODE="polling"
        
        #remove --gm-use-shmem if you do not want to use shared memory
        #
        #use --gm-v and --gm-recv-verb for additional info at run,
        #check both .o and .e files for output
        mpirun.ch_gm \
                --gm-f $GMCONF \
                --gm-recv $RECVMODE \
                --gm-use-shmem \
                --gm-kill 5 \
                -np $NP \
                PBS_JOBID=$PBS_JOBID \
                TMPDIR=/scr/$PBS_JOBID \
                $PROG $PROGARGS
        
        #clean up
        rm -f $GMCONF
        
        exit 0
      2. Submit your job and watch its progress.
        An example session follows:
        [ibm@man-c mpich-gm]$ pwd
        /home/ibm/cpi/mpich-gm
        [ibm@man-c mpich-gm]$ qsub cpi-mpichgm.pbs
        [ibm@man-c mpich-gm]$ showq 
        ACTIVE JOBS--------------------
        JOBNAME USERNAME  STATE    PROC  REMAINING            STARTTIME
        
        6.man-c      ibm  Running     8    0:15:00  Wed Oct 31 12:06:33
        
        1 Active Job      8 of   64 Processors Active (12.50%)
                          4 of   32 Nodes Active      (12.50%)
      3. Observe the output
        [ibm@man-c mpich-gm]$ ls -lrt
        -rw-------    1 ibm     ibm         1031 Oct 31 12:06 mcpi.o6
        -rw-------    1 ibm     ibm            0 Oct 31 12:06 mcpi.e6
        # mcpi.o6
        ----------------------------------------
        Begin PBS Prologue Wed Oct 31 12:06:34 MST 2001
        Job ID:         6.man-c
        Username:       ibm
        Group:          ibm
        Nodes:          node32 node31 node30 node29
        End PBS Prologue Wed Oct 31 12:06:34 MST 2001
        ----------------------------------------
        Process 7 of 8 on node29
        Process 4 of 8 on node32
        Process 5 of 8 on node31
        Process 1 of 8 on node31
        Process 2 of 8 on node30
        Process 3 of 8 on node29
        Process 6 of 8 on node30
        Process 0 of 8 on node32
        pi is approximately 3.1415926544231247, Error is 0.0000000008333316
        wall clock time = 0.000789
        ----------------------------------------
        Begin PBS Epilogue Wed Oct 31 12:06:48 MST 2001
        Job ID:         6.man-c
        Username:       ibm
        Group:          ibm
        Job Name:       cpi
        Session:        20298
        Resources:      cput=00:00:00,mem=320kb,vmem=1400kb,walltime=00:00:10
        Queue:          dque
        Account:
        Nodes:          node32 node31 node30 node29
        
        Killing leftovers...
        
        End PBS Epilogue Wed Oct 31 12:06:49 MST 2001
        ----------------------------------------


    • LAM-IP
      1. Create your PBS file:
        [ibm@man-c lam-ip]$ cd ~/cpi/lam-ip
        It (cpi-lam.pbs) should look something like the following...
        #!/bin/bash
        #PBS -l nodes=4:ppn=2,walltime=00:30:00
        #PBS -N cpi
        
        #How many processors do I have?
        NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
        
        #cd into the directory where I typed qsub
        cd $PBS_O_WORKDIR
        
        #lamboot
        lamboot $PBS_NODEFILE
        
        #run it
        mpirun C cpi
        
        #cleanup
        lamclean
      2. Submit your job and observe its progress
        An example session with a bit of commentary follows:
        [ibm@man-c lam-ip]$ qsub cpi-lam.pbs
        [ibm@man-c lam-ip]$ showq  (note the active nodes and processors)
        ACTIVE JOBS--------------------
        JOBNAME USERNAME    STATE PROC REMAINING            STARTTIME
        
        7.man-c      ibm  Running    8   0:30:00  Tue Oct 30 20:40:24
        
        1 Active Job        8 of   64 Processors Active (12.50%)
                            4 of   32 Nodes Active      (12.50%)
      3. Observe the output
        [ibm@man-c lam-ip]$ ls -l
        -rw-------    1 ibm     ibm            0 Oct 30 20:40 mcpi.e7
        -rw-------    1 ibm     ibm         1189 Oct 30 20:40 mcpi.o7
        
        # mcpi.o7
        ----------------------------------------
        Begin PBS Prologue Tue Oct 30 20:40:25 MST 2001
        Job ID:         7.man-c
        Username:       ibm
        Group:          ibm
        Nodes:          node32 node31 node30 node29
        End PBS Prologue Tue Oct 30 20:40:25 MST 2001
        ----------------------------------------
        
        LAM 6.5.4/MPI 2 C++/ROMIO - University of Notre Dame
        
        Process 0 of 8 on node32
        pi is approximately 3.1415926544231247, Error is 0.0000000008333316
        wall clock time = 0.000757
        Process 1 of 8 on node32
        Process 2 of 8 on node31
        Process 6 of 8 on node29
        Process 4 of 8 on node30
        Process 3 of 8 on node31
        Process 7 of 8 on node29
        Process 5 of 8 on node30
        ----------------------------------------
        Begin PBS Epilogue Tue Oct 30 20:40:31 MST 2001
        Job ID:         7.man-c
        Username:       ibm
        Group:          ibm
        Job Name:       cpi
        Session:        8009
        Resources:      cput=00:00:00,mem=320kb,vmem=1400kb,walltime=00:00:02
        Queue:          dque
        Account:
        Nodes:          node32 node31 node30 node29
        
        Killing leftovers...
        node32:  killing node32 8085
        node30:  killing node30 6201
        node31:  killing node31 6302
        node29:  killing node29 6201
        
        End PBS Epilogue Tue Oct 30 20:40:32 MST 2001
        ----------------------------------------


  4. Running jobs that are a bit more substantial:
    1. Adjust your .pbs file to make the cpi job run on all of the CPUs in your cluster (a sketch follows this list).

    2. Submit a bunch of cpi jobs into the queue and watch their progress with showq and qstat.

    3. Check out http://x-cat.org/docs/running_mpi_jobs-HOWTO.html for more examples of running MPI benchmarks and test jobs.

    4. Check out http://x-cat.org/docs/top500-HOWTO.html for documentation on running the HPL benchmark

    5. Look for more documentation on this subject in the future.
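
    For item 1, on the 32-node, dual-CPU example cluster the resource request in the MPICH-IP .pbs file would change to something like the following sketch (the node and CPU counts are assumptions; match them to your cluster):

        #PBS -l nodes=32:ppn=2,walltime=00:30:00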



Back to TOC

28. Changelog
2002-01-28   New versions of LAM-MPI and MPICH
2002-01-27   Spelling fixes.
2002-01-22   Added stuff to be controlled by apc master switch to nodehm.tab
2002-01-20   Added another diagram to the architecture section.
2002-01-16   BIOS v1.02 stage1 dd for 8674. Don't use COM2. noderes.tab changed to '0' (COM1). Changed serialmac to 0 in site.tab.
2002-01-09   gensshkeys, not makesshkeys. DHCPDARGS="eth1 eth2" (the quotes are important)
2002-01-06   Don't need nodeset for initial stage2. Fixed mpacheck/mpncheck confusion. winstall as alternative to 'nodeset; rpower'. Use addclusteruser and recommend using /etc/skel, etc.
2002-01-04   watchlogd setup documented. Don't need to install ssh rpms for RH 7.x. Don't need to setup ssh keys for root by hand. dhcpd configuration options on RH7.x. Use maui 3.0.7p1.
2002-01-03   Subsection formatting changes so they stick out better... not quite done yet though. More, new, and improved ELS and ESP config documentation.
2001-12-21   Updated to xCAT 1.1RC7.5. Added RSA stuff and documented the more automated way of configuring the ASMA adapters. Added Copyright stuff in the intro. All links to xCAT software are password protected. Links to bohnsack.com documentation have been removed.
2001-12-18   Added links to man pages where possible. More content in the architecture section.
2001-12-10   Updated to xCAT 1.1RC7.4
2001-12-09   Many formatting adjustments to make it work inside of x-cat.org. Changed .png to .gif to make it display better with Netscape.
2001-12-08   Changed to PHP from HTML::Mason and moved to x-cat.org.
2001-12-06   Updated xCAT version to 1.1RC7.3.
2001-11-09   Updated GM version to 1.5_Linux. Updated xCAT version to 1.1RC7.2.
2001-11-01   Added documentation on running MPICH-IP cpi via a PBS batch job.
2001-10-31   Added documentation on running MPICH-GM cpi via a PBS batch job.
2001-10-30   Added a .png diagram to the architecture section. Filled in the testing cluster operation section with major new content (and split it into 3 sections in the process). Updated GM version to 1.5_Linux_beta3. Moved additional reading to the top of the document.
2001-10-29   Fixed a bunch of spelling mistakes.... I'm certain new ones have already surfaced.
2001-10-26   Started keeping track of changes. Added xcat configuration examples. Added network configuration examples. Added architecture section. Added more stuff in the introduction. Moved VLAN configuration examples to a separate document. Moved ESP configuration to a separate document. Added examples of what services to remove. Fixed sequence problem with installing the correct OpenSSH rpms. Split LAM and MPICH into separate sections. Added stuff about making ypind come up at boot on management node.


Back to TOC

29. TODO
  • Patches for PGI on RedHat 7.x
  • PVM stuff
  • 10.3 is very unclear... improve
  • Include some simple stuff about using conserver cntrl-e-c, etc.
  • dhcp setup is unclear or incorrect in parts.
  • work so it's easy to provide single-page and multi-page versions of the document.
  • How to deal with a 'user' or 'interactive' nodes... PBS, PXE and ks on an x340, etc.
  • How to install a RedHat update kernel as a part of kickstart
  • ia64 stuff.
  • Changing the enable and telnet passwords on cisco 3500s.
  • Make certain each section has a few sentences at the top that explain why we want to do what we're about to do.
  • Validate the correctness of licensed PGI compiler install... possibly condense into only talking about the licensed install
  • Make the stage1,2,3 sections have why we do stage-x intro material
  • Provide content in the 'before you begin' section
  • flesh out 'verify cluster is setup for NIS'
  • Move a number of the other docs into this document
  • Spelling check
  • Go through the redbook and add any content that is in the redbook, missing from this document, and relevant.
  • Convert to DocBook, work on stylesheets, and auto-generate HTML, PDF, man, and text document versions... yeah right.

Back to TOC

30. Thanks
  • Mike Galicki, Chris DeYoung, Mark Atkinson, Greg Kettmann, people from POSDATA, Kevin Rudd, and Tonko L De rooy, for suggestions and contributions.
  • Egan Ford for writing xCAT, answering my xCAT questions, and contributing to this document.
  • Jay Urbanski for answering my xCAT questions and getting me started with Linux clustering at IBM.

Back to TOC
