Friday 3 June 2011

Installation and Configuration of PBS

Introduction      The Portable Batch System (PBS) is available as Open Source software from
       http://www.OpenPbs.org/. A commercial version can be bought from
       http://www.PBSPro.com/. PBSPro also offers support for OpenPBS, at a decent price for academic institutions.

       There exists a very useful collection of user-contributed software/patches for Open PBS at http://www-unix.mcs.anl.gov/openpbs/.

       This HowTo document outlines all the steps required to compile and install the Portable Batch System (PBS) versions 2.1, 2.2, and 2.3. Most likely the steps will be the same for the PBSPro software.

       The latest version of PBS is available from http://www.OpenPbs.org/. The PBS documentation available at the Web-site should be handy for in-depth discussion of the points covered in this HowTo.

       We also discuss how to create a PBS script for parallel or serial jobs. Cleanup in an epilogue script may be required for parallel jobs.

       Accounting reports may be generated from PBS' accounting files. We provide a simple tool, pbsacct, that processes the accounting data and formats it into a useful report.

       Download the latest version of pbsacct from the ftp://ftp.fysik.dtu.dk/pub/PBS/ directory.


     The following steps are what we use to install PBS from scratch on our systems.
           1. Ensure that tcl8.0 and tk8.0 are installed on the system. Look into the PBS docs
              to find out about these packages. The homepage is at
              http://www.scriptics.com/products/tcltk/. Get Linux RPMs from your favorite
              distribution, or build them yourself on other UNIXes.
              If you installed the PBS binary RPMs on Linux, skip to step 4.

           2. Configure PBS for your choice of spool-directory and the central server
              machine (named "zeise" in our examples):
              ./configure --set-server-home=/var/spool/PBS --set-default-server=zeise
              On Compaq Tru64 UNIX make sure that you use the Compaq C-compiler instead
              of the GNU gcc by doing "setenv CC cc". You should add this flag to the above
              configure command: --set-cflags="-g3 -O2". It is also important that
              /var/spool/PBS does not include any soft-links, such as /var -> /usr/var, since
              this triggers a bug in the PBS code.

             If you compiled PBS for a different architecture before, make sure to clean up before running configure:
                                  gmake distclean

           3. Run a GNU-compatible make in order to build PBS.
              On AIX 4.1.5 edit src/tools/Makefile to add a library: LIBS= -lld
              On Compaq Tru64 UNIX use the native Compaq C-compiler:
              gmake CC=cc
              The default CFLAGS are "-g -O2", but the Compaq compiler requires "-g3 -O2"
              for optimization. Set this with:
              ./configure (flags) --set-cflags="-g3 -O2"
              After the make has completed, install the PBS files as the root superuser:
              gmake install

           4. Create the file "nodes" in the central server's (zeise) directory
              /var/spool/PBS/server_priv, containing the hostnames; see the PBS 2.2 Admin
              Guide p. 8 (Sec. 2.2 "Installation Overview", point 8). Substitute the
              spool-directory name /var/spool/PBS by your own choice (the Linux RPM uses
              /var/spool/pbs). Check the file /var/spool/PBS/pbs_environment and ensure that
              important environment variables (such as the TZ timezone variable) have been
              included by the installation process. Add any required variables in this file.

           5. Initialize the PBS server daemon and scheduler:
              /usr/local/sbin/pbs_server -t create
              /usr/local/sbin/pbs_sched
              The "-t create" should only be executed once, at the time of installation!
              The pbs_server and pbs_sched should be started at boot time: On Linux this is
              done automatically by /etc/rc.d/init.d/pbs. Otherwise use your UNIX's standard
              method (e.g. /etc/rc.local) to run the following commands at boot time:
              /usr/local/sbin/pbs_server -a true
              /usr/local/sbin/pbs_sched
              The "-a true" sets the scheduling attribute to True, so that jobs may start
              running.

           6. Create queues using the "qmgr" command; see the manual pages for
              "pbs_server_attributes" and "pbs_queue_attributes". List the server
              configuration with the "print server" command. The output can be used as
              input to qmgr, so this is a way to make a backup of your server setup (see the
              sketch after this list). You may put the output of qmgr (for example, the setup
              listed below) into a file, removing the first 2 lines, which are not valid qmgr
              commands. Pipe this file into qmgr like this: cat file | qmgr, and everything is
              configured in a couple of seconds!
             Our current configuration is:
             # qmgr
             Max open servers: 4
             Qmgr: print server
             #
             # Create queues and set their attributes.
             #
             #
             # Create and define queue verylong
             #
             create queue verylong
             set queue verylong queue_type = Execution
             set queue verylong Priority = 40
             set queue verylong max_running = 10
             set queue verylong resources_max.cput = 72:00:00
             set queue verylong resources_min.cput = 12:00:01
             set queue verylong resources_default.cput = 72:00:00
             set queue verylong enabled = True
             set queue verylong started = True
             #
             # Create and define queue long
             #
             create queue long
             set queue long queue_type = Execution
             set queue long Priority = 60
             set queue long max_running = 10
             set queue long resources_max.cput = 12:00:00
             set queue long resources_min.cput = 02:00:01
             set queue long resources_default.cput = 12:00:00
             set queue long enabled = True
             set queue long started = True
             #
             # Create and define queue medium
             #
             create queue medium
             set queue medium queue_type = Execution
             set queue medium Priority = 80
             set queue medium max_running = 10
             set queue medium resources_max.cput = 02:00:00
             set queue medium resources_min.cput = 00:20:01
             set queue medium resources_default.cput = 02:00:00
             set queue medium enabled = True
             set queue medium started = True
             #
             # Create and define queue small
             #
             create queue small
             set queue small queue_type = Execution
             set queue small Priority = 100
             set queue small max_running = 10
             set queue small resources_max.cput = 00:20:00
             set queue small resources_default.cput = 00:20:00
             set queue small enabled = True
             set queue small started = True
             #
             # Create and define queue default
             #
             create queue default
             set queue default queue_type = Route
             set queue default max_running = 10
             set queue default route_destinations = small
             set queue default route_destinations += medium
             set queue default route_destinations += long
             set queue default route_destinations += verylong
             set queue default enabled = True
             set queue default started = True
             #
             # Set server attributes.
             #
             set server scheduling = True
             set server max_user_run = 6
             set server acl_host_enable = True
             set server acl_hosts = *.fysik.dtu.dk
             set server acl_hosts = *.alpha.fysik.dtu.dk
             set server default_queue = default
             set server log_events = 63
             set server mail_from = adm
             set server query_other_jobs = True
             set server resources_default.cput = 01:00:00
             set server resources_default.neednodes = 1
             set server resources_default.nodect = 1
             set server resources_default.nodes = 1
             set server scheduler_iteration = 60
             set server default_node = 1#shared

          7. Install the PBS software on the client nodes, repeating steps 1-3 above.

           8. Configure the PBS nodes so that they know the server: Check that the file
              /var/spool/PBS/server_name contains the name of the PBS server (zeise in this
              example), and edit it if appropriate. Also make sure that this hostname resolves
              correctly (with or without the domain-name), otherwise the pbs_server may
              refuse connections from the qmgr command.
              Create the file /var/spool/PBS/mom_priv/config on all PBS nodes (server and
              clients) with the contents:
              # The central server must be listed:
              $clienthost zeise
              where the correct servername must replace "zeise". You may add other
              relevant lines as recommended in the manual, for example for restricting
              access and for logging:
              $logevent 0x1ff
              $restricted *.your.domain.name
              (list the domain names that you want to give access).
              For maintenance of the configuration file, we use rdist to duplicate
              /var/spool/PBS/mom_priv/config from the server to all PBS nodes (see the
              sketch after this list).

          9. Start the MOM mini-servers on both the server and the client nodes:
             /usr/local/sbin/pbs_mom
             or "/etc/rc.d/init.d/pbs start" o n Linux. Make sure that MOM is started at boot
             time. See discussion under point 5.
              On Compaq Tru64 UNIX 4.0E+F there may be a problem with starting
              pbs_mom too soon. Some network problem makes pbs_mom report errors in
              an infinite loop, which fills up the logfiles' filesystem within a short time!
              Several people have told me that they don't have this problem, so it's not
              understood at present.
             The following section is only relevant if you have this problem on Tru64 UNIX.
             On Tru64 UNIX start pbs_mom from the last entry in /etc/inittab:
             # Portable Batch System batch execution mini-server
             pbsmom::once:/etc/rc.pbs > /dev/console 2>&1
             The file /etc/rc.pbs delays the startup of pbs_mom:
             #!/bin/sh
             #
             # Portable Batch System (PBS) startup
             #
             # On Digital UNIX, pbs_mom fills up the mom_logs directory
             # within minutes after reboot. Try to sleep at startup
             # in order to avoid this.
             PBSDIR=/usr/local/sbin
             if [ -x ${PBSDIR}/pbs_mom ]; then
                   echo PBS startup.
                   # Sleep for a while
                   sleep 120
                   ${PBSDIR}/pbs_mom       # MOM
                   echo Done.
             else
                    echo Could not execute PBS commands!
              fi

          10. Queues defined above do not work until you start them:
               qstart default small medium long verylong
               qenable default small medium long verylong
               This needs to be done only once, at the time when you install PBS.

          11. Make sure that the PBS server has all nodes correctly defined. Use the
               pbsnodes -a command to list all nodes.
               Add nodes using the qmgr command:
              # qmgr
              Max open servers: 4
              Qmgr: create node node99 properties=ev67
              where the node-name is node99 with the properties=ev67. Alternatively, you
              may simply list the nodes in the file /var/spool/PBS/server_priv/nodes:
              server:ts ev67
              node99 ev67
               The :ts suffix indicates a time-shared node; nodes without :ts are cluster nodes
               where batch jobs may execute. The second column lists the properties that you
               associate with the node. Restart the pbs_server after manually editing the
               nodes file.

          12. After you first set up your system, you need to set the server scheduling
               attribute to True in order to get jobs to actually run. This will normally be done
               for you at boot time (see point 5 above), but for this first time you need to do
               it by hand using the qmgr command:
               # qmgr
               Max open servers: 4
               Qmgr: set server scheduling=true
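
         As mentioned under point 6, the output of qmgr's "print server" command can be
         used again as qmgr input, so the server setup can be saved to a file and replayed
         later. A minimal sketch, assuming qmgr is in the PATH; the file name
         /root/pbs_server_config is just an example:
         # Save the current server and queue configuration:
         qmgr -c "print server" > /root/pbs_server_config
         # Later, for example after re-creating the server, restore it with:
         qmgr < /root/pbs_server_config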
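
         As an alternative to rdist (point 8), the MOM configuration file may be pushed to
         the nodes with a simple shell loop. This is only a sketch; it assumes password-less
         scp access from the server to every node, and it takes the node names from the
         first column of the server's nodes file (stripping any :ts suffix):
         #!/bin/sh
         # Copy the MOM config from the server to all PBS nodes.
         NODES=`awk '{print $1}' /var/spool/PBS/server_priv/nodes | sed 's/:ts$//'`
         for node in $NODES
         do
               echo Updating $node
               scp /var/spool/PBS/mom_priv/config ${node}:/var/spool/PBS/mom_priv/config
         done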
        Batch job scripts
        Your PBS batch system ought to be fully functional at this point so that you can
        submit batch jobs using the qsub command. For debugging purposes, PBS offers you
        an "interactive batch job" by using the command qsub -I.
        As an example, you may use the following PBS batch script as a template for
        creating your own batch scripts. The present script runs an MPI parallel job on the
        available processors:
        #!/bin/sh
        ### Job name
        #PBS -N test
        ### Declare job non-rerunable
        #PBS -r n
        ### Output files
        #PBS -e test.err
        #PBS -o test.log
        ### Mail to user
        #PBS -m ae
        ### Queue name (small, medium, long, verylong)
        #PBS -q long
        ### Number of nodes (node property ev67 wanted)
        #PBS -l nodes=8:ev67
        # This job's working directory
        echo Working directory is $PBS_O_WORKDIR
        cd $PBS_O_WORKDIR
        echo Running on host `hostname`
        echo Time is `date`
        echo Directory is `pwd`
         echo This job runs on the following processors:
        echo `cat $PBS_NODEFILE`
        # Define number of processors
        NPROCS=`wc -l < $PBS_NODEFILE`
        echo This job has allocated $NPROCS nodes
        # Run the parallel MPI executable "a.out"
        mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS a.out
         If you specify #PBS -l nodes=1 in the script, you will be running a non-parallel
         (or serial) batch job:
        #!/bin/sh
        ### Job name
        #PBS -N test
        ### Declare job non-rerunable
        #PBS -r n
        ### Output files
        #PBS -e test.err
        #PBS -o test.log
        ### Mail to user
        #PBS -m ae
        ### Queue name (small, medium, long, verylong)
        #PBS -q long
        ### Number of nodes (node property ev6 wanted)
        #PBS -l nodes=1:ev6
        # This job's working directory
        echo Working directory is $PBS_O_WORKDIR
        cd $PBS_O_WORKDIR
        echo Running on host `hostname`
        echo Time is `date`
        echo Directory is `pwd`
        # Run your executable
        a.out
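         Assuming the script above has been saved in a file, say test.sh (the name is
         arbitrary), the job is submitted and monitored with the standard PBS commands;
         the job id 123.zeise below is only an illustration:
         qsub test.sh          # submit; qsub prints the job id, e.g. 123.zeise
         qstat -a              # list all jobs and their states
         qstat -n 123.zeise    # show the nodes allocated to this job
         qdel 123.zeise        # remove the job from the queue / kill it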
         Clean-up after parallel jobs
         If a parallel job dies prematurely for any reason, PBS will clean up user processes
         on the master-node only. We (and others) have found that often MPI
         slave-processes are lingering on all of the slave-nodes, waiting for communication
         from the (dead) master-process.
         At present the only generally applicable way to clean up user processes on the
         nodes allocated to a PBS job is to use the PBS epilogue capability (see the PBS
         documentation). The epilogue is executed on the job's master-node only.
         An epilogue script /var/spool/PBS/mom_priv/epilogue should be created on every
         node, containing for example this:
        #!/bin/sh
        echo '--------------------------------------'
        echo Running PBS epilogue script
        # Set key variables
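         # PBS passes the job id as $1 and the job owner's user name as $2 to the
         # epilogue script; the job's node list is kept in the "aux" file named
         # after the job id.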
        USER=$2
        NODEFILE=/var/spool/PBS/aux/$1
        echo
        echo Killing processes of user $USER on the batch nodes
        for node in `cat $NODEFILE`
        do
                    echo Doing node $node
                    su $USER -c "ssh -a -k -n -x $node skill -v -9 -u $USER"
        done
        echo Done.
         The Secure Shell command ssh may be replaced by the remote-shell command of
         your choice. The skill (Super-kill) command is a nice tool available from
         ftp://fast.cs.utah.edu/pub/skill/, or as part of the Linux procps RPM-package.
         On SMP nodes one cannot use the Super-kill command, since the user's processes
         belonging to other PBS jobs might be terminated. The present solution works
         correctly only on single-CPU nodes.
         An alternative cleanup solution for Linux systems is provided by Benjamin Webb
         of Oxford University. This solution may work more reliably than the above.
         Last update: 07 Jan 2003.
         Copyright © 2003 Center for Atomic-scale Materials Physics. All rights reserved.
