How to run MPI applications


Introduction


AEC Cluster Tutorial, 26/02/2013, room B52, 14:00

List of participants

Dr James Storey (LHEP - AEgIS) ==== james.storey@cern.ch (cannot join)

Christoph Rudolf von Rohr (LHEP - PhD Cryo) ==== christoph.rudolfvonrohr@lhep.unibe.ch

Dr Lucian Ancu (LHEP - ATLAS) ==== lucian.ancu@lhep.unibe.ch (cannot join)

Dr Tamer Tolba (LHEP - EXO) ==== tamer.tolba@lhep.unibe.ch

Dr Alexander Rothkopf (ITP) ==== rothkopf@itp.unibe.ch

Matthias Luethi (LHEP - PhD MicroBooNE) ==== matthias.luethi@lhep.unibe.ch (cannot join)

PD Dr Igor Kreslo (LHEP) ==== igor.kreslo@lhep.unibe.ch (cannot join)

Lukas Marti (LHEP - ATLAS) ==== lukas.marti@lhep.unibe.ch

Michael Schenk (LHEP) ==== michael.schenk@lhep.unibe.ch (cannot join)

Philippe Widmer (ITP-PhD) ==== widmer@itp.unibe.ch

Dr Thomas Strauss (LHEP) ==== thomas.strauss@lhep.unibe.ch

Dr Markus Moser (ITP- IT) ==== moser@itp.unibe.ch

Peter Stoffer (ITP-PhD) ==== stoffer@itp.unibe.ch

Dr Akitaka Ariga (LHEP) ==== akitaka.ariga@lhep.unibe.ch

Dr Debasish Banerjee (ITP) ==== dbanerjee@itp.unibe.ch

Lorena Rothen (ITP-PhD) ==== rothen@itp.unibe.ch

Pablo Verges (Unibe-Student) ==== pablo.verges@students.unibe.ch

Alireza Ehtesham (LHEP-PhD) ==== a.ehtesham@students.unibe.ch (cannot join)

Dr Mitsuhiro Kimura (LHEP) ==== mitsuhiro.kimura@lhep.unibe.ch

Prof. Dr Thomas Becher ==== becher@itp.unibe.ch

SLCS and ARC Installation

Documentation on SLCS (Short Lived Credential Service) can be found in [1]; the FAQ is in [2].

To set up SLCS, download the latest package (slcs-client-<?>.tar.gz) from the repository [3] and unpack it in your preferred $SLCS directory (tar xfvz slcs-client-<?>.tar.gz). Then add its bin directory to the $PATH environment variable:

     #PATH=$PATH:/path/to/SLCS_directory/bin
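If you put the line above in a startup file, it is appended again every time the file is sourced. A minimal sketch that avoids this is shown here; SLCS_HOME and its default location are hypothetical names used only for illustration, so adjust them to wherever you unpacked the tarball.

```shell
# Hypothetical install location (assumption); change it to your $SLCS directory.
SLCS_HOME="${SLCS_HOME:-$HOME/slcs-client}"

# Append the bin directory only if it is not already in PATH,
# so sourcing this snippet repeatedly does not grow PATH.
case ":$PATH:" in
    *":$SLCS_HOME/bin:"*) ;;
    *) PATH="$PATH:$SLCS_HOME/bin" ;;
esac
export PATH

# Report whether the client is now reachable.
if command -v slcs-init >/dev/null 2>&1; then
    echo "slcs-init found at $(command -v slcs-init)"
else
    echo "slcs-init not found; check that $SLCS_HOME/bin exists" >&2
fi
```

The snippet can go in ~/.bashrc or a similar startup file so the client is available in every new shell.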


To request the certificate, run the following from your $HOME:

     #slcs-init -i unibe.ch -u username --p12
     AAI Password: ->type your password
     New Key Password:    -> you can leave it empty, just pressing enter

This should create a .globus directory containing the files userkey.pem, usercert.pem and usercred.p12.

You can check your SLCS here [4]; if you do not have access, contact grid@switch.ch.

If you do not have access to the aec VO:

     Load usercred.p12 into your browser (in Firefox: Preferences->Advanced->View certificate->Import)
     Subscribe to the VO at https://voms.smscg.ch/ by clicking on aec
            

If you do not have a .voms/ directory, create it in your home directory and add a vomses file listing your VOs, for example for aec:

     "aec" "voms.smscg.ch" "15027" "/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Zuerich/L=Zuerich/O=SWITCH/CN=voms.smscg.ch" "aec"

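The vomses step above can be scripted. The following is only a sketch: it uses the directory and the aec entry exactly as given in the text, while the variable names are illustrative, and it adds the entry only if an identical line is not already in the file.

```shell
# Directory and file named in the text above.
VOMS_CONF_DIR="$HOME/.voms"
VOMSES_FILE="$VOMS_CONF_DIR/vomses"

# The aec entry, copied verbatim from the example line above.
AEC_ENTRY='"aec" "voms.smscg.ch" "15027" "/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Zuerich/L=Zuerich/O=SWITCH/CN=voms.smscg.ch" "aec"'

mkdir -p "$VOMS_CONF_DIR"
touch "$VOMSES_FILE"

# Append the entry only when no identical line exists yet
# (-x: whole-line match, -F: fixed string, no regex interpretation).
if ! grep -qxF "$AEC_ENTRY" "$VOMSES_FILE"; then
    printf '%s\n' "$AEC_ENTRY" >> "$VOMSES_FILE"
fi
```

Running it twice leaves a single aec line in the file, so it is safe to keep in a setup script.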
Jobs can be submitted with ARC (Advanced Resource Connector); documentation can be found in [5].

To set up ARC (standalone), choose the configuration you need from [6] and download it. Then, on Linux, extract the archive and source the setup script:

     # tar -xvf nordugrid-arc-standalone-11.05-2.el5.x86_64.tgz
     # cd nordugrid-arc-standalone-11.05-2
     # . ./setup.sh

The ARC environment should now be set up, and you can create a proxy:

     # arcproxy --voms aec
      Type your password:


To test it:

     #arctest -c ce.lhep.unibe.ch -J 1

To check your job, go to [7].

To submit a JSDL script, first request a proxy, then submit the job and check its status:

     # arcsub -c ce.lhep.unibe.ch submit.jsdl
     # arcstat jobname

An example of submit.jsdl is in the following section.




JSDL submission file description

The Job Submission Description Language (JSDL) is based on XML.

  • Documentation can be found in: Manual: [8] Example: [9]

Structure of the submission file:

  • Main structure:
     <JobDefinition>
          <JobDescription>
               <JobIdentification...>
               <Application...>
               <DataStaging...>
          </JobDescription>
     </JobDefinition>
  • JobIdentification: here you can define the job name, a description, etc. The available elements are:
      <JobIdentification...>
          <JobName...>
          <Description...>
          <JobAnnotation...>
          <JobProject...>
      </JobIdentification>
  • For example:
     <JobIdentification>
         <JobName>My MonteCarlo test job </JobName>
     </JobIdentification>  


  • Application describes all the applications to be executed.

The following is an example in POSIX (Portable Operating System Interface for Unix) style, using POSIXApplication. Only a few arguments are shown:

     <Application>
        <posix:POSIXApplication>
            <posix:Executable>./executable_name</posix:Executable>   ---> command to be executed
            <posix:Argument>filename.sh</posix:Argument>   ---> an argument passed to the application
            <posix:Output>out1.txt</posix:Output>   ---> file receiving the standard output
            <posix:Error>err.txt</posix:Error>   ---> file receiving the standard error, filled if the job has problems
        </posix:POSIXApplication>
     </Application>
    
  • DataStaging: here the files that are "staged in" to the execution host or "staged out" from it are described. Each block usually contains a Source and/or a Target element. The options are:
     <DataStaging>
          <FileName ... >
          <FilesystemName ... >
          <CreationFlag ... >
          <DeleteOnTermination ...>
          <Source ...>
          <Target ...>
     </DataStaging>
  • DataStaging example:
     <DataStaging>
         <FileName>filename.sh</FileName>
         <DeleteOnTermination>true</DeleteOnTermination>    ---> boolean: if true, the file is deleted after the job terminates or after it has been staged out
         <Source>     ---> location of the file or directory on the remote system
             <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/input/filename.sh</URI>
         </Source>
     </DataStaging>
     <DataStaging>
         <FileName>out.txt</FileName>
         <DeleteOnTermination>true</DeleteOnTermination>
         <Target>     ---> location on the remote system where the file is stored
             <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/output/out.txt</URI>
         </Target>
     </DataStaging>

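The fragments of this section can be assembled into one file. The sketch below writes a submit.jsdl from the shell, reusing the element values and SRM URIs from the examples above; replace them with your own paths. The two xmlns declarations are the standard JSDL 1.0 namespace URIs and are an assumption here, since the fragments above do not show them. The output file name is set to out.txt so that it matches the staged-out file.

```shell
# Name of the file to generate; submit.jsdl is the name used earlier in this page.
JSDL_FILE="${JSDL_FILE:-submit.jsdl}"

# Quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$JSDL_FILE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!-- Namespace URIs assumed from the JSDL 1.0 specification -->
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl"
               xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <JobDescription>
    <JobIdentification>
      <JobName>My MonteCarlo test job</JobName>
    </JobIdentification>
    <Application>
      <posix:POSIXApplication>
        <posix:Executable>./executable_name</posix:Executable>
        <posix:Argument>filename.sh</posix:Argument>
        <posix:Output>out.txt</posix:Output>
        <posix:Error>err.txt</posix:Error>
      </posix:POSIXApplication>
    </Application>
    <DataStaging>
      <FileName>filename.sh</FileName>
      <DeleteOnTermination>true</DeleteOnTermination>
      <Source>
        <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/input/filename.sh</URI>
      </Source>
    </DataStaging>
    <DataStaging>
      <FileName>out.txt</FileName>
      <DeleteOnTermination>true</DeleteOnTermination>
      <Target>
        <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/output/out.txt</URI>
      </Target>
    </DataStaging>
  </JobDescription>
</JobDefinition>
EOF

# Optional well-formedness check; skipped if xmllint is not installed.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$JSDL_FILE" && echo "$JSDL_FILE is well-formed XML"
fi
```

The resulting file can then be passed to arcsub as shown in the installation section.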

Tutorial Examples

Example submission jobs can be found at http://ce.lhep.unibe.ch/vgallo/tutorial/; each of the two directories contains the corresponding submission files. In both cases the C++ Monte Carlo test file is compiled directly on the grid.

- Step1: basic submission file; everything is sent to the cluster

- Step2: usage of the Storage Element (in/out data staging)

To use the SE (Storage Element), pass the option -S (the short form of --voms) when requesting the proxy:

   # arcproxy -S aec

Multi-CPU jobs:

   # arcsub  -c ce.lhep.unibe.ch -e '&(executable=./montecarlo_testjob)(stdout=parallel.log)(count=2)'
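For longer job descriptions it can be more convenient to keep the xRSL in a file rather than on the command line. The sketch below writes the same three attributes as the command above to a file; the file name parallel.xrsl is only an illustrative choice.

```shell
# Illustrative file name (assumption); any name works.
XRSL_FILE="${XRSL_FILE:-parallel.xrsl}"

# Same attributes as the inline -e string above, one relation per line.
cat > "$XRSL_FILE" <<'EOF'
&(executable="./montecarlo_testjob")
 (stdout="parallel.log")
 (count=2)
EOF

# The file is passed to arcsub in the same way as submit.jsdl earlier.
echo "Submit with: arcsub -c ce.lhep.unibe.ch $XRSL_FILE"
```

Keeping the description in a file also makes it easier to change count or add further attributes later.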


FAQ / Possible Errors

  • If you get:
   #arcproxy --voms aec
   Your identity: /DC=ch/DC=switch/DC=slcs/O=Universitaet Bern/CN= YourUserName
   Enter pass phrase for /path/to/.globus/userkey.pem:
   Proxy generation failed: Certificate has expired.

You should request a new SLCS certificate:

     #slcs-init -i unibe.ch -u username --p12
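To find out whether the certificate has expired before arcproxy complains, you can inspect it with openssl. This is a sketch, assuming openssl is installed and the certificate is in the .globus directory created by slcs-init; the function name cert_is_valid is made up for this example.

```shell
# Returns 0 if the certificate file exists and has not expired,
# 2 if the file is missing, and another non-zero value if it has expired.
cert_is_valid() {
    cert="$1"
    [ -f "$cert" ] || return 2
    # -checkend N succeeds only if the certificate is still valid N seconds from now.
    openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1
}

# Default location created by slcs-init, as described in the installation section.
CERT="${CERT:-$HOME/.globus/usercert.pem}"

if cert_is_valid "$CERT"; then
    echo "Certificate $CERT is still valid"
else
    echo "Certificate $CERT is missing or expired; run slcs-init again"
fi
```

Passing a larger number to -checkend, for example 86400, would warn one day before the certificate expires.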


  • If you do not have the up-to-date policy files (the error usually looks like "ERROR: Failed uploading file: Can't write to destination - globus_xio_gsi: gss_init_sec_context fail..."),
     go to [10], click on IGTF, then download the "classic" and "slcs" tarballs from the accredited folder
     and untar them into your nordugrid-arc-standalone-11.05-2/etc/grid-security/certificates/ directory.


  • Two environment variables affect credential verification [11]:
 - X509_CERT_DIR   path to the CA certificates; by default /etc/grid-security/certificates
 - X509_VOMS_DIR   path to the directory containing the VOMS list; by default /etc/grid-security/vomsdir

You may want to redefine these:

     #export X509_CERT_DIR=$HOME/nordugrid-arc-standalone/share/certificates
     #export X509_VOMS_DIR=$HOME/.vomsdir
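A mistyped path in these variables is easy to miss. The sketch below sets each variable only if the directory exists and prints a warning otherwise. The helper export_if_dir is a hypothetical name introduced for this example, and the two paths are the ones from the export lines above.

```shell
# Export variable $1 with value $2 only if $2 is an existing directory.
# Returns 1 and prints a warning to stderr otherwise.
export_if_dir() {
    var_name="$1"
    dir="$2"
    if [ -d "$dir" ]; then
        export "$var_name=$dir"
    else
        echo "warning: $dir not found; $var_name left unchanged" >&2
        return 1
    fi
}

# Paths taken from the example above; adjust them to your installation.
export_if_dir X509_CERT_DIR "$HOME/nordugrid-arc-standalone/share/certificates" || true
export_if_dir X509_VOMS_DIR "$HOME/.vomsdir" || true
```

If a warning is printed, the corresponding variable keeps whatever value it had before, so the default location still applies.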