How to run MPI applications
Introduction
AEC Cluster Tutorial 26/02/2013 room B52 @ 14
List of participants
Dr James Storey (LHEP - AEgIS) ==== james.storey@cern.ch (cannot join)
Christoph Rudolf von Rohr (LHEP - PhD Cryo) ==== christoph.rudolfvonrohr@lhep.unibe.ch
Dr Lucian Ancu (LHEP - ATLAS) ==== lucian.ancu@lhep.unibe.ch (cannot join)
Dr Tamer Tolba (LHEP - EXO) ==== tamer.tolba@lhep.unibe.ch
Dr Alexander Rothkopf (ITP) ==== rothkopf@itp.unibe.ch
Matthias Luethi (LHEP - PhD MicroBooNE) ==== matthias.luethi@lhep.unibe.ch (cannot join)
PD Dr Igor Kreslo (LHEP) ==== igor.kreslo@lhep.unibe.ch (cannot join)
Lukas Marti (LHEP - ATLAS) ==== lukas.marti@lhep.unibe.ch
Michael Schenk (LHEP) ==== michael.schenk@lhep.unibe.ch (cannot join)
Philippe Widmer (ITP-PhD) ==== widmer@itp.unibe.ch
Dr Thomas Strauss (LHEP) ==== thomas.strauss@lhep.unibe.ch
Dr Markus Moser (ITP- IT) ==== moser@itp.unibe.ch
Peter Stoffer (ITP-PhD) ==== stoffer@itp.unibe.ch
Dr Akitaka Ariga (LHEP) ==== akitaka.ariga@lhep.unibe.ch
Dr Debasish Banerjee (ITP) ==== dbanerjee@itp.unibe.ch
Lorena Rothen (ITP-PhD) ==== rothen@itp.unibe.ch
Pablo Verges (Unibe-Student) ==== pablo.verges@students.unibe.ch
Alireza Ehtesham (LHEP-PhD) ==== a.ehtesham@students.unibe.ch (cannot join)
Dr Mitsuhiro Kimura (LHEP) ==== mitsuhiro.kimura@lhep.unibe.ch
Prof. Dr Thomas Becher ==== becher@itp.unibe.ch
SLCS and ARC Installation
Documentation on SLCS (Short Lived Credential Service) can be found in [1], and the FAQ is in [2].
To set up SLCS, download the latest package (slcs-client-<?>.tar.gz) from the repository [3] and unpack it in your preferred $SLCS directory (tar xfvz slcs-client-<?>.tar.gz). Then add the bin directory to your $PATH environment variable:
#PATH=$PATH:/path/to/SLCS_directory/bin
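If the line above ends up in a shell profile that is sourced more than once, the same directory gets appended to $PATH repeatedly. A minimal sketch of a helper that avoids this follows; the function name add_to_path is hypothetical, and /path/to/SLCS_directory/bin is the same placeholder as above.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # append at the end
    esac
    export PATH
}

# Usage (placeholder path, replace it with your own $SLCS directory):
# add_to_path /path/to/SLCS_directory/bin
```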
To get the certificate, run from your $HOME:
# slcs-init -i unibe.ch -u username --p12
AAI Password: -> type your password
New Key Password: -> you can leave it empty by just pressing Enter
This should create a .globus directory containing the files userkey.pem, usercert.pem and usercred.p12.
You can check your SLCS certificate at [4]; if you do not have access, contact grid@switch.ch.
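A quick way to confirm that slcs-init produced the three expected files is sketched below. The function name check_globus is hypothetical; the file names and the default directory are the ones given above.

```shell
# Hypothetical helper: report which of the expected SLCS files are missing.
# The directory defaults to $HOME/.globus.
check_globus() {
    dir="${1:-$HOME/.globus}"
    missing=0
    for f in userkey.pem usercert.pem usercred.p12; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_globus && echo "SLCS credentials found"
```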
If you do not have access to the aec VO (Virtual Organisation):
Load usercred.p12 into your browser (in Firefox: Preferences -> Advanced -> View Certificates -> Import), then subscribe to the VO at https://voms.smscg.ch/ by clicking on aec.
If you do not have a .voms/ directory, create it in your home directory and add a vomses file listing your VOs, for example for aec:
"aec" "voms.smscg.ch" "15027" "/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Zuerich/L=Zuerich/O=SWITCH/CN=voms.smscg.ch" "aec"
Jobs can be submitted with ARC (Advanced Resource Connector); documentation can be found in [5].
To set up ARC (standalone), choose the configuration you need from [6] and download it. Then, on Linux, extract it and source the setup script:
# tar -xvf nordugrid-arc-standalone-11.05-2.el5.x86_64.tgz
# cd nordugrid-arc-standalone-11.05-2
# . ./setup.sh
The ARC environment should now be set, and you can create a proxy:
# arcproxy --voms aec
Type your password:
To test it:
#arctest -c ce.lhep.unibe.ch -J 1
To check your job go to: [7]
To submit a JSDL script, request a proxy as above, then submit the job and check its status:
# arcsub -c ce.lhep.unibe.ch submit.jsdl
# arcstat jobname
An example of submit.jsdl is in the following section.
JSDL submission file description
The Job Submission Description Language (JSDL) is based on XML.
Structure of the submission file:
- Main structure:
<JobDefinition>
  <JobDescription>
    <JobIdentification...>
    <Application...>
    <DataStaging...>
  </JobDescription>
</JobDefinition>
- JobIdentification: here you can define the name of the job, describe it, etc. The available elements are:
<JobIdentification...>
  <JobName...>
  <Description...>
  <JobAnnotation...>
  <JobProject...>
</JobIdentification>
- For example:
<JobIdentification>
  <JobName>My MonteCarlo test job</JobName>
</JobIdentification>
- Application: describes the application to be executed.
The following example uses the POSIX (Portable Operating System Interface for Unix) style, via POSIXApplication. Only a few elements are shown:
<Application>
  <posix:POSIXApplication>
    <posix:Executable>./executable_name</posix:Executable> ---> string specifying the command to be executed
    <posix:Argument>filename.sh</posix:Argument> ---> an argument passed to the application
    <posix:Output>out1.txt</posix:Output> ---> output file
    <posix:Error>err.txt</posix:Error> ---> error file, filled if there are problems with the job
  </posix:POSIXApplication>
</Application>
- DataStaging: describes the files that are "staged in" to the execution host or "staged out" from it. It usually contains a Source and/or a Target element. The available elements are:
<DataStaging>
  <FileName ... >
  <FilesystemName ... >
  <CreationFlag ... >
  <DeleteOnTermination ...>
  <Source ...>
  <Target ...>
</DataStaging>
- DataStaging example:
<DataStaging>
  <FileName>filename.sh</FileName>
  <DeleteOnTermination>true</DeleteOnTermination> ---> boolean; when true, the file is deleted after the job terminates or after it has been staged out
  <Source> ---> location of the file or directory on the remote system
    <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/input/filename.sh</URI>
  </Source>
</DataStaging>
<DataStaging>
  <FileName>out.txt</FileName>
  <DeleteOnTermination>true</DeleteOnTermination>
  <Target> ---> location where the file is stored on the remote system
    <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/output/out.txt</URI>
  </Target>
</DataStaging>
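The fragments above can be assembled into one file. The sketch below writes such a file from the shell; the function name write_jsdl is hypothetical, all element values are copied from the fragments above, and the two namespace URIs are an addition taken from the JSDL 1.0 specification. Whether the ARC client accepts exactly this form has not been verified here, so treat it as a starting point and compare it with the tutorial submission files.

```shell
# Hypothetical helper: write a JSDL file assembled from the fragments above.
# The xmlns declarations are an assumption based on the JSDL 1.0 specification.
write_jsdl() {
    cat > "${1:-submit.jsdl}" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl"
               xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <JobDescription>
    <JobIdentification>
      <JobName>My MonteCarlo test job</JobName>
    </JobIdentification>
    <Application>
      <posix:POSIXApplication>
        <posix:Executable>./executable_name</posix:Executable>
        <posix:Argument>filename.sh</posix:Argument>
        <posix:Output>out1.txt</posix:Output>
        <posix:Error>err.txt</posix:Error>
      </posix:POSIXApplication>
    </Application>
    <DataStaging>
      <FileName>filename.sh</FileName>
      <DeleteOnTermination>true</DeleteOnTermination>
      <Source>
        <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/input/filename.sh</URI>
      </Source>
    </DataStaging>
    <DataStaging>
      <FileName>out.txt</FileName>
      <DeleteOnTermination>true</DeleteOnTermination>
      <Target>
        <URI>srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/ltpc/vgallo/output/out.txt</URI>
      </Target>
    </DataStaging>
  </JobDescription>
</JobDefinition>
EOF
}

# Usage:
# write_jsdl submit.jsdl
# arcsub -c ce.lhep.unibe.ch submit.jsdl
```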
Tutorial Examples
Example job submissions can be found at http://ce.lhep.unibe.ch/vgallo/tutorial/; each of the two directories contains the corresponding submission files. In both cases the C++ Monte Carlo test program is compiled directly on the grid.
- Step1: basic submission file; everything is sent to the cluster
- Step2: usage of the Storage Element (in/out data staging)
To use the Storage Element (SE), pass the -S option when requesting the proxy:
# arcproxy -S aec
Multi-CPUs:
# arcsub -c ce.lhep.unibe.ch -e '&(executable=./montecarlo_testjob)(stdout=parallel.log)(count=2)'
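To change the number of CPUs without retyping the whole job description, the string passed to -e can be built by a small helper. This is only a sketch: the function name xrsl_multicpu is hypothetical, and it reproduces exactly the three attributes used in the command above (executable, stdout, count) and nothing else.

```shell
# Hypothetical helper: build the job description string used above.
# Arguments: executable, stdout file, number of CPUs.
xrsl_multicpu() {
    printf '&(executable=%s)(stdout=%s)(count=%d)\n' "$1" "$2" "$3"
}

# Usage (same values as the command above, but with 4 CPUs):
# arcsub -c ce.lhep.unibe.ch -e "$(xrsl_multicpu ./montecarlo_testjob parallel.log 4)"
```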
FAQ / Possible Errors
- If you get:
# arcproxy --voms aec
Your identity: /DC=ch/DC=switch/DC=slcs/O=Universitaet Bern/CN= YourUserName
Enter pass phrase for /path/to/.globus/userkey.pem:
Proxy generation failed: Certificate has expired.
you should request a new SLCS certificate:
# slcs-init -i unibe.ch -u username --p12
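The expiry can also be checked before asking for a proxy. The sketch below assumes that the openssl command-line tool is installed, which the tutorial itself does not require; the function name cert_still_valid is hypothetical, and the default path is the usercert.pem created by slcs-init.

```shell
# Hypothetical helper: succeed only if the certificate exists and has not expired.
# Assumes the openssl command-line tool is available.
cert_still_valid() {
    cert="${1:-$HOME/.globus/usercert.pem}"
    [ -f "$cert" ] || { echo "no certificate at $cert" >&2; return 2; }
    # -checkend 0 makes openssl exit non-zero if the certificate has already expired
    openssl x509 -in "$cert" -noout -checkend 0 >/dev/null
}

# Usage:
# cert_still_valid || slcs-init -i unibe.ch -u username --p12
```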
- If you do not have the updated policy (the error usually looks like "ERROR: Failed uploading file: Can't write to destination - globus_xio_gsi: gss_init_sec_context fail..."),
go to [10], click on IGTF, then download the "classic" and the "slcs" tarballs from the accredited folder and untar them into your nordugrid-arc-standalone-11.05-2/etc/grid-security/certificates/ directory.
- Two paths affect credential verification [11]:
- X509_CERT_DIR: path to the CA certificates, by default /etc/grid-security/certificates
- X509_VOMS_DIR: path to the directory containing the VOMS list, by default /etc/grid-security/vomsdir
You may want to redefine these:
# export X509_CERT_DIR=$HOME/nordugrid-arc-standalone/share/certificates
# export X509_VOMS_DIR=$HOME/.vomsdir
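Pointing these variables at a directory that does not exist tends to produce confusing verification errors later on. A minimal sketch that exports a variable only when its directory is present follows; the function name use_x509_dir is hypothetical, and the paths in the usage lines are the ones from the exports above.

```shell
# Hypothetical helper: export an X509_* variable only if the directory exists.
# Arguments: variable name, directory.
use_x509_dir() {
    var="$1"
    dir="$2"
    if [ -d "$dir" ]; then
        export "$var=$dir"
    else
        echo "skipping $var: $dir does not exist" >&2
        return 1
    fi
}

# Usage:
# use_x509_dir X509_CERT_DIR "$HOME/nordugrid-arc-standalone/share/certificates"
# use_x509_dir X509_VOMS_DIR "$HOME/.vomsdir"
```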