Job submission and management with NorduGrid ARC

Prerequisites

Job submission to our cluster occurs via the NorduGrid ARC middleware. You need:

  • Grid user certificate

This is best obtained from CERN: CERN CA. Being registered with CERN HR and having a CERN computer account are the prerequisites. Users with FNAL accounts can obtain a certificate from the relevant FERMILAB CA. Those not eligible for either solution above can request a certificate via the Swiss CA.

In order to carry out the following steps, your user certificate and private key must be installed in the .globus subdirectory of your home directory on the client machine. You can find instructions on how to do that here. While the guide focuses on ATLAS, the steps to extract and install the certificate are independent of the specific experiment.
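
If you exported the certificate from your browser as a PKCS12 bundle, a minimal sketch of the extraction with openssl (mycert.p12 is a hypothetical file name; the private key must end up readable only by you):

# mkdir -p ~/.globus
# openssl pkcs12 -in mycert.p12 -clcerts -nokeys -out ~/.globus/usercert.pem
# openssl pkcs12 -in mycert.p12 -nocerts -out ~/.globus/userkey.pem
# chmod 644 ~/.globus/usercert.pem
# chmod 400 ~/.globus/userkey.pem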

Have the certificate registered with a Virtual Organisation (VO); e.g. for ATLAS, browse to the ATLAS VOMS service. The prerequisite for this step is having your personal certificate installed in your browser.

  • ARC client

Users operating from the UIs (recommended) can set up the client from CVMFS:

# export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
# source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
# lsetup emi

Otherwise, you can install a local client on any Linux machine by following the instructions for your specific operating system: Install ARC client
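
For example, on a RHEL-compatible machine with the EPEL or NorduGrid repositories enabled, the installation reduces to a single package (package name as in those repositories; verify against the linked instructions):

# dnf install nordugrid-arc-client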

arcinfo

First, verify that the user interface is installed and that you can see the submission endpoints:

# arcinfo --version
# arcinfo -c ce00.lhep.unibe.ch
# arcinfo -c ce03.lhep.unibe.ch

arcproxy / voms-proxy-init/info

Generate a proxy with your credentials using one of these two commands:

# arcproxy --voms <yourVO>  (e.g. atlas)
# voms-proxy-init --voms <yourVO>

View the generated proxy information with one of these two commands:

# arcproxy -I
# voms-proxy-info
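
The default proxy lifetime is fairly short (typically 12 hours), so for long-running work you may want to request a longer validity. A sketch of the syntax for the two tools (treat the exact option names as assumptions to check against --help):

# arcproxy --voms <yourVO> -c validityPeriod=24h -c vomsACvalidityPeriod=24h
# voms-proxy-init --voms <yourVO> --valid 24:00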

Job submission

Check that you can submit a trivial job described in test.xrsl:

# vi test.xrsl
&
(executable = /usr/bin/env)
(jobname = "test")
(stdout=test.log)
(* join stdout and stderr *)
(join=yes)
(gmlog=log)
(* by default in minutes *)
(wallTime="100")
(* by default in MB *)
(memory=1000)
(queue="CLUSTER")
# arcsub -d info -T arcrest --computing-element ce00.lhep.unibe.ch:8443 -o joblist.xml test.xrsl
...
Job submitted with jobid: <job ID>

# arcsub -d info -T arcrest --computing-element ce03.lhep.unibe.ch:443 -o joblist.xml test.xrsl

In the job description file, one can also add a list of input files to upload to the cluster at job submission and a list of output files to retrieve once execution is FINISHED.
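
A minimal sketch of such a description, following the syntax of the examples on this page (myscript.sh, input.dat and result.txt are hypothetical names for a script that reads input.dat and writes result.txt; in each pair the first element is the file name on the cluster and the second is the local source, or an empty string for an output file to be fetched later with arcget):

&
(executable = myscript.sh)
(inputfiles=(myscript.sh myscript.sh)(input.dat input.dat))
(outputfiles=("result.txt" ""))
(stdout=test.log)(join=yes)
(queue="CLUSTER")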

Additional resources on job submission and job description:

arc6 submit job

xRSL job specification reference manual

arcls

Lists the job directory on the cluster:

# arcls <job id>

arcstat

Prints job status and some additional information from cluster, such as jobid, name, and status:

# arcstat -c ce00.lhep.unibe.ch
# arcstat <job id>
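
To query all jobs known to the client at once, or only those in a given state (option names as in the ARC6 client; the exact state string may vary, so check arcstat --help):

# arcstat -a
# arcstat -a -s FINISHED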

arccat

Prints the job stdout/stderr while the job is running on the cluster:

# arccat <job id>
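
Recent ARC clients can also show other channels; treat these flags as assumptions to verify with arccat --help (stderr is empty in the example above since test.xrsl sets join=yes):

# arccat -e <job id>   (stderr)
# arccat -l <job id>   (A-REX job log)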

arckill

Kill one job, a list of jobs given in a file, or all jobs:

# arckill <job id>
# arckill filename
# arckill -a
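
Killed (or failed) jobs can still occupy a session directory on the cluster; it can be removed with arcclean, which takes the same job selection options:

# arcclean <job id>
# arcclean -a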

arcget

When the status of the task is FINISHED, you can download the results of one specific task or multiple tasks. These end up in directories created by the download client and named after the job IDs. In the example above, only the stdout/stderr file test.log will be downloaded as there are no further output files specified:

# arcget <job id>
Results stored at: 4fQLDmY3BxjnmmR0Xox1SiGmABFKDmABFKDmvxHKDmABFKDmiPhU9m
Jobs processed: 1, successfully retrieved: 1, successfully cleaned: 1

or results of all completed tasks:

# arcget -a

or all tasks in the list in the joblist.xml file:

# arcget -i joblist.xml
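
By default arcget creates the job directories under the current working directory and removes a job from the cluster after a successful download. Two common variations, with option names as in the ARC6 client (hedged):

# arcget -D /path/to/downloads <job id>   (download into the given directory)
# arcget -k <job id>   (keep the job on the cluster after download)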

Advanced xRSL using DTR and Singularity

In this example we will use the DTR (Data TRansfer) feature of ARC to specify input files to copy to the cluster and to retrieve the outputs. In addition, the job is set up to run in a container, which is the default way for ATLAS jobs.

# cat arc_test-lhep-8core.xrls
&(executable=5333700233-8core.sh)
(inputfiles=(container_script-8core.sh container_script-8core.sh)
 (my_release_testsetup.sh my_release_testsetup.sh)
 (HITS.25720417._000948.pool.root.1 HITS.25720417._000948.pool.root.1)
 (RDO.26819764._019417.pool.root.1 RDO.26819764._019417.pool.root.1)
 (RDO.26819764._016665.pool.root.1 RDO.26819764._016665.pool.root.1)
 (RDO.26811885._009174.pool.root.1 RDO.26811885._009174.pool.root.1)
 (RDO.26819766._027499.pool.root.1 RDO.26819766._027499.pool.root.1)
 (RDO.26811885._004931.pool.root.1 RDO.26811885._004931.pool.root.1))
(arguments="")
(gmlog="gridlog")
(stdout=mc20_lhep_8core.log)(join=yes)
(memory="1800")
(wallTime="2870")
(count = "8")
(countpernode = "8")
(queue= "CLUSTER" )
(jobname=mc20_lhep_8core)
(outputfiles=("/" ""))

This specific job is a clone of Panda JobID 5333700233, set up to run on 8 cores within Singularity.

NOTE: in this example the input files are uploaded from the local user directory, meaning you have to download them with rucio first. You can find the URI of the files to download with:

# lsetup rucio
# rucio list-file-replicas mc16_13TeV:HITS.25720417._000948.pool.root.1
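
The files can then be fetched into the submission directory, e.g. with rucio download (rucio places the downloaded files in a subdirectory named after the scope, so they may need to be moved next to the xRSL file):

# rucio download mc16_13TeV:HITS.25720417._000948.pool.root.1
# mv mc16_13TeV/HITS.25720417._000948.pool.root.1 .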

Alternatively, one can have the file downloaded to the cluster directly from the source on the remote SE. In that case the input file definition in the xRSL is:

(... HITS.25720417._000948.pool.root.1 rucio://rucio-lb-prod.cern.ch/replicas/HITS.25720417._000948.pool.root.1 ...)

The executable script:

# cat 5333700233-8core.sh 
#!/bin/bash
echo "*** Time is: ***"
date
chmod +x container_script-8core.sh

if [ -z "$ATLAS_LOCAL_ROOT_BASE" ]; then export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase; fi;
export thePlatform="x86_64-centos7-gcc8-opt"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh -c $thePlatform -s /srv/my_release_testsetup.sh -r /srv/container_script-8core.sh -e "-i -p -B /grid/lustre"

echo "*** Time is: ***"
date

The container script, which is launched by the executable script with Singularity:

# cat container_script-8core.sh
export TMPDIR=/srv
export GFORTRAN_TMPDIR=/srv
export ATHENA_PROC_NUMBER=8
export ATHENA_CORE_NUMBER=8
/usr/bin/time -f "%P %M"  Reco_tf.py --inputHITSFile="HITS.25720417._000948.pool.root.1" --asetup="RDOtoRDOTrigger:Athena,21.0.20.12" --maxEvents="10000" --multithreaded="True" --postInclude "default:PyJobTransforms/UseFrontier.py" "all:PyJobTransforms/DisableFileSizeLimit.py" --preInclude "all:Campaigns/MC20a.py" --skipEvents="0" --autoConfiguration="everything" --conditionsTag "default:OFLCOND-MC16-SDR-RUN2-09" "RDOtoRDOTrigger:OFLCOND-MC16-SDR-RUN2-08-02a" --geometryVersion="default:ATLAS-R2-2016-01-00-01" --runNumber="364681" --digiSeedOffset1="313" --digiSeedOffset2="313" --inputRDO_BKGFile="RDO.26811885._004931.pool.root.1,RDO.26811885._009174.pool.root.1,RDO.26819764._016665.pool.root.1,RDO.26819764._019417.pool.root.1,RDO.26819766._027499.pool.root.1" --AMITag="r13167" --steering "doOverlay" "doRDO_TRIG" --outputAODFile="AOD.27857976._000879.pool.root.1" --jobNumber="313" --triggerConfig="RDOtoRDOTrigger=MCRECO:DBF:TRIGGERDBMC:2283,35,327"
echo "*** Time is: ***"
date

The script to set up the ATLAS release for the job, which is also called by the executable script:

# cat my_release_testsetup.sh 
source $AtlasSetup/scripts/asetup.sh Athena,22.0.41.8,notest --platform x86_64-centos7-gcc8-opt --makeflags='$MAKEFLAGS'

NOTE: the last line of the xRSL, (outputfiles=("/" "")), instructs ARC to keep the entire contents of the job directory for download once execution has FINISHED on the cluster. One can alternatively specify a list of files of interest, using the same "local file" syntax as for the input files in the example above.
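
A sketch of such a selective output list, reusing the file names from the example above (only the log and the produced AOD would be kept for retrieval):

(outputfiles=("mc20_lhep_8core.log" "")("AOD.27857976._000879.pool.root.1" ""))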