CLUSTERS


NOTE: this page is OBSOLETE. For up-to-date information on the systems, please go to: http://wiki.lhep.unibe.ch/index.php/AEC-LHEP_Hardware_information

BERN ATLAS T2 2013-14 Clusters - ce01.lhep.unibe.ch, ce02.lhep.unibe.ch, bdii.lhep.unibe.ch, dpm.lhep.unibe.ch

  • Summary :
           ce01       2 2.40GHz Intel Xeon E5645 Hexa-Core with 24GB RAM total and nordugrid-arc-3.0.3-1.el6.x86_64 (EMI-3)
           mds-2-1    1 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total for Lustre MDS
           oss-2-x    1 2.80GHz AMD Opteron  290 Quad-Core with 16GB RAM total for Lustre OSS (9 nodes active 86TB)
           wn-2-x  1536 2.53GHz Intel Xeon E5540 Worker Cores (96 Nodes) with 2GB RAM per core 
           ce02       4 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total and nordugrid-arc-3.0.3-1.el6.x86_64 (EMI-3)
           mds-0-1    1 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total for Lustre MDS
           oss-2-x    2 2.3 GHz AMD Opteron 2376 Quad-Core with 16GB RAM total for Lustre OSS (11 nodes active 21TB)
           wn-0-x   120 2.3 GHz AMD Opteron 2376 Worker Cores (15 Nodes) with 2GB RAM per core. 
           wn-0-x   376 2.5 GHz Intel Xeon E5420 Worker Cores (47 Nodes) with 2GB RAM per core.
           wn-1-x   320 2.3 GHz AMD Opteron 8356 Worker Cores (20 Nodes) with 2GB RAM per core.
           dpm        4 2.40GHz Intel Xeon E5620 Quad-Core with 16GB RAM total and emi-dpm_mysql-1.8.7-3.el5.centos (EMI-3)
           dpmdisk0x  1 2.67GHz Intel Xeon X5650 Hexa-Core with 12GB RAM total and emi-dpm_disk-1.8.7-3.el6.x86_64  (EMI-3) (6 nodes 500TB)
           dpmdisk04  1 2.33GHz Intel Xeon E5345 Quad-Core with  8GB RAM total and emi-dpm_disk-1.8.7-3.el5.centos  (EMI-3) (1 node   22TB)
           bdii       4 2.40GHz Intel Xeon E5620 Quad-Core with 16GB RAM total and emi-bdii-site-1.0.1-1.el5        (EMI-3)


BERN ATLAS T3 2010-11 Cluster - ce.lhep.unibe.ch, bdii.lhep.unibe.ch, dpm.lhep.unibe.ch

  • Summary :
           152 2.3 GHz AMD Opteron 2376 Worker Cores (19 Nodes) with 2GB RAM per core. 
           304 2.3 GHz AMD Opteron 8356 Worker Cores (19 Nodes) with 2GB RAM per core.
           12TB Lustre FS (7 nodes).
           202TB DPM Storage Element.
  • Hardware :
           26 SUN Fire X2200 (Lustre and workers), 19 SUN Blade X8440 (workers), dual Xeon E5620 (8-core) FrontEnd.
           dual Xeon E5620 (8-core) DPM head node, 3 Xeon X5650 (6-core) with 9 12-port Areca 1231ML with 20TB RAID6 each (DPM disk servers).
           Dell PowerConnect 2724 (WAN), NetGear ProSafe GSM7248R and GS748T (LAN) switches.
  • Benchmark (not measured):
           Quad-Core AMD Opteron(tm) Processor 2376, 2.3GHz (SUN Fire X2200): 5.676 HEP-SPEC06/core (*)
           Quad-core AMD Opteron 8356, 2.3GHz (SUN Blade X8440): 7.497 HEP-SPEC06/core (**)
           
           (*)  http://hepix.caspur.it/afs/hepix.org/project/ptrack/spec-cpu2000.html (first row in table, average, SL4 x86_64)
           (**) https://wiki.chipp.ch/twiki/bin/view/LCGTier2/BenchMarks
  • Available resources (workers):
           19*8  AMD Opteron(tm) 2376:        862.752 HEP-SPEC06
           20*16 Quad-core AMD Opteron 8356:  2399.04 HEP-SPEC06
           ---------------------------------------------------------------------------------------------------------
           Total: 3261.8 HEP-SPEC06


Host IP Service OS CPU Comment
lheplsw1.unibe.ch 130.92.139.201 48-Port Switch Brocade FCX648 Connected to eth1 on FE (on 10GbE NIC)
ce.lhep.unibe.ch 130.92.139.200 Rocks Front End (FE) 5.3 x86_64 CentOS 5.3 x86_64 2 Quad Xeon E5620 @2.4GHz Rocks FE, SGE, ARC FE, Web Server, Ganglia
10.1.225.254 48-Port Switch
nas-0-2 10.1.255.253 10TB NFS server CentOS 5.3 1 Quad Xeon E5620 @2.4GHz ATLAS and VO software (also lhepat02.unibe.ch)
nas-0-1 10.1.255.252 1.1TB Lustre MDS/MDT CentOS 5.3 1 Quad Xeon E5620 @2.4GHz MDS/MDT (currently not in production)
compute-0-0/2 10.1.255.xxx Lustre CentOS 5.3 2 Dual AMD Opteron 2214 Managed by ce.lhep.unibe.ch
compute-0-3 10.1.255.248 Lustre MDS/MDT CentOS 5.3 2 Dual AMD Opteron 2214 Managed by ce.lhep.unibe.ch
compute-0-4/5/6/7 10.1.255.xxx Lustre CentOS 5.3 2 Quad AMD Opteron 2376 Managed by ce.lhep.unibe.ch
compute-0-8 to 26 10.1.255.xxx Worker Node 0-8 to 0-26 CentOS 5.3 2 Quad AMD Opteron 2376 Managed by ce.lhep.unibe.ch
compute-1-1 to 19 10.1.255.xxx Worker Node 1-1 to 1-19 CentOS 5.3 4 Quad AMD Opteron 8356 Managed by ce.lhep.unibe.ch
dpm.lhep.unibe.ch 130.92.139.211 DPM head node SLC 5.7 2 Quad Xeon E5620 @2.4GHz DPM Head node and disk pool
dpmdisk01(to 3).lhep.unibe.ch 130.92.139.211 to 213 DPM pool node SLC 5.7 1 Hexa Xeon X5650 @2.67GHz DPM 60TB disk pool
dpmdisk04.lhep.unibe.ch 130.92.139.214 DPM pool node SLC 5.7 1 Quad Xeon E5345 @2.33 GHz DPM 22TB disk pool
dpmdisk05(to 9).lhep.unibe.ch 130.92.139.215 to 219 DPM pool node SLC 5.x ?? Reserved for DPM disk pool
kvm01.lhep.unibe.ch 130.92.139.150 KVM host SLC 5.7 2 Dual AMD Opteron 2218 @2.6GHz Virtualization host for KVM
... ... ... ... ... -

Front End Node changes in configuration

  • Added ACIX service. After reboot: service acix-cache start.
  • Added root env http_proxy=http://proxy.unibe.ch:80 to all cron jobs.
  • Root e-mail re-direction: https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/34
  • grid*.log rotation in /etc/logrotate.d/smscg .
  • compute-0-3 now Lustre Meta Data Server with "/dev/vg00/mdt 438G 532M 412G 1% /mdt" (mounted manually).
  • compute-0-0,2,4,5,6,7 now Object Storage Servers with "/dev/vg00/ost1 1.8T 391G 1.3T 24% /mnt/ost1" (mounted manually).
  • Lustre clients now need "10.1.255.248@tcp0:/lustre /grid/lustre lustre localflock" in /etc/fstab.
  • Startup of grid* services upon boot to runlevel 5 (chkconfig --add <service>, chkconfig <service> on).
  • /dev/sda2 now for /var/spool/nordugrid (it was /var).
  • Add "sge_qmaster" and "sge_execd" under "Local services" in etc/services .
  • Cron to clear the E flag from queued jobs: /etc/cron.hourly/qmod-cj (a possible form is sketched after this list).
  • Pool accounts for all VOs and Roles (except user atlas-sw), with their own unix group (and subgroup if needed).
  • Add "PERL5LIB=/opt/rocks/lib/perl5/site_perl/5.10.1:" to "/etc/cron.d/nordugridmap.cron".
  • Upgrade to ARC 0.8.3 (preserve hacks to /opt/nordugrid/libexec/submit-sge-job).
  • yum install xerces-c xerces-c-devel .
  • yum remove systemtap systemtap-runtime (security).
  • Restart Ganglia gmond every night and gmetad on Front End every week.
  • Set virtual_free as consumable for gridengine (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/105)
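  • Illustrative sketch of the /etc/cron.hourly/qmod-cj script mentioned above (the production script is not reproduced here; the SGE settings.sh path and the field handling are assumptions):
    #!/bin/sh
    # Clear the error (E) state of queued jobs so that SGE schedules them again.
    # Assumes the Rocks SGE roll layout; adjust the settings.sh path to the local install.
    . /opt/gridengine/default/common/settings.sh
    qstat -u '*' | grep " Eqw " | awk '{print $1}' | while read jid; do
        qmod -cj "$jid"
    done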


Storage Element Head Node and site-BDII Installation and Configuration

  • Can't net-install because of http_proxy, so installed from SLC5.4 DVD and updated after that. 40GB for /root, 18GB swap.
  • Logical Volumes for mysql (50GB) and ops/dteam storage (10G).
  • Logical Volume for test LTPC area (100G).
  • 62GB unallocated.
  • ntpd, iptables, host cert+key.
  • yum install lcg-CA glite-SE_dpm_mysql glite-BDII_site (separately).
  • users.conf, groups.conf for all pool accounts and unix groups/subgroups.
  • /etc/hosts: add FQDN of pool nodes and move own FQDN to top (for site-bdii to work).
  • Additional stuff for DPMXrootAccess (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/87).
  • Adapt site-info.def (and service specific config files) from examples with "BDII_USER=ldap # Wrong default is edguser - https://savannah.cern.ch/bugs/?69028"
  • cd /opt; ./glite/yaim/bin/yaim -c -s glite/yaim/unibe-lhep/site-info.def -n glite-SE_dpm_mysql -n BDII_site
  • chkconfig fetch-crl off (since http_proxy not yet set at startup)
  • Add "env http_proxy=http://proxy.unibe.ch:80" to "/etc/cron.d/fetch-crl".
  • SELINUX=permissive in /etc/selinux/config to start slapd (no longer sure this was needed).
  • chkconfig bdii on; service bdii start .
  • mkdir ~ldap/.globus; copy cert+key in there and chown them to ldap:ldap (otherwise GlueVO* attributes missing)
  • Set "enabled=0" in "glite-SE_dpm_mysql.repo" and "glite-BDII_site.repo" (disable gLite auto-updates).
  • GIIS: ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid"
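  • For reference, the resulting /etc/cron.d/fetch-crl entry could look like the line below (illustrative sketch: schedule, user and fetch-crl path are assumptions, not the production values):
    # min hour dom mon dow  user  command
    45 */6 * * *  root  env http_proxy=http://proxy.unibe.ch:80 /usr/sbin/fetch-crl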


Storage Element Pool Nodes Installation and Configuration


Additional Configuration on Storage Element Head Node

BERN ATLAS T3 2009 Cluster ce.lhep.unibe.ch

  • Summary : 184 2.3 GHz AMD Worker Cores (23 Nodes) with 2GB RAM per core.
  • GOCDB Entry: https://goc.gridops.org/site/list?id=3605006
  • Hardware : 24 SUN Fire X2200 and one Elonex 10 TB File Server. NetGear Switch.
  • Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
Host IP Service OS CPU Comment
lheplin0.unibe.ch 130.92.139.150 24 Ports Switch Connected to eth1 on FE (lower left NIC)
ce.lhep.unibe.ch 130.92.139.200 Rocks Front End (FE) 5.2 x86_64 CentOS 5.3 x86_64 2 Quad AMD Opteron Rocks FE, SGE, ARC FE, Web Server, Ganglia
10.1.225.254 48 Ports Switch
nas-0-0 130.92.139.100 10TB NFS Server CentOS 5.3 2 Dual Xeon Cache and Sessiondirs, bonded to switch
compute-0-0 Worker Node 1 CentOS 5.3 2 Quad AMD Opteron Managed by ce.lhep.unibe.ch
... ... ... ... ... ...
compute-0-21 ... Worker Node 21 ... 2 Quad AMD Opteron Managed by ce.lhep.unibe.ch
... ... Worker Node 22 ... 2 Dual AMD Opteron Managed by ce.lhep.unibe.ch
... ... Worker Node 23 ... 2 Dual AMD Opteron Managed by ce.lhep.unibe.ch
... ... Worker Node 24 ... 2 Dual AMD Opteron Managed by ce.lhep.unibe.ch
... ... Worker Node 25 ... 2 Quad AMD Opteron Managed by ce.lhep.unibe.ch

Front End Node Installation and Configuration

  • Installation issue : Did CDROM emulation of USB in BIOS in order to find ks.cfg on external USB DVD.
  • Installation issue : Since Xen is installed, had to set 8 CPUs in /etc/xen/xend-config.sxp and reboot in order to see all cores on the front end.
  • Plug in the DVD with Rocks via USB. Answer the questions (about 15 min). Log in as root.
  • Disable the firewall and SELinux with system-config-securitylevel
  • echo export http_proxy=http://proxy.unibe.ch:80 > /etc/profile.d/proxy.sh; source /etc/profile.d/proxy.sh
  • adduser -g users gridatlaslhep; passwd gridatlaslhep
  • adduser -g users atlas-sw; passwd atlas-sw
  • add to /etc/fstab: 130.92.139.94:/terabig /external/terabig nfs defaults 0 0
  • mkdir /external/; mkdir /external/terabig; mount -a
  • cp /root/lhepconfig/iptables /etc/sysconfig/
  • /etc/init.d/httpd restart
  • ln -s /terabig/shaug/public_html/ /var/www/html/shaug. Repeat for each user (see the sketch after this list).
  • Had to restart Ganglia's gmond on all nodes: rocks run host "/etc/init.d/gmond restart"
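  • A possible loop for the web symlinks mentioned above (illustrative sketch; the user names are placeholders):
    # Create the public_html symlink for each user (adjust the account list).
    for u in shaug user1 user2; do
        ln -s /terabig/$u/public_html /var/www/html/$u
    done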

Compute Nodes Configuration and Kick Start

  • Edit /export/rocks/install/site-profiles/5.2/nodes/extend-compute.xml (a skeleton is sketched after this list), or copy it from /root/lhepconfig.
  • Copy all non-Perl RPMs in /root/lhepconfig to /export/rocks/install/contrib/5.2/x86_64/
  • cd /export/rocks/install; rocks create distro
  • Speed up the ssh login: set "ForwardX11 no" in /etc/ssh/ssh_config.
  • /etc/init.d/sshd restart
  • Now do insert-ethers and see if the nodes are found and get a *. If there are problems, restart the nodes, change the boot order in the BIOS, etc.
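  • Skeleton of extend-compute.xml for reference (illustrative sketch; the package name and the post command are placeholders, the real file is kept in /root/lhepconfig):
    <?xml version="1.0" standalone="no"?>
    <kickstart>
      <!-- extra RPMs to install on every compute node -->
      <package>xerces-c</package>
      <post>
        <!-- shell commands run once at the end of the node kickstart -->
        touch /root/.kickstart-post-done
      </post>
    </kickstart>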

ATLAS Kit Installation and Validation (better done on a compute node)

  • mkdir /share/apps/atlas; chown atlas-sw:users /share/apps/atlas/
  • su - atlas-sw
  • cd /share/apps/; mkdir runtime; cd runtime/; mkdir APPS; cd APPS/; mkdir HEP
  • cd /share/apps/atlas/; mkdir 15.3.1; cd 15.3.1
  • source installPacmanKit.sh 15.3.1 I686-SLC4-GCC34-OPT /share/apps/runtime
  • source /share/apps/runtime/APPS/HEP/ATLAS-15.3.1
  • pacman -allow tar-overwrite -get http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/pacman4/DBRelease:DBRelease-7.2.1.pacman
  • cd; source KVtest.sh 15.3.1 # All ok.

ARC Installation

  • mkdir /share/apps/cache; mkdir /share/apps/session; mkdir /var/spool/nordugrid; mkdir /var/spool/nordugrid/cachecontrol; mkdir /etc/grid-security
  • Edit /etc/yum.conf according to /root/lhepconfig/yum.conf
  • Some missing perl stuff : [root@ce lhepconfig]# rpm -ivh perl-*
  • yum groupinstall "ARC Server"
  • yum groupinstall "ARC Client"
  • cp arc.conf /etc/ (an illustrative skeleton is sketched after this list).
  • /etc/init.d/gridftp start; /etc/init.d/grid-manager start; /etc/init.d/grid-infosystem start;
  • adduser -g users gridatlaslhep; passwd gridatlaslhep
  • rocks sync users
  • add to /etc/fstab: 130.92.139.151:/external/se3 /external/se3 nfs defaults 0 0
  • mkdir /external/se3; mount -a
  • VOMS problem with nordugridmap (grid-mapfile generator). Workaround: use the generator from SMSCG.
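  • Illustrative skeleton of the arc.conf copied to /etc/ (a sketch only, for the pre-WS ARC of that era; section names, keys and values are assumptions and have to be adapted to the installed ARC version):
    # Illustrative skeleton - not the production arc.conf.
    [common]
    hostname="ce.lhep.unibe.ch"
    lrms="sge"
    [grid-manager]
    controldir="/var/spool/nordugrid"
    sessiondir="/share/apps/session"
    cachedir="/share/apps/cache"
    runtimedir="/share/apps/runtime"
    [gridftpd]
    [gridftpd/jobs]
    path="/jobs"
    plugin="jobplugin.so"
    [infosys]
    [cluster]
    cluster_alias="Bern ATLAS T3"
    [queue/all.q]
    name="all.q"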

Restore Roll Creation

  • cd /export/site-roll/rocks/src/roll/restore/; make roll
  • scp ce.lhep.unibe.ch-restore-2009.09.10-0.x86_64.disk1.iso shaug@lheppc51.unibe.ch:/terabig/shaug/tmp/

Old PC50 becomes NAS with cache and sessiondir

  • Just installed it as NAS appliance with insert-ethers (makes a nfs server)
  • Bonded:
    [root@nas-0-0 ~]# cat /etc/modprobe.conf 
    alias scsi_hostadapter 3w-9xxx 
    alias scsi_hostadapter1 ata_piix
    alias eth0 tg3
    alias eth1 tg3
    alias bond0 bonding
     install bond0 /sbin/modprobe bonding -o bonding0 mode=1 miimon=100
    [root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0 
    TYPE=Ethernet
    DEVICE=bond0
    BOOTPROTO='static'
    IPADDR=10.1.255.231
    NETMASK=255.255.0.0
    GATEWAY=10.1.255.1
    NETWORK=10.1.255.0  
    BROADCAST=10.1.255.255
    ONBOOT=yes
    USERCTL=no
    NAME='Bonding device 0'
    STARTMODE='auto'
    IPV6INIT=no
    PEERDNS=yes
    MTU=1500
    [root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0  
    DEVICE=eth0                                               
    SLAVE=yes
    MASTER=bond0
    HWADDR=00:30:48:74:f6:d0                      
    NETMASK=255.255.0.0                                       
    BOOTPROTO=none                                 
    ONBOOT=yes                                       
    MTU=1500                                                  
    [root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1                                               
    SLAVE=yes
    MASTER=bond0 
    HWADDR=00:30:48:74:f6:d1                                  
    BOOTPROTO=none                                            
    ONBOOT=yes                                              
    MTU=1500                                                  
    [root@nas-0-0 ~]# 
  • Bonding not tested. Now mounted on all nodes:
    [root@nas-0-0 ~]# /etc/init.d/iptables stop
    rocks run host compute "mkdir /grid"
    rocks run host compute "mkdir /grid/sessiondir"
    rocks run host compute "mkdir /grid/cache"
    rocks run host compute "echo '# NFS session and cache dirs for grid jobs' >> /etc/fstab"
    rocks run host compute "echo 'nas-0-0:/export/cache    /grid/cache      nfs   rw,nolock  0  0' >> /etc/fstab"
    rocks run host compute "echo 'nas-0-0:/export/sessiondir    /grid/sessiondir nfs   rw,nolock  0  0' >> /etc/fstab"
    rocks run host compute-0-0 "cat /etc/fstab"
    rocks run host compute "mount -a"
  • Now /etc/gmond.conf needs this:
    udp_recv_channel {
       /*mcast_join = 224.0.0.3*/
       port = 8649
    }
    udp_send_channel {
       /*mcast_join = 224.0.0.3*/
       host = 10.1.1.1
       port = 8649
    }                                                                                                                                   

To Be Done

  • Make Lustre file system with cache and session dir. AAA/SWITCH project in 2010.

Sun Grid Engine (SGE) Configuration and Commands

  • qstat -u \*, qstat -f
  • qmod -d|e *@compute-0-2
  • qhost
  • Jobs in grid state O (often LRMS state Eqw)? Try something like this:
  qstat -u \* | grep Eqw | cut -d" " -f3,7 | while read line; do qmod -cj $line; done 

How to use the management NIC on the SUN pizzas

  • Check the IP of the pizza in its BIOS.
  • Configure your laptop to have IP in same subnet.
  • ssh into the pizza and do for example : SP -> show -l all

BERN ATLAS T3 2008 Cluster Hardware

Host IP Service OS CPU Comment
lheplin0 130.92.139.150 Switch
lheppc50 130.92.139.100 File Server SLC 4.6 Dual Dual Xeon ARC FE, gridftp, Torque, LDAP, 8.3 TB Data Raid
lheppc44 130.92.139.194 File Server SLC 4.7 Dual Quad Xeon 3.6 TB Data Raid 6 (homes)
lheppc51 130.92.139.151 File Server SLC 4.7 Quad Xeon 21 TB Data Raid, xrootd master (foreseen), cfengine, dq2 and tivoli clients
lheppc25 130.92.139.75 Web Server SuSe 9.3 Single AMD web and ganglia server, should be moved.
svn.lhep.unibe.ch Code/Doc. Repos. SuSe VM hosted by ID.
-
lhepat05 130.92.139.205 WN SLC44 Dual IA64 Worker Node
lhepat06 130.92.139.206 WN SLC44 Dual AMD64 Worker Node (needs new disk)
lhepat07 130.92.139.207 WN SLC45 Dual Dual AMD64 Worker Node (broken)
lhepat08 130.92.139.208 WN SLC44 Dual Dual AMD64 Worker Node
lhepat09 130.92.139.209 WN SLC44 Dual Dual AMD64 Worker Node (broken)
lhepat10 130.92.139.210 WN SLC44 Dual Dual AMD64 Worker Node
lhepat11 130.92.139.211 WN SLC44 Dual Dual AMD64 Worker Node
lhepat12 130.92.139.212 WN SLC44 Dual Dual AMD64 Worker Node (broken)


See ATLAS Software for instructions on how to get the ATLAS software to work

PROOF - Running ROOT on many cores (OUTDATED: we may bring it back when needed)

You have to provide your TSelector class with the corresponding .C and .h files (see the ROOT manual). From your ROOT session do:

   TProof *p = TProof::Open("lheppc51.unibe.ch");
   TDSet *set = new TDSet("TTree", "NameOfYourTree");
   set->Add("/ngse1/yourfile1.root");
   set->Add("/ngse1/yourfile2.root");
   set->Process("yourfile.C","",1000) // Loop over 1000 events.

Setting up PROOF

There is no additional software; everything comes with the ROOT installation. There are two configuration files which we have placed in $ROOTSYS/etc: proof.conf and xrootCluster.cf. The first contains the list of the master node and the worker nodes (an illustrative proof.conf is sketched below). On the master and all workers xrootd has to be started as root, but with the -R option specifying a non-superuser account. From the config file xrootd knows whether it is a master or a worker.

xrootd -c $ROOTSYS/etc/xrootCluster.cf -b -l /tmp/xpd.log -R atlsoft
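
An illustrative proof.conf (the node names are placeholders taken from the worker list above; the production file lives in $ROOTSYS/etc):

   # $ROOTSYS/etc/proof.conf -- static list of the PROOF master and workers
   master lheppc51.unibe.ch
   worker lhepat08.unibe.ch
   worker lhepat10.unibe.ch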


Compile ROOT on the cluster

This section sums up the steps to compile ROOT on the cluster. Every now and then a patched version is needed that is not yet in CVMFS, and it can be tricky to get all the configuration and links right.

   - download the package from svn to pc7.
   - log onto ce.lhep.unibe.ch and from there onto one of the nodes
   ssh root@ce.lhep.unibe.ch
   ssh compute-1-10
   - scp the tarball to somewhere under /grid/root* (/grid is visible from all the nodes)
   - set up the environment, e.g. with
   export BUILD=x86_64-slc5-gcc43-opt
   export PATH="/afs/cern.ch/sw/lcg/external/Python/2.6.5/$BUILD/bin:${PATH}"
   export LD_LIBRARY_PATH="/afs/cern.ch/sw/lcg/external/Python/2.6.5/$BUILD/lib:${LD_LIBRARY_PATH}"
   source /cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/setup.sh
   - configure the root compilation with
   ./configure linuxx8664gcc  --disable-castor  --disable-rfio --disable-tmva --enable-xml --enable-roofit --enable-python --enable-minuit2  --with-python-incdir=/cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/include/python2.6/ --with-python-libdir=/cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/lib/
   - the Python / PyROOT part can be tricky; make sure you have the right versions and all libraries. The configure log should say something like:
   Checking for python2.6, libpython2.6, libpython, python, or Python ... /cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/lib/
   - everything should be set up for compilation:
   make -j 4
   - it usually gets stuck at some point or runs out of memory; kill it and restart it. If it still does not move, run plain make.
   - Hopefully it succeeded. Test the installation with 
   source bin/thisroot.sh
   root 
   - and the python part with
   python
   from ROOT import *
   - Hopefully everything works fine. If it does not, do make clean, delete all libraries and check the environment.

Submit jobs to the cluster

Refer to the description on the Geneva TWiki: https://twiki.cern.ch/twiki/bin/viewauth/GeneveAtlas/BernGenevaARC (written by a Bern guy).