CLUSTERS
NOTE: this page is OBSOLETE and NO LONGER MAINTAINED. For up-to-date information on the systems, please go to:
http://wiki.lhep.unibe.ch/index.php/AEC-LHEP_Hardware_information
BERN ATLAS T2 2013-14 Clusters - ce01.lhep.unibe.ch, ce02.lhep.unibe.ch, bdii.lhep.unibe.ch, dpm.lhep.unibe.ch
- Summary :
ce01      2 2.40GHz Intel Xeon E5645 Hexa-Core with 24GB RAM total and nordugrid-arc-3.0.3-1.el6.x86_64 (EMI-3)
mds-2-1   1 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total for Lustre MDS
oss-2-x   1 2.80GHz AMD Opteron 290 Quad-Core with 16GB RAM total for Lustre OSS (9 nodes active, 86TB)
wn-2-x    1536 2.53GHz Intel Xeon E5540 Worker Cores (96 Nodes) with 2GB RAM per core
ce02      4 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total and nordugrid-arc-3.0.3-1.el6.x86_64 (EMI-3)
mds-0-1   1 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total for Lustre MDS
oss-2-x   2 2.3GHz AMD Opteron 2376 Quad-Core with 16GB RAM total for Lustre OSS (11 nodes active, 21TB)
wn-0-x    120 2.3GHz AMD Opteron 2376 Worker Cores (15 Nodes) with 2GB RAM per core
wn-0-x    376 2.5GHz Intel Xeon E5420 Worker Cores (47 Nodes) with 2GB RAM per core
wn-1-x    320 2.3GHz AMD Opteron 8356 Worker Cores (20 Nodes) with 2GB RAM per core
dpm       4 2.40GHz Intel Xeon E5620 Quad-Core with 16GB RAM total and emi-dpm_mysql-1.8.7-3.el5.centos (EMI-3)
dpmdisk0x 1 2.67GHz Intel Xeon X5650 Hexa-Core with 12GB RAM total and emi-dpm_disk-1.8.7-3.el6.x86_64 (EMI-3) (6 nodes, 500TB)
dpmdisk04 1 2.33GHz Intel Xeon E5345 Quad-Core with 8GB RAM total and emi-dpm_disk-1.8.7-3.el5.centos (EMI-3) (1 node, 22TB)
bdii 4 2.40GHz Intel Xeon E5620 Quad-Core with 16GB RAM total and emi-bdii-site-1.0.1-1.el5 (EMI-3)
To be updated from here on (the information below has not yet been updated):
- Hardware :
26 SUN Fire X2200 (Lustre and workers), 19 SUN Blade X8440 (workers), dual Xeon E5620 (8-core) Front End, dual Xeon E5620 (8-core) DPM head node, 3 Xeon X5650 (6-core) with 9 12-port Areca 1231ML with 20TB RAID6 each (DPM disk servers). Dell PowerConnect 2724 (WAN), NetGear ProSafe GSM7248R and GS748T (LAN) switches.
- Benchmark (not measured):
Quad-Core AMD Opteron(tm) Processor 2376, 2.3GHz (SUN Fire X2200): 5.676 HEP-SPEC06/core (*)
Quad-Core AMD Opteron 8356, 2.3GHz (Sun Blade X8440): 7.497 HEP-SPEC06/core (**)
(*) http://hepix.caspur.it/afs/hepix.org/project/ptrack/spec-cpu2000.html (first row in table, average, SL4 x86_64)
(**) https://wiki.chipp.ch/twiki/bin/view/LCGTier2/BenchMarks
- Available resources (workers):
19*8 = 152 AMD Opteron(tm) 2376 cores:        152 x 5.676 = 862.752 HEP-SPEC06
20*16 = 320 Quad-Core AMD Opteron 8356 cores: 320 x 7.497 = 2399.04 HEP-SPEC06
Total:                                                      3261.8 HEP-SPEC06
- Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
- GOCDB Entry: https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=476&grid_id=0
- ROC: NGI_CH - Certification status: Certified.
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplsw1.unibe.ch | 130.92.139.201 | 48-Port Switch | Brocade FCX648 | Connected to eth1 on FE (on 10GbE NIC) | ||
ce.lhep.unibe.ch | 130.92.139.200 | Rocks Front End (FE) 5.3 x86_64 | CentOS 5.3 x86_64 | 2 Quad Xeon E5620 @2.4GHz | Rocks FE, SGE, ARC FE, Web Server, Ganglia | |
10.1.225.254 | 48 Ports Switch | |||||
nas-0-2 | 10.1.255.253 | 10TB NFS server | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | ATLAS and VO software (also lhepat02.unibe.ch) | |
nas-0-1 | 10.1.255.252 | 1.1TB Lustre MDS/MDT | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | MDS/MDT (currently not in production) | |
compute-0-0/2 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-3 | 10.1.255.248 | Lustre MDS/MDT | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-4/5/6/7 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-0-8 to 26 | 10.1.255.xxx | Worker Node 0-8 to 0-26 | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-1-1 to 19 | 10.1.255.xxx | Worker Node 1-1 to 1-19 | CentOS 5.3 | 4 Quad AMD Opteron 8356 | Managed by ce.lhep.unibe.ch | |
dpm.lhep.unibe.ch | 130.92.139.211 | DPM head node | SLC 5.7 | 2 Quad Xeon E5620 @2.4GHz | DPM Head node and disk pool | |
dpmdisk01(to 3).lhep.unibe.ch | 130.92.139.211 to 213 | DPM pool node | SLC 5.7 | 1 Hexa Xeon X5650 @2.67GHz | DPM 60TB disk pool | |
dpmdisk04.lhep.unibe.ch | 130.92.139.214 | DPM pool node | SLC 5.7 | 1 Quad Xeon E5345 @2.33 GHz | DPM 22TB disk pool | |
dpmdisk05(to 9).lhep.unibe.ch | 130.92.139.215 to 219 | DPM pool node | SLC 5.x | ?? | Reserved for DPM disk pool | |
kvm01.lhep.unibe.ch | 130.92.139.150 | KVM host | SLC 5.7 | 2 Dual AMD Opteron 2218 @2.6GHz | Virtualization host for KVM | |
... | ... | ... | ... | ... | - |
BERN ATLAS T3 2010-11 Cluster - ce.lhep.unibe.ch, bdii.lhep.unibe.ch, dpm.lhep.unibe.ch
- Summary :
152 2.3 GHz AMD Opteron 2376 Worker Cores (19 Nodes) with 2GB RAM per core.
304 2.3 GHz AMD Opteron 8356 Worker Cores (19 Nodes) with 2GB RAM per core.
12TB Lustre FS (7 nodes).
202TB DPM Storage Element.
- Hardware :
26 SUN Fire X2200 (Lustre and workers), 19 SUN Blade X8440 (workers), dual Xeon E5620 (8-core) Front End, dual Xeon E5620 (8-core) DPM head node, 3 Xeon X5650 (6-core) with 9 12-port Areca 1231ML with 20TB RAID6 each (DPM disk servers). Dell PowerConnect 2724 (WAN), NetGear ProSafe GSM7248R and GS748T (LAN) switches.
- Benchmark (not measured):
Quad-Core AMD Opteron(tm) Processor 2376, 2.3GHz (SUN Fire X2200): 5.676 HEP-SPEC06/core (*)
Quad-Core AMD Opteron 8356, 2.3GHz (Sun Blade X8440): 7.497 HEP-SPEC06/core (**)
(*) http://hepix.caspur.it/afs/hepix.org/project/ptrack/spec-cpu2000.html (first row in table, average, SL4 x86_64)
(**) https://wiki.chipp.ch/twiki/bin/view/LCGTier2/BenchMarks
- Available resources (workers):
19*8 = 152 AMD Opteron(tm) 2376 cores:        152 x 5.676 = 862.752 HEP-SPEC06
20*16 = 320 Quad-Core AMD Opteron 8356 cores: 320 x 7.497 = 2399.04 HEP-SPEC06
Total:                                                      3261.8 HEP-SPEC06
- Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
- GOCDB Entry: https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=476&grid_id=0
- ROC: NGI_CH - Certification status: Certified.
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplsw1.unibe.ch | 130.92.139.201 | 48-Port Switch | Brocade FCX648 | Connected to eth1 on FE (on 10GbE NIC) | ||
ce.lhep.unibe.ch | 130.92.139.200 | Rocks Front End (FE) 5.3 x86_64 | CentOS 5.3 x86_64 | 2 Quad Xeon E5620 @2.4GHz | Rocks FE, SGE, ARC FE, Web Server, Ganglia | |
10.1.225.254 | 48 Ports Switch | |||||
nas-0-2 | 10.1.255.253 | 10TB NFS server | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | ATLAS and VO software (also lhepat02.unibe.ch) | |
nas-0-1 | 10.1.255.252 | 1.1TB Lustre MDS/MDT | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | MDS/MDT (currently not in production) | |
compute-0-0/2 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-3 | 10.1.255.248 | Lustre MDS/MDT | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-4/5/6/7 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-0-8 to 26 | 10.1.255.xxx | Worker Node 0-8 to 0-26 | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-1-1 to 19 | 10.1.255.xxx | Worker Node 1-1 to 1-19 | CentOS 5.3 | 4 Quad AMD Opteron 8356 | Managed by ce.lhep.unibe.ch | |
dpm.lhep.unibe.ch | 130.92.139.211 | DPM head node | SLC 5.7 | 2 Quad Xeon E5620 @2.4GHz | DPM Head node and disk pool | |
dpmdisk01(to 3).lhep.unibe.ch | 130.92.139.211 to 213 | DPM pool node | SLC 5.7 | 1 Hexa Xeon X5650 @2.67GHz | DPM 60TB disk pool | |
dpmdisk04.lhep.unibe.ch | 130.92.139.214 | DPM pool node | SLC 5.7 | 1 Quad Xeon E5345 @2.33 GHz | DPM 22TB disk pool | |
dpmdisk05(to 9).lhep.unibe.ch | 130.92.139.215 to 219 | DPM pool node | SLC 5.x | ?? | Reserved for DPM disk pool | |
kvm01.lhep.unibe.ch | 130.92.139.150 | KVM host | SLC 5.7 | 2 Dual AMD Opteron 2218 @2.6GHz | Virtualization host for KVM | |
... | ... | ... | ... | ... | - |
Front End Node Changes in Configuration
- Added Acix service. After reboot: service acix-cache start.
- Added root env http_proxy=http://proxy.unibe.ch:80 to all cron jobs.
- Root e-mail re-direction: https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/34
- grid*.log rotation in /etc/logrotate.d/smscg .
- compute-0-3 now Lustre Meta Data Server with "/dev/vg00/mdt 438G 532M 412G 1% /mdt" (mounted manually).
- compute-0-0,2,4,5,6,7 now Object Storage Servers with "/dev/vg00/ost1 1.8T 391G 1.3T 24% /mnt/ost1" (mounted manually).
- Lustre clients now need "10.1.255.248@tcp0:/lustre /grid/lustre lustre localflock" in /etc/fstab.
- Startup of grid* services upon boot to runlevel 5 (chkconfig --add <service>; chkconfig <service> on).
- /dev/sda2 now for /var/spool/nordugrid (it was /var).
- Add "sge_qmaster" and "sge_execd" under "Local services" in etc/services .
- Cron job to clear the E flag from queued jobs: /etc/cron.hourly/qmod-cj.
- Pool accounts for all VOs and Roles (except user atlas-sw), with their own unix group (and subgroup if needed).
- Add "PERL5LIB=/opt/rocks/lib/perl5/site_perl/5.10.1:" to "/etc/cron.d/nordugridmap.cron".
- Upgrade to ARC 0.8.3 (preserve hacks to /opt/nordugrid/libexec/submit-sge-job).
- yum install xerces-c xerces-c-devel .
- yum remove systemtap systemtap-runtime (security).
- Restart Ganglia gmond every night and gmetad on Front End every week.
- Set virtual_free as consumable for gridengine (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/105)
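A minimal sketch of how virtual_free can be made consumable in SGE; the memory value and host name below are placeholders, not the production values, and the SVN ticket above has the site-specific details:

# Open the complex configuration in an editor and set the CONSUMABLE (and
# REQUESTABLE) column of the virtual_free line to YES:
qconf -mc
# Publish how much memory each execution host may hand out (placeholder value):
qconf -mattr exechost complex_values virtual_free=48G compute-0-8
# Jobs then request memory explicitly, e.g.:
qsub -l virtual_free=2G job.sh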
Storage Element Head Node and site-BDII Installation and Configuration
- Can't net-install because of http_proxy, so installed from SLC5.4 DVD and updated after that. 40GB for /root, 18GB swap.
- Logical Volumes for mysql (50GB) and ops/dteam storage (10G).
- Logical Volume for test LTPC area (100G).
- 62GB unallocated.
- ntpd, iptables, host cert+key.
- yum install lcg-CA glite-SE_dpm_mysql glite-BDII_site (separately).
- users.conf, groups.conf for all pool accounts and unix groups/subgroups.
- /etc/hosts: add FQDN of pool nodes and move own FQDN to top (for site-bdii to work).
- Additional stuff for DPMXrootAccess (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/87).
- Adapt site-info.def (and service specific config files) from examples with "BDII_USER=ldap # Wrong default is edguser - https://savannah.cern.ch/bugs/?69028"
- cd /opt; ./glite/yaim/bin/yaim -c -s glite/yaim/unibe-lhep/site-info.def -n glite-SE_dpm_mysql -n BDII_site
- chkconfig fetch-crl off (since http_proxy not yet set at startup)
- Add "env http_proxy=http://proxy.unibe.ch:80" to "/etc/cron.d/fetch-crl".
- SELINUX=permissive in /etc/selinux/config to start slapd (no longer sure this was needed).
- chkconfig bdii on; service bdii start .
- mkdir ~ldap/.globus; copy cert+key in there and chown them to ldap:ldap (otherwise GlueVO* attributes missing)
- Set "enabled=0" in "glite-SE_dpm_mysql.repo" and "glite-BDII_site.repo" (disable gLite auto-updates).
- GIIS: ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid"
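To check that the VO information (the GlueVO* attributes mentioned above) is actually published, the GIIS query can be narrowed down. The filters below are only illustrations of the kind of query that can be used:

# Show only the VO info objects published by the site-BDII:
ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid" "(objectClass=GlueVOInfo)"
# Check that the SE itself is published:
ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid" "(GlueSEUniqueID=dpm.lhep.unibe.ch)"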
Storage Element Pool Nodes Installation and Configuration
- Mirror SLC5.6 distribution at http://ce.lhep.unibe.ch/cern via rsync (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/86).
- Burn to CD boot.iso at http://ce.lhep.unibe.ch/cern/slc5X/x86_64/images
- Kickstart files in http://ce.lhep.unibe.ch/kickstart/ (kickstart MUST be world readable, protect later if pwd hash is in it)
- Boot via boot.iso and "linux ks=http://ce.lhep.unibe.ch/kickstart/dpmdisk01-ks.cfg"
- Configure network manually
- Manual partition (first time, then add to kickstart) with 10GB swap, rest for /root (~20GB)
- /root on /dev/sdd at install, but /dev/sda at re-boot, so much fiddling with GRUB in order to boot (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/88).
- Once it is right, grab the info from anaconda-ks.cfg and paste it into the kickstart (it works!). DON'T TRY THIS on a re-install if there is valid data on the RAIDs.
- mkfs -t xfs -f -L storage1 /dev/sdb; mkfs -t xfs -f -L storage2 /dev/sdc; mkfs -t xfs -f -L storage3 /dev/sdd
- Add to /etc/fstab as /mnt/storageX and mount -a (see the sketch after this list).
- chmod -R 0770 /mnt/storage1; chmod -R 0770 /mnt/storage2; chmod -R 0770 /mnt/storage3
- ntpd, iptables, host cert+key.
- yum install lcg-CA glite-SE_dpm_disk
- users.conf, groups.conf for all pool accounts and unix groups/subgroups.
- Additional stuff for DPMXrootAccess (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/87).
- Adapt site-info.def from head node.
- cd /opt; ./glite/yaim/bin/yaim -c -s glite/yaim/unibe-lhep/site-info.def -n glite-SE_dpm_disk
- chkconfig fetch-crl off (since http_proxy not yet set at startup).
- Add "env http_proxy=http://proxy.unibe.ch:80" to "/etc/cron.d/fetch-crl".
- Set "enabled=0" in "glite-SE_dpm_disk.repo" (disable gLite auto-updates).
Additional Configuration on Storage Element Head Node
- Restrict each pool to the appropriate VO groups (cull the GIDs from the MySQL DB).
- Reserve 50TB for ATLASLOCALGROUPDISK (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/100) - Can change size at any time.
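A hedged sketch of the kind of DPM admin commands involved; the pool, group and size values are examples only, and the authoritative procedure is in the SVN tickets above:

# Restrict an existing pool to a set of VO groups (example names):
dpm-modifypool --poolname atlaspool --group atlas,atlas/ch
# Reserve space for the ATLASLOCALGROUPDISK space token (the size can be changed later):
dpm-reservespace --gspace 50T --lifetime Inf --group atlas/ch --token_desc ATLASLOCALGROUPDISK
# Check the resulting pool/space configuration:
dpm-qryconf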
BERN ATLAS T3 2009 Cluster ce.lhep.unibe.ch
- Summary : 184 2.3 GHz AMD Worker Cores (23 Nodes) with 2GB RAM per core.
- GOCDB Entry: https://goc.gridops.org/site/list?id=3605006
- Hardware : 24 SUN Fire X2200 and one Elonex 10 TB File Server. NetGear Switch.
- Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplin0.unibe.ch | 130.92.139.150 | 24 Ports Switch | Connected to eth1 on FE (lower left NIC) | |||
ce.lhep.unibe.ch | 130.92.139.200 | Rocks Front End (FE) 5.2 x86_64 | CentOS 5.3 x86_64 | 2 Quad AMD Opteron | Rocks FE, SGE, ARC FE, Web Server, Ganglia | |
10.1.225.254 | 48 Ports Switch | |||||
nas-0-0 | 130.92.139.100 | 10TB NFS Server | CentOS 5.3 | 2 Dual Xeon | Cache and Sessiondirs, bonded to switch | |
compute-0-0 | Worker Node 1 | CentOS 5.3 | 2 Quad AMD Opteron | Managed by ce.lhep.unibe.ch | ||
... | ... | ... | ... | ... | ... | |
compute-0-21 | ... | Worker Node 21 | ... | 2 Quad AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 22 | ... | 2 Dual AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 23 | ... | 2 Dual AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 24 | ... | 2 Dual AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 25 | ... | 2 Quad AMD Opteron | Managed by ce.lhep.unibe.ch |
Front End Node Installation and Configuration
- Installation issue : Did CDROM emulation of USB in BIOS in order to find ks.cfg on external USB DVD.
- Installation issue : Since xen installed, had to set 8 cpus in /etc/xen/xend-config.sxp and reboot in order to see all cores on front end.
- Plug in DVD with ROCKS via USB. Answer the questions. About 15 min. Login as root
- Disable firewall and SE Linux with system-config-securitylevel
- echo export http_proxy=http://proxy.unibe.ch:80 > /etc/profile.d/proxy.sh; source /etc/profile.d/proxy.sh
- adduser -g users gridatlaslhep; passwd gridatlaslhep
- adduser -g users atlas-sw; passwd atlas-sw
- add to /etc/fstab: 130.92.139.94:/terabig /external/terabig nfs defaults 0 0
- mkdir /external/; mkdir /external/terabig; mount -a
- cp /root/lhepconfig/iptables /etc/sysconfig/
- /etc/init.d/httpd restart
- ln -s /terabig/shaug/public_html/ /var/www/html/shaug (repeat for each of us).
- Had to restart Ganglia's gmond on all nodes: rocks run host "/etc/init.d/gmond restart"
Compute Nodes Configuration and Kick Start
- Edit /export/rocks/install/site-profiles/5.2/nodes/extend-compute.xml. Or copy from /root/lhepconfig.
- Copy all non-perl rpm in /root/lhepconfig to /export/rocks/install/contrib/5.2/x86_64/
- cd /export/rocks/install; rocks create distro
- Speed up the ssh login: set "ForwardX11 no" in /etc/ssh/ssh_config.
- /etc/init.d/sshd restart
- Now run insert-ethers and see whether the nodes are found and get a '*'. If there are problems, restart the nodes, change the boot device in the BIOS, etc.
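If a node has been discovered but needs to be rebuilt later, the standard Rocks commands can be used. A short sketch (the host name is an example):

# List the hosts known to the front end:
rocks list host
# Force a reinstall of one compute node on its next boot:
rocks set host boot compute-0-0 action=install
ssh compute-0-0 reboot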
ATLAS Kit Installation and Validation (better done on a compute node)
- mkdir /share/apps/atlas; chown atlas-sw:users /share/apps/atlas/
- su - atlas-sw
- cd /share/apps/; mkdir runtime; cd runtime/; mkdir APPS; cd APPS/; mkdir HEP
- cd /share/apps/atlas/; mkdir 15.3.1; cd 15.3.1
- source installPacmanKit.sh 15.3.1 I686-SLC4-GCC34-OPT /share/apps/runtime
- source /share/apps/runtime/APPS/HEP/ATLAS-15.3.1 (an illustrative sketch of such a runtime-environment script follows this list)
- pacman -allow tar-overwrite -get http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/pacman4/DBRelease:DBRelease-7.2.1.pacman
- cd; source KVtest.sh 15.3.1 # All ok.
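For orientation, ARC runtime-environment scripts like APPS/HEP/ATLAS-15.3.1 are sourced by ARC with an argument 0, 1 or 2. The sketch below only illustrates that convention; the real script is produced by installPacmanKit.sh, and the setup path shown here is a hypothetical placeholder:

#!/bin/sh
# Sketch of an ARC runtime environment script (APPS/HEP/ATLAS-15.3.1 style).
case "$1" in
  0) ;;                                      # job preparation on the front end
  1) . /share/apps/atlas/15.3.1/setup.sh ;;  # hypothetical setup, run on the worker node before the job
  2) ;;                                      # cleanup after the job has finished
esac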
ARC Installation
- mkdir /share/apps/cache; mkdir /share/apps/session; mkdir /var/spool/nordugrid; mkdir /var/spool/nordugrid/cachecontrol; mkdir /etc/grid-security
- Edit /etc/yum.conf according to /root/lhepconfig/yum.conf
- Some missing perl stuff : [root@ce lhepconfig]# rpm -ivh perl-*
- yum groupinstall "ARC Server"
- yum groupinstall "ARC Client"
- cp arc.conf /etc/ (a minimal sketch of its layout is given after this list)
- /etc/init.d/gridftp start; /etc/init.d/grid-manager start; /etc/init.d/grid-infosystem start;
- adduser -g users gridatlaslhep; passwd gridatlaslhep
- rocks sync users
- add to /etc/fstab: 130.92.139.151:/external/se3 /external/se3 nfs defaults 0 0
- mkdir /external/se3; mount -a
- VOMS problem with nordugridmap (the gridmap file generator). Workaround: use the generator from SMSCG.
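For reference, a minimal sketch of the layout of an ARC 0.8-era /etc/arc.conf for this kind of setup. The section names are the standard ones, but every value below is an illustrative placeholder, not the production configuration:

[common]
hostname="ce.lhep.unibe.ch"
lrms="sge"

[grid-manager]
controldir="/var/spool/nordugrid/jobstatus"
sessiondir="/grid/sessiondir"
cachedir="/grid/cache"

[gridftpd]

[gridftpd/jobs]
path="/jobs"
plugin="jobplugin.so"

[infosys]

[cluster]
cluster_alias="BERN ATLAS T3"

[queue/all.q]
name="all.q"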
Restore Roll Creation
- cd /export/site-roll/rocks/src/roll/restore/; make roll
- scp ce.lhep.unibe.ch-restore-2009.09.10-0.x86_64.disk1.iso shaug@lheppc51.unibe.ch:/terabig/shaug/tmp/
Old PC50 becomes NAS with cache and sessiondir
- Just installed it as a NAS appliance with insert-ethers (makes an NFS server)
- Bonded:
[root@nas-0-0 ~]# cat /etc/modprobe.conf
alias scsi_hostadapter 3w-9xxx
alias scsi_hostadapter1 ata_piix
alias eth0 tg3
alias eth1 tg3
alias bond0 bonding
install bond0 /sbin/modprobe bonding -o bonding0 mode=1 miimon=100

[root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
TYPE=Ethernet
DEVICE=bond0
BOOTPROTO='static'
IPADDR=10.1.255.231
NETMASK=255.255.0.0
GATEWAY=10.1.255.1
NETWORK=10.1.255.0
BROADCAST=10.1.255.255
ONBOOT=yes
USERCTL=no
NAME='Bonding device 0'
STARTMODE='auto'
IPV6INIT=no
PEERDNS=yes
MTU=1500

[root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
SLAVE=yes
MASTER=bond0
HWADDR=00:30:48:74:f6:d0
NETMASK=255.255.0.0
BOOTPROTO=none
ONBOOT=yes
MTU=1500

[root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
SLAVE=yes
MASTER=bond0
HWADDR=00:30:48:74:f6:d1
BOOTPROTO=none
ONBOOT=yes
MTU=1500
- Bonding not tested. Now mounted on all nodes:
[root@nas-0-0 ~]# /etc/init.d/iptables stop
rocks run host compute "mkdir /grid"
rocks run host compute "mkdir /grid/sessiondir"
rocks run host compute "mkdir /grid/cache"
rocks run host compute "echo '# NFS session and cache dirs for grid jobs' >> /etc/fstab"
rocks run host compute "echo 'nas-0-0:/export/cache /grid/cache nfs rw,nolock 0 0' >> /etc/fstab"
rocks run host compute "echo 'nas-0-0:/export/sessiondir /grid/sessiondir nfs rw,nolock 0 0' >> /etc/fstab"
rocks run host compute-0-0 "cat /etc/fstab"
rocks run host compute "mount -a"
- Now /etc/gmond.conf needs this:
udp_recv_channel {
  /* mcast_join = 224.0.0.3 */
  port = 8649
}
udp_send_channel {
  /* mcast_join = 224.0.0.3 */
  host = 10.1.1.1
  port = 8649
}
To Be Done
- Make Lustre file system with cache and session dir. AAA/SWITCH project in 2010.
Sun Grid Engine (SGE) Configuration and Commands
- qstat -u \*, qstat -f
- qmod -d|e *@compute-0-2
- qhost
- Jobs in grid state O, often LRMS state Eqw? Try something like this:
qstat -u \* | grep Eqw | cut -d" " -f3,7 | while read line; do qmod -cj $line; done
How to use the management NIC on the SUN pizzas
- Check the IP of the pizza in its BIOS.
- Configure your laptop to have IP in same subnet.
- ssh into the pizza and do for example : SP -> show -l all
BERN ATLAS T3 2008 Cluster Hardware
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplin0 | 130.92.139.150 | Switch | |||
lheppc50 | 130.92.139.100 | File Server | SLC 4.6 | Dual Dual Xeon | ARC FE, gridftp, Torque, LDAP, 8.3 TB Data Raid |
lheppc44 | 130.92.139.194 | File Server | SLC 4.7 | Dual Quad Xeon | 3.6 TB Data Raid 6 (homes) |
lheppc51 | 130.92.139.151 | File Server | SLC 4.7 | Quad Xeon | 21 TB Data Raid, xrootd master (foreseen), cfengine, dq2 and tivoli clients |
lheppc25 | 130.92.139.75 | Web Server | SuSe 9.3 | Single AMD | web and ganglia server, should be moved. |
svn.lhep.unibe.ch | Code/Doc. Repos. | SuSe | VM hosted by ID. | ||
- | |||||
lhepat05 | 130.92.139.205 | WN | SLC44 | Dual IA64 | Worker Node |
lhepat06 | 130.92.139.206 | WN | SLC44 | Dual AMD64 | Worker Node (needs new disk) |
lhepat07 | 130.92.139.207 | WN | SLC45 | Dual Dual AMD64 | Worker Node (broken) |
lhepat08 | 130.92.139.208 | WN | SLC44 | Dual Dual AMD64 | Worker Node |
lhepat09 | 130.92.139.209 | WN | SLC44 | Dual Dual AMD64 | Worker Node (broken) |
lhepat10 | 130.92.139.210 | WN | SLC44 | Dual Dual AMD64 | Worker Node |
lhepat11 | 130.92.139.211 | WN | SLC44 | Dual Dual AMD64 | Worker Node |
lhepat12 | 130.92.139.212 | WN | SLC44 | Dual Dual AMD64 | Worker Node (broken) |
See ATLAS Software for instructions on how to get the ATLAS software to work.
PROOF - Running ROOT on many cores (OUTDATED - we may bring it back when needed)
You have to provide your TSelector class with the corresponding .C and .h files (see ROOT manual). From your ROOT session do :
TProof *p = TProof::Open("lheppc51.unibe.ch");
TDSet *set = new TDSet("TTree", "Name of your Branch");
set->Add("/ngse1/yourfile1.root");
set->Add("/ngse1/yourfile2.root");
set->Process("yourfile.C", "", 1000); // Loop over 1000 events.
Setting up PROOF
There is no additional software; everything comes with the ROOT installation. There are two configuration files which we have placed in $ROOTSYS/etc: proof.conf and xrootCluster.cf. The first contains the list of the master node and the worker nodes. On the master and on all workers xrootd has to be started as root, but with the -R option specifying a non-superuser. From the config file xrootd knows whether it is a master or a worker.
xrootd -c $ROOTSYS/etc/xrootCluster.cf -b -l /tmp/xpd.log -R atlsoft
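The proof.conf mentioned above is just a plain list of the master and the worker nodes, one per line. A minimal sketch (the worker host names here are hypothetical placeholders):

# $ROOTSYS/etc/proof.conf (sketch)
master lheppc51.unibe.ch
worker node01.lhep.unibe.ch
worker node02.lhep.unibe.ch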
Compile ROOT on the cluster
This section sums up the steps to compile ROOT on the cluster. Every now and then a patched version is needed that is not yet in cvmfs, and it can be tricky to get all the configurations and links right.
- Download the package from svn to pc7.
- Log onto ce.lhep.unibe.ch and from there onto one of the nodes:
  ssh root@ce.lhep.unibe.ch
  ssh compute-1-10
- scp the tarball to somewhere on /grid/root* (/grid is visible from all the nodes).
- Set up the environment, for example:
  export BUILD=x86_64-slc5-gcc43-opt
  export PATH="/afs/cern.ch/sw/lcg/external/Python/2.6.5/$BUILD/bin:${PATH}"
  export LD_LIBRARY_PATH="/afs/cern.ch/sw/lcg/external/Python/2.6.5/$BUILD/lib:${LD_LIBRARY_PATH}"
  source /cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/setup.sh
- Configure the ROOT compilation with:
  ./configure linuxx8664gcc --disable-castor --disable-rfio --disable-tmva --enable-xml --enable-roofit --enable-python --enable-minuit2 --with-python-incdir=/cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/include/python2.6/ --with-python-libdir=/cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/lib/
- The python / PyROOT part can be tricky; make sure you have the right versions and all libraries. The log should say something like:
  Checking for python2.6, libpython2.6, libpython, python, or Python ... /cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/lib/
- Everything should now be set up for compilation:
  make -j 4
- It usually gets stuck at some point or runs out of memory. Kill it and restart it. If it does not move, run plain make.
- Hopefully it succeeded. Test the installation with:
  source bin/thisroot.sh
  root
- And the python part with:
  python
  from ROOT import *
- Hopefully everything works fine. If it does not, do make clean, delete all libraries and check the environment.
Submit jobs to the cluster
Refer to the description on the Geneva TWiki: https://twiki.cern.ch/twiki/bin/viewauth/GeneveAtlas/BernGenevaARC (written by a Bern guy).