CLUSTERS
NOTE: this page is OBSOLETE and NO LONGER MAINTAINED. For up-to-date information on the systems, please go to:
http://wiki.lhep.unibe.ch/index.php/AEC-LHEP_Hardware_information
BERN ATLAS T2 2013-14 Clusters - ce01.lhep.unibe.ch, ce02.lhep.unibe.ch, bdii.lhep.unibe.ch, dpm.lhep.unibe.ch
- Summary :
ce01      2 2.40GHz Intel Xeon E5645 Hexa-Core with 24GB RAM total and nordugrid-arc-3.0.3-1.el6.x86_64 (EMI-3)
mds-2-1   1 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total for Lustre MDS
oss-2-x   1 2.80GHz AMD Opteron 290 Quad-Core with 16GB RAM total for Lustre OSS (9 nodes active, 86TB)
wn-2-x    1536 2.53GHz Intel Xeon E5540 Worker Cores (96 Nodes) with 2GB RAM per core
ce02      4 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total and nordugrid-arc-3.0.3-1.el6.x86_64 (EMI-3)
mds-0-1   1 2.40GHz Intel Xeon E5620 Quad-Core with 24GB RAM total for Lustre MDS
oss-2-x   2 2.3GHz AMD Opteron 2376 Quad-Core with 16GB RAM total for Lustre OSS (11 nodes active, 21TB)
wn-0-x    120 2.3GHz AMD Opteron 2376 Worker Cores (15 Nodes) with 2GB RAM per core
wn-0-x    376 2.5GHz Intel Xeon E5420 Worker Cores (47 Nodes) with 2GB RAM per core
wn-1-x    320 2.3GHz AMD Opteron 8356 Worker Cores (20 Nodes) with 2GB RAM per core
dpm       4 2.40GHz Intel Xeon E5620 Quad-Core with 16GB RAM total and emi-dpm_mysql-1.8.7-3.el5.centos (EMI-3)
dpmdisk0x 1 2.67GHz Intel Xeon X5650 Hexa-Core with 12GB RAM total and emi-dpm_disk-1.8.7-3.el6.x86_64 (EMI-3) (6 nodes, 500TB)
dpmdisk04 1 2.33GHz Intel Xeon E5345 Quad-Core with 8GB RAM total and emi-dpm_disk-1.8.7-3.el5.centos (EMI-3) (1 node, 22TB)
bdii 4 2.40GHz Intel Xeon E5620 Quad-Core with 16GB RAM total and emi-bdii-site-1.0.1-1.el5 (EMI-3)
To be updated from here on (the information below has not yet been updated):
- Hardware :
26 SUN Fire X2200 (Lustre and workers), 19 SUN Blade X8440 (workers), dual Xeon E5620 (8-core) Front End, dual Xeon E5620 (8-core) DPM head node, 3 Xeon X5650 (6-core) with 9 12-port Areca 1231ML with 20TB RAID6 each (DPM disk servers). Dell PowerConnect 2724 (WAN), NetGear ProSafe GSM7248R and GS748T (LAN) switches.
- Benchmark (not measured):
Quad-Core AMD Opteron(tm) Processor 2376, 2.3GHz (SUN Fire X2200): 5.676 HEP-SPEC06/core (*)
Quad-Core AMD Opteron 8356, 2.3GHz (Sun Blade X8440): 7.497 HEP-SPEC06/core (**)
(*) http://hepix.caspur.it/afs/hepix.org/project/ptrack/spec-cpu2000.html (first row in table, average, SL4 x86_64)
(**) https://wiki.chipp.ch/twiki/bin/view/LCGTier2/BenchMarks
- Available resources (workers):
19*8 = 152 AMD Opteron(tm) 2376 cores:        152 x 5.676 = 862.752 HEP-SPEC06
20*16 = 320 Quad-Core AMD Opteron 8356 cores: 320 x 7.497 = 2399.04 HEP-SPEC06
Total:                                                      3261.8 HEP-SPEC06
- Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
- GOCDB Entry: https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=476&grid_id=0
- ROC: NGI_CH - Certification status: Certified.
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplsw1.unibe.ch | 130.92.139.201 | 48-Port Switch | Brocade FCX648 | Connected to eth1 on FE (on 10GbE NIC) | ||
ce.lhep.unibe.ch | 130.92.139.200 | Rocks Front End (FE) 5.3 x86_64 | CentOS 5.3 x86_64 | 2 Quad Xeon E5620 @2.4GHz | Rocks FE, SGE, ARC FE, Web Server, Ganglia | |
10.1.225.254 | 48 Ports Switch | |||||
nas-0-2 | 10.1.255.253 | 10TB NFS server | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | ATLAS and VO software (also lhepat02.unibe.ch) | |
nas-0-1 | 10.1.255.252 | 1.1TB Lustre MDS/MDT | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | MDS/MDT (currently not in production) | |
compute-0-0/2 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-3 | 10.1.255.248 | Lustre MDS/MDT | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-4/5/6/7 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-0-8 to 26 | 10.1.255.xxx | Worker Node 0-8 to 0-26 | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-1-1 to 19 | 10.1.255.xxx | Worker Node 1-1 to 1-19 | CentOS 5.3 | 4 Quad AMD Opteron 8356 | Managed by ce.lhep.unibe.ch | |
dpm.lhep.unibe.ch | 130.92.139.211 | DPM head node | SLC 5.7 | 2 Quad Xeon E5620 @2.4GHz | DPM Head node and disk pool | |
dpmdisk01(to 3).lhep.unibe.ch | 130.92.139.211 to 213 | DPM pool node | SLC 5.7 | 1 Hexa Xeon X5650 @2.67GHz | DPM 60TB disk pool | |
dpmdisk04.lhep.unibe.ch | 130.92.139.214 | DPM pool node | SLC 5.7 | 1 Quad Xeon E5345 @2.33 GHz | DPM 22TB disk pool | |
dpmdisk05(to 9).lhep.unibe.ch | 130.92.139.215 to 219 | DPM pool node | SLC 5.x | ?? | Reserved for DPM disk pool | |
kvm01.lhep.unibe.ch | 130.92.139.150 | KVM host | SLC 5.7 | 2 Dual AMD Opteron 2218 @2.6GHz | Virtualization host for KVM | |
... | ... | ... | ... | ... | - |
BERN ATLAS T3 2010-11 Cluster - ce.lhep.unibe.ch, bdii.lhep.unibe.ch, dpm.lhep.unibe.ch
- Summary :
152 2.3 GHz AMD Opteron 2376 Worker Cores (19 Nodes) with 2GB RAM per core.
304 2.3 GHz AMD Opteron 8356 Worker Cores (19 Nodes) with 2GB RAM per core.
12TB Lustre FS (7 nodes).
202TB DPM Storage Element.
- Hardware :
26 SUN Fire X2200 (Lustre and workers), 19 SUN Blade X8440 (workers), dual Xeon E5620 (8-core) Front End, dual Xeon E5620 (8-core) DPM head node, 3 Xeon X5650 (6-core) with 9 12-port Areca 1231ML with 20TB RAID6 each (DPM disk servers). Dell PowerConnect 2724 (WAN), NetGear ProSafe GSM7248R and GS748T (LAN) switches.
- Benchmark (not measured):
Quad-Core AMD Opteron(tm) Processor 2376, 2.3GHz (SUN Fire X2200): 5.676 HEP-SPEC06/core (*)
Quad-Core AMD Opteron 8356, 2.3GHz (Sun Blade X8440): 7.497 HEP-SPEC06/core (**)
(*) http://hepix.caspur.it/afs/hepix.org/project/ptrack/spec-cpu2000.html (first row in table, average, SL4 x86_64)
(**) https://wiki.chipp.ch/twiki/bin/view/LCGTier2/BenchMarks
- Available resources (workers):
19*8 = 152 AMD Opteron(tm) 2376 cores:        152 x 5.676 = 862.752 HEP-SPEC06
20*16 = 320 Quad-Core AMD Opteron 8356 cores: 320 x 7.497 = 2399.04 HEP-SPEC06
Total:                                                      3261.8 HEP-SPEC06
- Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
- GOCDB Entry: https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=476&grid_id=0
- ROC: NGI_CH - Certification status: Certified.
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplsw1.unibe.ch | 130.92.139.201 | 48-Port Switch | Brocade FCX648 | Connected to eth1 on FE (on 10GbE NIC) | ||
ce.lhep.unibe.ch | 130.92.139.200 | Rocks Front End (FE) 5.3 x86_64 | CentOS 5.3 x86_64 | 2 Quad Xeon E5620 @2.4GHz | Rocks FE, SGE, ARC FE, Web Server, Ganglia | |
10.1.225.254 | 48 Ports Switch | |||||
nas-0-2 | 10.1.255.253 | 10TB NFS server | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | ATLAS and VO software (also lhepat02.unibe.ch) | |
nas-0-1 | 10.1.255.252 | 1.1TB Lustre MDS/MDT | CentOS 5.3 | 1 Quad Xeon E5620 @2.4GHz | MDS/MDT (currently not in production) | |
compute-0-0/2 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-3 | 10.1.255.248 | Lustre MDS/MDT | CentOS 5.3 | 2 Dual AMD Opteron 2214 | Managed by ce.lhep.unibe.ch | |
compute-0-4/5/6/7 | 10.1.255.xxx | Lustre | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-0-8 to 26 | 10.1.255.xxx | Worker Node 0-8 to 0-26 | CentOS 5.3 | 2 Quad AMD Opteron 2376 | Managed by ce.lhep.unibe.ch | |
compute-1-1 to 19 | 10.1.255.xxx | Worker Node 1-1 to 1-19 | CentOS 5.3 | 4 Quad AMD Opteron 8356 | Managed by ce.lhep.unibe.ch | |
dpm.lhep.unibe.ch | 130.92.139.211 | DPM head node | SLC 5.7 | 2 Quad Xeon E5620 @2.4GHz | DPM Head node and disk pool | |
dpmdisk01(to 3).lhep.unibe.ch | 130.92.139.211 to 213 | DPM pool node | SLC 5.7 | 1 Hexa Xeon X5650 @2.67GHz | DPM 60TB disk pool | |
dpmdisk04.lhep.unibe.ch | 130.92.139.214 | DPM pool node | SLC 5.7 | 1 Quad Xeon E5345 @2.33 GHz | DPM 22TB disk pool | |
dpmdisk05(to 9).lhep.unibe.ch | 130.92.139.215 to 219 | DPM pool node | SLC 5.x | ?? | Reserved for DPM disk pool | |
kvm01.lhep.unibe.ch | 130.92.139.150 | KVM host | SLC 5.7 | 2 Dual AMD Opteron 2218 @2.6GHz | Virtualization host for KVM | |
... | ... | ... | ... | ... | - |
Front End Node Changes in Configuration
- Added Acix service. After reboot: service acix-cache start.
- Added root env http_proxy=http://proxy.unibe.ch:80 to all cron jobs.
- Root e-mail re-direction: https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/34
- grid*.log rotation in /etc/logrotate.d/smscg .
- compute-0-3 now Lustre Meta Data Server with "/dev/vg00/mdt 438G 532M 412G 1% /mdt" (mounted manually).
- compute-0-0,2,4,5,6,7 now Object Storage Servers with "/dev/vg00/ost1 1.8T 391G 1.3T 24% /mnt/ost1" (mounted manually).
- Lustre clients now need "10.1.255.248@tcp0:/lustre /grid/lustre lustre localflock" in /etc/fstab.
- Startup of grid* services upon boot to runlevel 5 (chkconfig --add <service>; chkconfig <service> on).
- /dev/sda2 now for /var/spool/nordugrid (it was /var).
- Add "sge_qmaster" and "sge_execd" under "Local services" in etc/services .
- Cron job to clear the E flag from queued jobs: /etc/cron.hourly/qmod-cj.
- Pool accounts for all VOs and Roles (except user atlas-sw), with their own unix group (and subgroup if needed).
- Add "PERL5LIB=/opt/rocks/lib/perl5/site_perl/5.10.1:" to "/etc/cron.d/nordugridmap.cron".
- Upgrade to ARC 0.8.3 (preserve hacks to /opt/nordugrid/libexec/submit-sge-job).
- yum install xerces-c xerces-c-devel .
- yum remove systemtap systemtap-runtime (security).
- Restart Ganglia gmond every night and gmetad on Front End every week.
- Set virtual_free as consumable for gridengine (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/105)
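A minimal sketch of how virtual_free can be made consumable in SGE; the memory value and host name below are placeholders, not the production values, and the SVN ticket above has the site-specific details:

# Open the complex configuration in an editor and set the CONSUMABLE (and
# REQUESTABLE) column of the virtual_free line to YES:
qconf -mc
# Publish how much memory each execution host may hand out (placeholder value):
qconf -mattr exechost complex_values virtual_free=48G compute-0-8
# Jobs then request memory explicitly, e.g.:
qsub -l virtual_free=2G job.sh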
Storage Element Head Node and site-BDII Installation and Configuration
- Can't net-install because of http_proxy, so installed from SLC5.4 DVD and updated after that. 40GB for /root, 18GB swap.
- Logical Volumes for mysql (50GB) and ops/dteam storage (10G).
- Logical Volume for test LTPC area (100G).
- 62GB unallocated.
- ntpd, iptables, host cert+key.
- yum install lcg-CA glite-SE_dpm_mysql glite-BDII_site (separately).
- users.conf, groups.conf for all pool accounts and unix groups/subgroups.
- /etc/hosts: add FQDN of pool nodes and move own FQDN to top (for site-bdii to work).
- Additional stuff for DPMXrootAccess (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/87).
- Adapt site-info.def (and service specific config files) from examples with "BDII_USER=ldap # Wrong default is edguser - https://savannah.cern.ch/bugs/?69028"
- cd /opt; ./glite/yaim/bin/yaim -c -s glite/yaim/unibe-lhep/site-info.def -n glite-SE_dpm_mysql -n BDII_site
- chkconfig fetch-crl off (since http_proxy not yet set at startup)
- Add "env http_proxy=http://proxy.unibe.ch:80" to "/etc/cron.d/fetch-crl".
- SELINUX=permissive in /etc/selinux/config to start slapd (no longer sure this was needed).
- chkconfig bdii on; service bdii start .
- mkdir ~ldap/.globus; copy cert+key in there and chown them to ldap:ldap (otherwise GlueVO* attributes missing)
- Set "enabled=0" in "glite-SE_dpm_mysql.repo" and "glite-BDII_site.repo" (disable gLite auto-updates).
- GIIS: ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid"
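To check that the VO information (the GlueVO* attributes mentioned above) is actually published, the GIIS query can be narrowed down. The filters below are only illustrations of the kind of query that can be used:

# Show only the VO info objects published by the site-BDII:
ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid" "(objectClass=GlueVOInfo)"
# Check that the SE itself is published:
ldapsearch -x -h dpm.lhep.unibe.ch -p 2170 -b "o=grid" "(GlueSEUniqueID=dpm.lhep.unibe.ch)"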
Storage Element Pool Nodes Installation and Configuration
- Mirror SLC5.6 distribution at http://ce.lhep.unibe.ch/cern via rsync (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/86).
- Burn to CD boot.iso at http://ce.lhep.unibe.ch/cern/slc5X/x86_64/images
- Kickstart files in http://ce.lhep.unibe.ch/kickstart/ (kickstart MUST be world readable, protect later if pwd hash is in it)
- Boot via boot.iso and "linux ks=http://ce.lhep.unibe.ch/kickstart/dpmdisk01-ks.cfg"
- Configure network manually
- Manual partition (first time, then add to kickstart) with 10GB swap, rest for /root (~20GB)
- /root on /dev/sdd at install, but /dev/sda at re-boot, so much fiddling with GRUB in order to boot (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/88).
- Once it is right, grab the info from anaconda-ks.cfg and paste it into the kickstart (it works!). DON'T TRY THIS on a re-install if there is valid data on the RAIDs.
- mkfs -t xfs -f -L storage1 /dev/sdb; mkfs -t xfs -f -L storage2 /dev/sdc; mkfs -t xfs -f -L storage3 /dev/sdd
- Add to /etc/fstab as /mnt/storageX and mount -a (see the sketch after this list).
- chmod -R 0770 /mnt/storage1; chmod -R 0770 /mnt/storage2; chmod -R 0770 /mnt/storage3
- ntpd, iptables, host cert+key.
- yum install lcg-CA glite-SE_dpm_disk
- users.conf, groups.conf for all pool accounts and unix groups/subgroups.
- Additional stuff for DPMXrootAccess (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/87).
- Adapt site-info.def from head node.
- cd /opt; ./glite/yaim/bin/yaim -c -s glite/yaim/unibe-lhep/site-info.def -n glite-SE_dpm_disk
- chkconfig fetch-crl off (since http_proxy not yet set at startup).
- Add "env http_proxy=http://proxy.unibe.ch:80" to "/etc/cron.d/fetch-crl".
- Set "enabled=0" in "glite-SE_dpm_disk.repo" (disable gLite auto-updates).
Additional Configuration on Storage Element Head Node
- Restrict each pool to the appropriate VO groups (cull the GIDs from the MySQL DB).
- Reserve 50TB for ATLASLOCALGROUPDISK (https://svn.lhep.unibe.ch/LHEP+ATLAS+Sysadmin/100) - Can change size at any time.
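A hedged sketch of the kind of DPM admin commands involved; the pool, group and size values are examples only, and the authoritative procedure is in the SVN tickets above:

# Restrict an existing pool to a set of VO groups (example names):
dpm-modifypool --poolname atlaspool --group atlas,atlas/ch
# Reserve space for the ATLASLOCALGROUPDISK space token (the size can be changed later):
dpm-reservespace --gspace 50T --lifetime Inf --group atlas/ch --token_desc ATLASLOCALGROUPDISK
# Check the resulting pool/space configuration:
dpm-qryconf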
BERN ATLAS T3 2009 Cluster ce.lhep.unibe.ch
- Summary : 184 2.3 GHz AMD Worker Cores (23 Nodes) with 2GB RAM per core.
- GOCDB Entry: https://goc.gridops.org/site/list?id=3605006
- Hardware : 24 SUN Fire X2200 and one Elonex 10 TB File Server. NetGear Switch.
- Network : Gateway 130.92.139.1. DNS 130.92.9.53 (default), 130.92.9.52. Subnet mask 255.255.255.0
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplin0.unibe.ch | 130.92.139.150 | 24 Ports Switch | Connected to eth1 on FE (lower left NIC) | |||
ce.lhep.unibe.ch | 130.92.139.200 | Rocks Front End (FE) 5.2 x86_64 | CentOS 5.3 x86_64 | 2 Quad AMD Opteron | Rocks FE, SGE, ARC FE, Web Server, Ganglia | |
10.1.225.254 | 48 Ports Switch | |||||
nas-0-0 | 130.92.139.100 | 10TB NFS Server | CentOS 5.3 | 2 Dual Xeon | Cache and Sessiondirs, bonded to switch | |
compute-0-0 | Worker Node 1 | CentOS 5.3 | 2 Quad AMD Opteron | Managed by ce.lhep.unibe.ch | ||
... | ... | ... | ... | ... | ... | |
compute-0-21 | ... | Worker Node 21 | ... | 2 Quad AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 22 | ... | 2 Dual AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 23 | ... | 2 Dual AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 24 | ... | 2 Dual AMD Opteron | Managed by ce.lhep.unibe.ch | |
... | ... | Worker Node 25 | ... | 2 Quad AMD Opteron | Managed by ce.lhep.unibe.ch |
Front End Node Installation and Configuration
- Installation issue : Did CDROM emulation of USB in BIOS in order to find ks.cfg on external USB DVD.
- Installation issue : Since xen installed, had to set 8 cpus in /etc/xen/xend-config.sxp and reboot in order to see all cores on front end.
- Plug in DVD with ROCKS via USB. Answer the questions. About 15 min. Login as root
- Disable firewall and SE Linux with system-config-securitylevel
- echo export http_proxy=http://proxy.unibe.ch:80 > /etc/profile.d/proxy.sh; source /etc/profile.d/proxy.sh
- adduser -g users gridatlaslhep; passwd gridatlaslhep
- adduser -g users atlas-sw; passwd atlas-sw
- add to /etc/fstab: 130.92.139.94:/terabig /external/terabig nfs defaults 0 0
- mkdir /external/; mkdir /external/terabig; mount -a
- cp /root/lhepconfig/iptables /etc/sysconfig/
- /etc/init.d/httpd restart
- ln -s /terabig/shaug/public_html/ /var/www/html/shaug (repeat for each of us).
- Had to restart Ganglia's gmond on all nodes: rocks run host "/etc/init.d/gmond restart"
Compute Nodes Configuration and Kick Start
- Edit /export/rocks/install/site-profiles/5.2/nodes/extend-compute.xml. Or copy from /root/lhepconfig.
- Copy all non-perl rpm in /root/lhepconfig to /export/rocks/install/contrib/5.2/x86_64/
- cd /export/rocks/install; rocks create distro
- Speed up the ssh login: set "ForwardX11 no" in /etc/ssh/ssh_config.
- /etc/init.d/sshd restart
- Now run insert-ethers and see whether the nodes are found and get a '*'. If there are problems, restart the nodes, change the boot device in the BIOS, etc.
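If a node has been discovered but needs to be rebuilt later, the standard Rocks commands can be used. A short sketch (the host name is an example):

# List the hosts known to the front end:
rocks list host
# Force a reinstall of one compute node on its next boot:
rocks set host boot compute-0-0 action=install
ssh compute-0-0 reboot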
ATLAS Kit Installation and Validation (better done on a compute node)
- mkdir /share/apps/atlas; chown atlas-sw:users /share/apps/atlas/
- su - atlas-sw
- cd /share/apps/; mkdir runtime; cd runtime/; mkdir APPS; cd APPS/; mkdir HEP
- cd /share/apps/atlas/; mkdir 15.3.1; cd 15.3.1
- source installPacmanKit.sh 15.3.1 I686-SLC4-GCC34-OPT /share/apps/runtime
- source /share/apps/runtime/APPS/HEP/ATLAS-15.3.1 (an illustrative sketch of such a runtime-environment script follows this list)
- pacman -allow tar-overwrite -get http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/pacman4/DBRelease:DBRelease-7.2.1.pacman
- cd; source KVtest.sh 15.3.1 # All ok.
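For orientation, ARC runtime-environment scripts like APPS/HEP/ATLAS-15.3.1 are sourced by ARC with an argument 0, 1 or 2. The sketch below only illustrates that convention; the real script is produced by installPacmanKit.sh, and the setup path shown here is a hypothetical placeholder:

#!/bin/sh
# Sketch of an ARC runtime environment script (APPS/HEP/ATLAS-15.3.1 style).
case "$1" in
  0) ;;                                      # job preparation on the front end
  1) . /share/apps/atlas/15.3.1/setup.sh ;;  # hypothetical setup, run on the worker node before the job
  2) ;;                                      # cleanup after the job has finished
esac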
ARC Installation
- mkdir /share/apps/cache; mkdir /share/apps/session; mkdir /var/spool/nordugrid; mkdir /var/spool/nordugrid/cachecontrol; mkdir /etc/grid-security
- Edit /etc/yum.conf according to /root/lhepconfig/yum.conf
- Some missing perl stuff : [root@ce lhepconfig]# rpm -ivh perl-*
- yum groupinstall "ARC Server"
- yum groupinstall "ARC Client"
- cp arc.conf /etc/ (a minimal sketch of its layout is given after this list)
- /etc/init.d/gridftp start; /etc/init.d/grid-manager start; /etc/init.d/grid-infosystem start;
- adduser -g users gridatlaslhep; passwd gridatlaslhep
- rocks sync users
- add to /etc/fstab: 130.92.139.151:/external/se3 /external/se3 nfs defaults 0 0
- mkdir /external/se3; mount -a
- VOMS problem with nordugridmap (the gridmap file generator). Workaround: use the generator from SMSCG.
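For reference, a minimal sketch of the layout of an ARC 0.8-era /etc/arc.conf for this kind of setup. The section names are the standard ones, but every value below is an illustrative placeholder, not the production configuration:

[common]
hostname="ce.lhep.unibe.ch"
lrms="sge"

[grid-manager]
controldir="/var/spool/nordugrid/jobstatus"
sessiondir="/grid/sessiondir"
cachedir="/grid/cache"

[gridftpd]

[gridftpd/jobs]
path="/jobs"
plugin="jobplugin.so"

[infosys]

[cluster]
cluster_alias="BERN ATLAS T3"

[queue/all.q]
name="all.q"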
Restore Roll Creation
- cd /export/site-roll/rocks/src/roll/restore/; make roll
- scp ce.lhep.unibe.ch-restore-2009.09.10-0.x86_64.disk1.iso shaug@lheppc51.unibe.ch:/terabig/shaug/tmp/
Old PC50 becomes NAS with cache and sessiondir
- Just installed it as a NAS appliance with insert-ethers (makes an NFS server)
- Bonded:
[root@nas-0-0 ~]# cat /etc/modprobe.conf
alias scsi_hostadapter 3w-9xxx
alias scsi_hostadapter1 ata_piix
alias eth0 tg3
alias eth1 tg3
alias bond0 bonding
install bond0 /sbin/modprobe bonding -o bonding0 mode=1 miimon=100

[root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
TYPE=Ethernet
DEVICE=bond0
BOOTPROTO='static'
IPADDR=10.1.255.231
NETMASK=255.255.0.0
GATEWAY=10.1.255.1
NETWORK=10.1.255.0
BROADCAST=10.1.255.255
ONBOOT=yes
USERCTL=no
NAME='Bonding device 0'
STARTMODE='auto'
IPV6INIT=no
PEERDNS=yes
MTU=1500

[root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
SLAVE=yes
MASTER=bond0
HWADDR=00:30:48:74:f6:d0
NETMASK=255.255.0.0
BOOTPROTO=none
ONBOOT=yes
MTU=1500

[root@nas-0-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
SLAVE=yes
MASTER=bond0
HWADDR=00:30:48:74:f6:d1
BOOTPROTO=none
ONBOOT=yes
MTU=1500
- Bonding not tested. Now mounted on all nodes:
[root@nas-0-0 ~]# /etc/init.d/iptables stop
rocks run host compute "mkdir /grid"
rocks run host compute "mkdir /grid/sessiondir"
rocks run host compute "mkdir /grid/cache"
rocks run host compute "echo '# NFS session and cache dirs for grid jobs' >> /etc/fstab"
rocks run host compute "echo 'nas-0-0:/export/cache /grid/cache nfs rw,nolock 0 0' >> /etc/fstab"
rocks run host compute "echo 'nas-0-0:/export/sessiondir /grid/sessiondir nfs rw,nolock 0 0' >> /etc/fstab"
rocks run host compute-0-0 "cat /etc/fstab"
rocks run host compute "mount -a"
- Now /etc/gmond.conf needs this:
udp_recv_channel {
  /* mcast_join = 224.0.0.3 */
  port = 8649
}
udp_send_channel {
  /* mcast_join = 224.0.0.3 */
  host = 10.1.1.1
  port = 8649
}
To Be Done
- Make Lustre file system with cache and session dir. AAA/SWITCH project in 2010.
Sun Grid Engine (SGE) Configuration and Commands
- qstat -u \*, qstat -f
- qmod -d|e *@compute-0-2
- qhost
- Jobs in grid state O, often LRMS state Eqw? Try something like this:
qstat -u \* | grep Eqw | cut -d" " -f3,7 | while read line; do qmod -cj $line; done
How to use the management NIC on the SUN pizzas
- Check the IP of the pizza in its BIOS.
- Configure your laptop to have IP in same subnet.
- ssh into the pizza and do for example : SP -> show -l all
BERN ATLAS T3 2008 Cluster Hardware
Host | IP | Service | OS | CPU | Comment |
---|---|---|---|---|---|
lheplin0 | 130.92.139.150 | Switch | |||
lheppc50 | 130.92.139.100 | File Server | SLC 4.6 | Dual Dual Xeon | ARC FE, gridftp, Torque, LDAP, 8.3 TB Data Raid |
lheppc44 | 130.92.139.194 | File Server | SLC 4.7 | Dual Quad Xeon | 3.6 TB Data Raid 6 (homes) |
lheppc51 | 130.92.139.151 | File Server | SLC 4.7 | Quad Xeon | 21 TB Data Raid, xrootd master (foreseen), cfengine, dq2 and tivoli clients |
lheppc25 | 130.92.139.75 | Web Server | SuSe 9.3 | Single AMD | web and ganglia server, should be moved. |
svn.lhep.unibe.ch | Code/Doc. Repos. | SuSe | VM hosted by ID. | ||
- | |||||
lhepat05 | 130.92.139.205 | WN | SLC44 | Dual IA64 | Worker Node |
lhepat06 | 130.92.139.206 | WN | SLC44 | Dual AMD64 | Worker Node (needs new disk) |
lhepat07 | 130.92.139.207 | WN | SLC45 | Dual Dual AMD64 | Worker Node (broken) |
lhepat08 | 130.92.139.208 | WN | SLC44 | Dual Dual AMD64 | Worker Node |
lhepat09 | 130.92.139.209 | WN | SLC44 | Dual Dual AMD64 | Worker Node (broken) |
lhepat10 | 130.92.139.210 | WN | SLC44 | Dual Dual AMD64 | Worker Node |
lhepat11 | 130.92.139.211 | WN | SLC44 | Dual Dual AMD64 | Worker Node |
lhepat12 | 130.92.139.212 | WN | SLC44 | Dual Dual AMD64 | Worker Node (broken) |
See ATLAS Software for instructions on how to get the ATLAS software to work.
PROOF - Running ROOT on many cores (OUTDATED - we may bring it back when needed)
You have to provide your TSelector class with the corresponding .C and .h files (see ROOT manual). From your ROOT session do :
TProof *p = TProof::Open("lheppc51.unibe.ch");
TDSet *set = new TDSet("TTree", "Name of your Branch");
set->Add("/ngse1/yourfile1.root");
set->Add("/ngse1/yourfile2.root");
set->Process("yourfile.C", "", 1000); // Loop over 1000 events.
Setting up PROOF
There is no additional software; everything comes with the ROOT installation. There are two configuration files which we have placed in $ROOTSYS/etc: proof.conf and xrootCluster.cf. The first contains the list of the master node and the worker nodes. On the master and on all workers xrootd has to be started as root, but with the -R option specifying a non-superuser. From the config file xrootd knows whether it is a master or a worker.
xrootd -c $ROOTSYS/etc/xrootCluster.cf -b -l /tmp/xpd.log -R atlsoft
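The proof.conf mentioned above is just a plain list of the master and the worker nodes, one per line. A minimal sketch (the worker host names here are hypothetical placeholders):

# $ROOTSYS/etc/proof.conf (sketch)
master lheppc51.unibe.ch
worker node01.lhep.unibe.ch
worker node02.lhep.unibe.ch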
Compile ROOT on the cluster
This section sums up the steps to compile ROOT on the cluster. Every now and then a patched version is needed that is not yet in cvmfs, and it can be tricky to get all the configurations and links right.
- Download the package from svn to pc7.
- Log onto ce.lhep.unibe.ch and from there onto one of the nodes:
  ssh root@ce.lhep.unibe.ch
  ssh compute-1-10
- scp the tarball to somewhere on /grid/root* (/grid is visible from all the nodes).
- Set up the environment, for example:
  export BUILD=x86_64-slc5-gcc43-opt
  export PATH="/afs/cern.ch/sw/lcg/external/Python/2.6.5/$BUILD/bin:${PATH}"
  export LD_LIBRARY_PATH="/afs/cern.ch/sw/lcg/external/Python/2.6.5/$BUILD/lib:${LD_LIBRARY_PATH}"
  source /cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/setup.sh
- Configure the ROOT compilation with:
  ./configure linuxx8664gcc --disable-castor --disable-rfio --disable-tmva --enable-xml --enable-roofit --enable-python --enable-minuit2 --with-python-incdir=/cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/include/python2.6/ --with-python-libdir=/cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/lib/
- The python / PyROOT part can be tricky; make sure you have the right versions and all libraries. The log should say something like:
  Checking for python2.6, libpython2.6, libpython, python, or Python ... /cvmfs/atlas.cern.ch/repo/sw/python/2.6.5/x86_64-slc5-gcc43-opt/python/lib/
- Everything should now be set up for compilation:
  make -j 4
- It usually gets stuck at some point or runs out of memory. Kill it and restart it. If it does not move, run plain make.
- Hopefully it succeeded. Test the installation with:
  source bin/thisroot.sh
  root
- And the python part with:
  python
  from ROOT import *
- Hopefully everything works fine. If it does not, do make clean, delete all libraries and check the environment.
Submit jobs to the cluster
Refer to the description on the Geneva TWiki: https://twiki.cern.ch/twiki/bin/viewauth/GeneveAtlas/BernGenevaARC (written by a Bern guy).