Interactive access to GPUs
NOTE
Due to low demand, this procedure has been set up on an ad-hoc basis for limited time frames. As such, it is not very polished and may need adjustments. We will improve it should demand increase. For the time being, only one user account on the cluster is mapped to incoming ssh requests. The underlying resources are managed by Slurm, so users interact via Slurm client commands, e.g. 'srun'. Work in progress ...
Prerequisites
Users wishing to use special resources like GPUs should follow the following steps:
- Provide their ssh public key
NOTE: the key _MUST_ be protected by a passphrase. We will proactively remove any key that is not.
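If you do not yet have a suitable key, a passphrase-protected key pair can be generated as sketched below (the key type, file name and comment are only examples; adapt them to your own setup):
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_gpu -C "your-email@example.org"
When prompted for a passphrase, do not leave it empty.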
- System access
ssh atlasch020@ce01.lhep.unibe.ch
- Start an interactive shell
srun --partition=CLUSTER-GPU --gres=gpu:1 -t 100 --pty bash
This will land the user on the worker node wn-1-1 and open a bash shell on it. This is the special worker node equipped with GPUs. The -t flag reserves a runtime of 100 minutes; 2 GB of RAM are allocated by default. To tweak your resource request (e.g. --mem-per-cpu=), please read the 'srun' docs: https://slurm.schedmd.com/srun.html
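For example, a request for more memory, more CPU cores, two GPUs and a longer runtime could look like the sketch below (the values are illustrative only; check the srun docs for the full list of options):
srun --partition=CLUSTER-GPU --gres=gpu:2 --cpus-per-task=4 --mem-per-cpu=4G -t 240 --pty bash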
There is a local storage area of several hundred GB on the node's local disk, which is where the user lands. For interactive work, inputs, code, containers, etc. should be copied to this area first. We encourage copying outputs out after every run to free up space in the local area. More complex data management schemes are possible and should be discussed according to the user's needs.
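As a minimal sketch (file names are placeholders, and it is assumed here that your home area is also visible from the worker node), staging inputs in and copying outputs out after a run could look like:
cp ~/my_inputs.tar.gz . && tar xzf my_inputs.tar.gz    # stage inputs into the local working area
# ... run your workload ...
cp results.tar.gz ~/ && rm -f my_inputs.tar.gz         # copy outputs out and free up local space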
- Run your code
At this stage, you can run your code/container interactively. The environment should be very much like any UI (including CVMFS), and additionally the user should be able to make use of one or more GPUs.
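For instance (a sketch only; the container runtime and image path are assumptions and not confirmed for this cluster), one could first check that the allocated GPU is visible and then run a containerized workload with GPU support:
nvidia-smi                                                            # list the GPU(s) allocated to this job
apptainer exec --nv /path/to/your_image.sif python my_gpu_code.py     # placeholder image and script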