Interactive access to GPUs
NOTE
Due to low demand, this procedure has been set up on an ad-hoc basis for limited time frames. As such, it is not very polished and may need adjustments. We will improve it should demand increase. For the time being, only one user account on the cluster is mapped to incoming ssh requests. The underlying resources are managed by Slurm, so users interact via Slurm client commands, e.g. 'srun'. Work in progress ...
Prerequisites
Users wishing to use special resources like GPUs should follow the following steps:
- Provide their ssh public key
NOTE: the key _MUST_ be protected by a passphrase. We will proactively remove any key that is not.
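If you do not yet have a suitable key, a passphrase-protected key pair can be generated as sketched below (the key type, file name and comment are only examples; adapt them to your own setup):
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_gpu -C "your-email@example.org"
When prompted for a passphrase, do not leave it empty.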
- System access
ssh atlasch020@ce01.lhep.unibe.ch
- Start an interactive shell
srun --partition=CLUSTER-GPU --gres=gpu:1 -t 100 --pty bash
This will land the user on the worker node wn-1-1 and open a bash shell on it. This is the special worker node equipped with GPUs. The -t flag reserves a runtime of 100 minutes; 2 GB of RAM are allocated by default. To tweak your resource request (e.g. --mem-per-cpu=), please read the 'srun' docs: https://slurm.schedmd.com/srun.html
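For example, a request for more memory, more CPU cores, two GPUs and a longer runtime could look like the sketch below (the values are illustrative only; check the srun docs for the full list of options):
srun --partition=CLUSTER-GPU --gres=gpu:2 --cpus-per-task=4 --mem-per-cpu=4G -t 240 --pty bash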
There is a local storage area of several hundred GB on the node's local disk, which is where the user lands. For interactive work, inputs, code, containers, etc. should be copied to this area first. We encourage copying outputs out after every run to free up space in the local area. More complex data management schemes are possible and should be discussed according to the user's needs.
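As a minimal sketch (file names are placeholders, and it is assumed here that your home area is also visible from the worker node), staging inputs in and copying outputs out after a run could look like:
cp ~/my_inputs.tar.gz . && tar xzf my_inputs.tar.gz    # stage inputs into the local working area
# ... run your workload ...
cp results.tar.gz ~/ && rm -f my_inputs.tar.gz         # copy outputs out and free up local space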
- Run your code
At this stage, you can run your code/container interactively. The environment should be very much like any UI (including CVMFS), and additionally the user should be able to make use of one or more GPUs.
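For instance (a sketch only; the container runtime and image path are assumptions and not confirmed for this cluster), one could first check that the allocated GPU is visible and then run a containerized workload with GPU support:
nvidia-smi                                                            # list the GPU(s) allocated to this job
apptainer exec --nv /path/to/your_image.sif python my_gpu_code.py     # placeholder image and script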