Interactive access to GPUs

From LHEP Wiki

Revision as of 13:15, 2 June 2023

NOTE

Due to low demand, this procedure has been set up for ad-hoc users and limited time frames. As such, it is not very polished and might need adjustments. We will improve it if demand increases. For the time being, all incoming ssh requests are mapped to a single user on the cluster. The underlying resources are managed by Slurm, so the user interacts with them via Slurm client commands, e.g. 'srun'. Work in progress ...

Prerequisites

Users wishing to use special resources like GPUs should follow these steps:

  • Provide their ssh public key

NOTE: the key _MUST_ be protected by a passphrase. We will proactively remove any key that is not.

  • System access
  ssh atlasch020@ce01.lhep.unibe.ch
  • Start an interactive shell
  srun --partition=CLUSTER-GPU --gres=gpu:1 -t 100 --pty bash

This will land the user on the worker node wn-1-1, which is the one worker node we have that is equipped with GPUs. The -t flag reserves a runtime of 100 minutes; 2 GB of RAM are allocated by default. To adjust your resource request (e.g. --mem-per-cpu=), please read the 'srun' docs: https://slurm.schedmd.com/srun.html
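As an illustration of adjusting the request, the sketch below assembles an 'srun' line with a different runtime and memory setting. The helper name gpu_shell_cmd and the example values (240 minutes, 8G) are hypothetical; only the partition name and the flags themselves come from the command above and the 'srun' docs. The helper only prints the command so it can be checked before being run by hand.

```shell
# Hypothetical helper (not part of the cluster setup): prints the srun
# command for an interactive GPU shell so it can be reviewed first.
# Arguments: runtime in minutes, memory per CPU, number of GPUs.
gpu_shell_cmd() {
  minutes="${1:-100}"   # default taken from the example above
  mem="${2:-2G}"        # 2 GB is the default allocation mentioned above
  gpus="${3:-1}"
  echo "srun --partition=CLUSTER-GPU --gres=gpu:${gpus} --mem-per-cpu=${mem} -t ${minutes} --pty bash"
}

# Example: a 4-hour session with 8 GB of RAM per CPU and one GPU.
gpu_shell_cmd 240 8G 1
```

Copy the printed line into the shell on ce01 to start the session. Whether larger requests are granted depends on what the node can offer at the time.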

The user lands in a local storage area of several hundred GB on the worker node's local disk. For interactive work, inputs, code, containers, etc. should be copied over to this area first. We encourage users to copy outputs out after every run to free up space in the local area. More complex data management schemes are possible and should be discussed according to the users' needs.
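The copy-in / copy-out pattern described above could be scripted roughly as follows. This is a minimal sketch: the function names stage_in and stage_out and all paths are hypothetical placeholders, and plain 'cp' is assumed; if the data lives on another host, 'scp' or 'rsync' would take its place.

```shell
# Hypothetical helpers for the local scratch area; all paths are placeholders.

# Copy the contents of an input directory into a per-run working directory.
stage_in() {
  mkdir -p "$2" && cp -r "$1"/. "$2"/
}

# Copy results out of the working directory, then delete the working
# directory to free local disk space. The delete only happens if the
# copy succeeded.
stage_out() {
  mkdir -p "$2" && cp -r "$1"/. "$2"/ && rm -rf "$1"
}

# Usage (placeholder paths):
#   stage_in  /path/to/inputs   ./run1
#   ... run your code inside ./run1 ...
#   stage_out ./run1            /path/to/results/run1
```

Note that stage_out as written copies everything in the working directory, inputs included; pointing it at a dedicated output subdirectory avoids copying inputs back.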

  • Run your code

At this stage, you can run your code/container interactively. The environment should be very much like any UI (including CVMFS) and, additionally, the user should be able to make use of one or more GPUs.
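Before starting a long run, a quick sanity check of the session can help. The sketch below rests on assumptions not stated on this page: that the GPUs are NVIDIA cards (so 'nvidia-smi' is the relevant tool), that CVMFS is mounted at the usual /cvmfs path, and that Slurm exports CUDA_VISIBLE_DEVICES for the allocated GPU(s). The function name check_env is hypothetical.

```shell
# Hypothetical sanity check for the interactive session.
# Assumes NVIDIA GPUs and the conventional /cvmfs mount point.
check_env() {
  # Is the NVIDIA management tool on the PATH?
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi: found"
  else
    echo "nvidia-smi: not found"
  fi
  # Is the CVMFS mount point present?
  if [ -d /cvmfs ]; then
    echo "cvmfs: /cvmfs present"
  else
    echo "cvmfs: /cvmfs not present"
  fi
  # Which GPU(s), if any, were exposed to this shell?
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

check_env
```

If 'nvidia-smi' is found, running it directly lists the GPUs visible to the shell.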