Concepts Required to Understand the Cluster
Account#
An account is a logical group of users. At the moment, each account corresponds to a primary Unix group.
Partition#
A partition is a group of compute nodes. On the cluster, there are 2 main partitions:
- `common`, the default partition, indicated with a `*` in the output of the `sinfo` command,
- `dedicated`, containing all the nodes belonging to research units.
Three other partitions have been created for specific uses, each associated with the corresponding QoS:
- `gpu` for common GPU nodes,
- `clcgwb` for nodes with CLC Genomics Workbench licenses,
- `clcbio` for nodes with CLC Assembly Cell licenses.
Units can buy nodes dedicated to their research. These nodes are gathered in a partition dedicated to the unit, and that partition is named after the unit's Unix group.
Quality of Service (QoS)#
A Quality of Service (QoS) sets limits on what can be run:
- maximum run time of jobs,
- maximum number of cores allocated per user or per account,
- maximum number of running jobs per user or per account,
- maximum number of submitted jobs per user or per account,
- ...
The QoS is the bridge between an account and a partition:
- a QoS is allowed on one or more partitions,
- the members of an account are allowed to use one or more QoS.
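The bridging role of the QoS can be sketched as two lookups: one from account to allowed QoS, one from QoS to allowed partitions. This is only an illustration; the account names (`unit_a`, `unit_b`) and the mappings below are hypothetical and do not reflect the cluster's actual configuration.

```python
# Hypothetical mappings, for illustration only.
# QoS -> partitions on which that QoS is allowed.
QOS_PARTITIONS = {
    "fast": {"common"},
    "normal": {"common"},
    "gpu": {"gpu"},
}
# Account -> QoS that its members may use.
ACCOUNT_QOS = {
    "unit_a": {"fast", "normal"},
    "unit_b": {"fast", "normal", "gpu"},
}


def can_use(account: str, qos: str, partition: str) -> bool:
    """True if the account may use this QoS and the QoS is allowed on the partition."""
    return (
        qos in ACCOUNT_QOS.get(account, set())
        and partition in QOS_PARTITIONS.get(qos, set())
    )


def usable_partitions(account: str) -> set:
    """Collect every partition the account can reach through any of its QoS."""
    partitions = set()
    for qos in ACCOUNT_QOS.get(account, set()):
        partitions |= QOS_PARTITIONS.get(qos, set())
    return partitions


if __name__ == "__main__":
    print(can_use("unit_a", "gpu", "gpu"))
    print(sorted(usable_partitions("unit_b")))
```

An account never points at a partition directly in this model: access always goes through a QoS, which is why removing a QoS from an account also removes the partitions reachable only through it.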
These QoS are available for all users on the cluster:
| QoS | Time limit | Priority at submission time | Max cores |
|---|---|---|---|
| fast | 2 hours | ++ | - |
| normal | 24 hours | + | - |
| long (default QoS on the corresponding partition) | unlimited | 5 per user | |
| gpu (default QoS on the corresponding partition) | 3 days | +++ | 12 per GPU (optimal) |
| clcgwb (default QoS on the corresponding partition) | 24 hours | +++ | 12 per job |
| clcbio (default QoS on the corresponding partition) | 24 hours | +++ | 12 per job |
Each partition owned by a research unit has its own QoS attached to it, which ensures unlimited run time for the unit's jobs.
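For the general-purpose QoS in the table above (fast, normal, long), one plausible way to choose is to take the QoS with the shortest time limit that still covers the expected run time. The sketch below assumes that strategy; the function name `shortest_fitting_qos` is hypothetical, and only the time limits come from the table.

```python
from datetime import timedelta

# Time limits from the QoS table, ordered from shortest to longest.
# None stands for "unlimited".
QOS_TIME_LIMITS = [
    ("fast", timedelta(hours=2)),
    ("normal", timedelta(hours=24)),
    ("long", None),
]


def shortest_fitting_qos(walltime: timedelta) -> str:
    """Return the first QoS whose time limit covers the requested walltime."""
    for name, limit in QOS_TIME_LIMITS:
        if limit is None or walltime <= limit:
            return name
    # Unreachable while an unlimited QoS is listed, kept as a safeguard.
    raise ValueError("no QoS fits the requested walltime")


if __name__ == "__main__":
    print(shortest_fitting_qos(timedelta(minutes=30)))
    print(shortest_fitting_qos(timedelta(hours=10)))
    print(shortest_fitting_qos(timedelta(days=3)))
```

This ignores the other limits a QoS may impose (cores, running jobs, submitted jobs), so it is a starting point rather than a complete selection rule.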
Association#
An association is a combination of cluster, user, account, and partition. Each user must appear in at least one association to be able to submit jobs.
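An association can be pictured as a four-field record, with job submission allowed only for users who appear in at least one record. The sketch below is a simplified model, not how Slurm stores associations; the cluster, user, account, and partition names are hypothetical.

```python
from typing import NamedTuple


class Association(NamedTuple):
    cluster: str
    user: str
    account: str
    partition: str


# Hypothetical associations, for illustration only.
ASSOCIATIONS = {
    Association("mycluster", "alice", "unit_a", "common"),
    Association("mycluster", "alice", "unit_a", "unit_a"),
}


def can_submit(user: str) -> bool:
    """A user can submit jobs only if at least one association names them."""
    return any(assoc.user == user for assoc in ASSOCIATIONS)


def partitions_for(user: str, account: str) -> set:
    """Partitions the user may target when submitting under the given account."""
    return {
        assoc.partition
        for assoc in ASSOCIATIONS
        if assoc.user == user and assoc.account == account
    }


if __name__ == "__main__":
    print(can_submit("alice"), can_submit("bob"))
    print(sorted(partitions_for("alice", "unit_a")))
```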
Node#
A node is a server with:
- many cores (usually 95)
- RAM 480 GB or 1.98 TB
- temporary disk
- possibly GPUs
Job#
A job is a scheduled computational task that uses cluster resources (nodes, cores, memory).
Job step#
A job step is the execution of a program or script supplied for the job, within its allocated resources.
A job step is identified by `<jobid>.<stepid>`.
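When post-processing accounting output, it can help to split such an identifier back into its two parts. The helper below is a minimal sketch; the name `split_step_id` is hypothetical. It splits on the last dot and keeps both parts as strings, since Slurm also reports named steps such as `batch` and `extern`.

```python
def split_step_id(step_id: str) -> tuple:
    """Split '<jobid>.<stepid>' into (jobid, stepid), both as strings."""
    job_id, sep, step = step_id.rpartition(".")
    if not sep or not job_id or not step:
        raise ValueError(f"not a job step identifier: {step_id!r}")
    return job_id, step


if __name__ == "__main__":
    print(split_step_id("12345.0"))
    print(split_step_id("12345.batch"))
```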
Job Priority#
The job priority is a weighted sum computed by Slurm based on:
- the Age of the job,
- the Fair-share of the account,
- the Size of the job (not used),
- the Partition priority (not used),
- the QoS priority
The higher the sum, the higher the job's priority. A job's priority changes over time.
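The weighted sum can be illustrated with a small calculation. In Slurm's multifactor priority scheme each factor is a number between 0.0 and 1.0 that is multiplied by a configured weight. The weights below are hypothetical, not the cluster's real settings; the size and partition weights are set to 0 to reflect that those factors are not used here.

```python
# Hypothetical weights, for illustration only. A weight of 0 disables a factor.
WEIGHTS = {
    "age": 1000,
    "fairshare": 10000,
    "job_size": 0,   # not used on this cluster
    "partition": 0,  # not used on this cluster
    "qos": 5000,
}


def job_priority(factors: dict) -> int:
    """Weighted sum of the factors; each factor is expected in [0.0, 1.0]."""
    total = sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)
    return round(total)


if __name__ == "__main__":
    waiting = {"age": 0.5, "fairshare": 0.2, "qos": 1.0}
    print(job_priority(waiting))
    # The same job later on: only its age factor has grown.
    waited_longer = dict(waiting, age=1.0)
    print(job_priority(waited_longer))
```

The second call shows why a job's priority drifts upward while it waits: the age factor grows even though the fair-share and QoS factors stay the same.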