Concepts Required to Understand the Cluster

Account#

An account is a logical group of users. At the moment, each account corresponds to a primary Unix group.

Partition#

A partition is a group of compute nodes. On the cluster, there are 2 main partitions:

common, the default partition, indicated with a * in the output of the sinfo command,
dedicated, containing all the nodes belonging to research units.

Three other partitions have been created for specific uses, each associated with the corresponding QoS:

gpu for common GPU nodes,
clcgwb for nodes with CLC Genomics Workbench licenses,
clcbio for nodes with CLC Assembly Cell licenses.

Units can buy nodes dedicated to their research. These nodes are gathered in a partition dedicated to the unit. That partition is named after the unit's Unix group.

Quality of Service (QoS)#

The Quality of Service sets limits on what can be run:

maximum run time of jobs,
maximum number of cores allocated per user or per account,
maximum number of running jobs per user or per account,
maximum number of submitted jobs per user or per account,
...

The QoS is the bridge between an account and a partition:

a QoS is allowed on one or more partitions,
the members of an account are allowed to use one or more QoS.

These QoS are available for all users on the cluster:

QoS	Time limit	Priority at submission time	Max cores
`fast`	2 hours	++	-
`normal`	24 hours	+	-
`long` (default QoS on the corresponding partition)	unlimited		5 per user
`gpu` (default QoS on the corresponding partition)	3 days	+++	12 per GPU (optimal)
`clcgwb` (default QoS on the corresponding partition)	24 hours	+++	12 per job
`clcbio` (default QoS on the corresponding partition)	24 hours	+++	12 per job
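
As an illustration of how the time limits in the table might guide the choice of a QoS, here is a minimal Python sketch. It assumes that a job should use the QoS with the shortest time limit that still covers its expected run time, since the shorter QoS have a higher priority at submission time. The function name `pick_qos` is hypothetical, and only the three general-purpose QoS (`fast`, `normal`, `long`) are included; the core limits are not modelled.

```python
# Time limits in hours, transcribed from the table above; None means unlimited.
QOS_TIME_LIMIT_HOURS = {
    "fast": 2,
    "normal": 24,
    "long": None,
}


def pick_qos(walltime_hours: float) -> str:
    """Return the QoS with the shortest time limit that still fits the job.

    Hypothetical helper: the selection rule is an assumption, not a cluster policy.
    """
    fitting = [
        (limit, name)
        for name, limit in QOS_TIME_LIMIT_HOURS.items()
        if limit is not None and walltime_hours <= limit
    ]
    if fitting:
        # min() compares the limits first, so the shortest fitting limit wins.
        return min(fitting)[1]
    # No bounded QoS is long enough: fall back to the unlimited one.
    return "long"
```

The returned name could then be passed to the `--qos` option of `sbatch` or `srun`.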

The partitions owned by research units each have their own QoS, attached to the corresponding partition, to ensure unlimited running time for the unit's jobs.

Association#

An association is a combination of cluster, user, account and partition. Each user must appear in at least one association to be able to submit jobs.

Node#

A node is a server with:

many cores (usually 95)
480 GB or 1.98 TB of RAM
temporary disk
possibly GPUs

Job#

A job is a scheduled computational task that uses cluster resources (nodes, cores, memory).

Job step#

A job step is the execution of a program or script supplied for the job, within its allocated resources.
A job step is identified by <jobid>.<stepid>.
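
When processing accounting output in a script, it can help to split such an identifier into its two parts. The following is a minimal Python sketch; the names `StepId` and `parse_step` are hypothetical. Both parts are kept as strings, on the assumption that a step ID is not always numeric (Slurm also reports steps such as `batch` and `extern`).

```python
from typing import NamedTuple


class StepId(NamedTuple):
    job_id: str
    step_id: str


def parse_step(identifier: str) -> StepId:
    """Split '<jobid>.<stepid>' at the first dot into its two parts."""
    job_id, sep, step_id = identifier.partition(".")
    if not sep or not job_id or not step_id:
        raise ValueError(f"not a job step identifier: {identifier!r}")
    return StepId(job_id, step_id)
```

For example, `parse_step("12345.batch")` yields a job ID of `"12345"` and a step ID of `"batch"`, while a bare job ID such as `"12345"` raises `ValueError`.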

Job Priority#

The job priority is a weighted sum computed by Slurm from:

the Age of the job,
the Fair-share of the account,
the Size of the job (not used),
the Partition priority (not used),
the QoS priority.

The higher the sum, the higher the job priority. A job's priority changes over time.
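
The weighted sum can be sketched in Python as follows. In Slurm's multifactor priority plugin, each factor is a number between 0.0 and 1.0 and each weight is set by the administrators. The weights below are hypothetical values chosen for illustration, not the cluster's actual configuration; the factors marked "not used" are modelled, as an assumption, by a weight of 0.

```python
# Hypothetical weights for illustration only; the real values are set by the
# cluster administrators. A weight of 0 models a factor that is "not used".
EXAMPLE_WEIGHTS = {
    "age": 1000,
    "fairshare": 10000,
    "job_size": 0,
    "partition": 0,
    "qos": 5000,
}


def job_priority(factors: dict, weights: dict = EXAMPLE_WEIGHTS) -> float:
    """Return the weighted sum of the factors, each expected in [0.0, 1.0]."""
    return sum(weights.get(name, 0) * value for name, value in factors.items())


waiting_job = {"age": 0.5, "fairshare": 0.25, "job_size": 1.0, "partition": 1.0, "qos": 0.5}
priority = job_priority(waiting_job)  # 500 + 2500 + 0 + 0 + 2500 = 5500
```

Because the age factor grows while a job waits in the queue, the same job evaluated later with a larger `age` value gets a higher sum, which is why the priority changes over time.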
