.. _ex2-cluster-script:

Using the cluster
-------------------------

The sim cluster consists of 4 compute nodes with 48 cores each (96 threads with hyperthreading) that are connected by a fast network.
A resource manager, in our case SLURM, takes care of starting MPI applications on the nodes.

So far, we have used `interactive sessions` on a single node, where we could only use up to 8 processes.
The normal way of using a cluster is to write a `job script` and submit the job to the queue. A scheduler decides the order in which all jobs are executed.
When it is our job's turn, the scheduler allocates the requested number of nodes and runs the job. On busy systems, it gives smaller or shorter jobs priority over longer and larger jobs and optimizes for good utilization of the whole cluster.

A sample job script, ``job.sh``, is the following:

.. code-block:: bash
  :linenos:

  #!/bin/bash
  #
  #SBATCH --job-name=submission
  #SBATCH --output=result.txt
  #
  #SBATCH --ntasks=16
  #SBATCH --ntasks-per-node=4
  #SBATCH --time=10:00

  # load the required modules
  module use /usr/local.nfs/sgs/modulefiles
  module load gcc/10.2
  module load openmpi/3.1.6-gcc-10.2
  module load vtk/9.0.1
  module load cmake/3.18.2

  # launch the parallel program through SLURM
  srun -n 16 ./numsim_parallel lid_driven_cavity.txt

The script starts with the `Shebang` in line 1, followed by parameters for the scheduler that start with ``#SBATCH``.
It requests the total number of processes (line 6) and the number of processes to be placed on every node (line 7), as well as the maximum runtime (line 8, here 10 minutes).
In High Performance Computing, using hyperthreading is usually too slow, so jobs typically use only the physical cores.
The given example will allocate 4 nodes and request 4 cores per node.
All console output of the job will be written to the file ``result.txt``, as specified in line 4.

We again need to load the modules as before and additionally set ``PATH`` and ``CPATH`` to a custom MPI version, because the default OpenMPI does not work with SLURM.
Instead of using ``mpirun``, we need to use SLURM to launch the parallel program.
The command ``srun`` takes care of this and is given the total number of processes. This number has to be less than or equal to the number of processes requested in the header of the file.
Note that the modules should be loaded at the beginning of the job script, before the call to ``srun``.

The script can be submitted with the command

.. code-block:: bash

  sbatch job.sh

You can list all submitted and running jobs together with their job ids with

.. code-block:: bash

  $ squeue
  JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
  12817       all submissi  maierbn  R  0:02      4 sgscl1n[1-4]

To cancel a job, run ``scancel`` with the job id:

.. code-block:: bash

  $ scancel 12817
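
Before submitting, it can help to double-check that the resource request in the header is consistent. The following is only a minimal sketch: the helper functions ``nodes_needed`` and ``fits_on_node`` are hypothetical and not part of SLURM. They reproduce the arithmetic from the sample job script (16 processes with 4 per node give 4 nodes) and check a per-node process count against the 48 physical cores of a node.

```shell
#!/bin/bash

# Hypothetical helper (not a SLURM command): number of nodes needed for
# a given --ntasks and --ntasks-per-node, rounded up.
nodes_needed() {
  local ntasks=$1 per_node=$2
  echo $(( (ntasks + per_node - 1) / per_node ))
}

# Hypothetical helper: succeeds if the per-node process count fits on the
# physical cores of one node. The default of 48 is the core count of the
# sim cluster nodes, i.e. without hyperthreading.
fits_on_node() {
  local per_node=$1 cores_per_node=${2:-48}
  [ "$per_node" -le "$cores_per_node" ]
}

# Values from the sample job script: --ntasks=16, --ntasks-per-node=4
nodes_needed 16 4   # prints 4
```

Inside a running batch job, SLURM also exports the requested number of processes as ``SLURM_NTASKS``, so writing ``srun -n "$SLURM_NTASKS"`` instead of a literal number is one way to keep the launch command in sync with the header.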