OMPC PLASMA¶
This library is an extension of the PLASMA library for distributed memory systems.
Building¶
To use OMPC PLASMA, we provide a Docker image, ompcluster/plasma-dev:latest,
containing a pre-compiled Clang/LLVM with all the OpenMP and MPI libraries
needed to compile and run OMPC PLASMA.
You can execute OMPC PLASMA on any computer using Docker or Singularity.
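For instance, a container session could be started with Docker as sketched below; the bind-mount path is illustrative, not required:

```shell
# Pull the development image and open an interactive shell inside it.
# The host path /home/user/plasma is an example; mount your own checkout.
docker pull ompcluster/plasma-dev:latest
docker run -it --rm -v /home/user/plasma:/root/plasma ompcluster/plasma-dev:latest bash
```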
To build and install OMPC PLASMA, use the following commands:
git clone https://gitlab.com/ompcluster/plasma.git
cd plasma/
mkdir build
cd build
export CC=clang
export CXX=clang++
export OpenBLAS_ROOT=/usr/local/include/openblas/
cmake ..
make -j$(nproc)
Usage¶
OMPC PLASMA is configured through command-line parameters. To list them, execute the following command:
./plasmatest --help
In general, OMPC PLASMA should be executed with the following parameters:
./plasmatest routine --dim=$dim --nrhs=$dim --nb=$nb --test=$test
These parameters represent:
routine
: The linear algebra routine to run. Currently OMPC PLASMA supports four routines: spotrf, sgemm, ssyrk and strsm.
$dim
: The matrix size.
$nb
: The block size. This number must be a divisor of $dim.
$test
: (y|n) Determines whether or not the results should be verified.
There are other parameters, which depend on each routine that the user wants to execute.
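Since --nb must evenly divide --dim, a quick shell check before launching can catch invalid configurations early. The check_block_size helper below is a hypothetical sketch, not part of OMPC PLASMA:

```shell
#!/bin/sh
# Sketch: validate that the block size (nb) divides the matrix size (dim)
# before invoking plasmatest. check_block_size is a hypothetical helper.
check_block_size() {
  dim=$1
  nb=$2
  # Succeeds (exit 0) only when dim is an exact multiple of nb.
  [ $(( dim % nb )) -eq 0 ]
}

if check_block_size 1024 256; then
  echo "nb divides dim: ok to run"
else
  echo "error: nb must be a divisor of dim" >&2
fi
```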
Example¶
Here is an example of how to run OMPC PLASMA on a cluster using SLURM:
#!/bin/bash
#SBATCH --job-name=plasma-job
#SBATCH --output=plasma-output.txt
#SBATCH --nodes 3
module purge
module load mpich/4.0.2-ucx
##### OMPC settings
export OMPCLUSTER_NUM_EXEC_EVENT_HANDLERS=4
export LIBOMP_NUM_HIDDEN_HELPER_THREADS=8
export OMPCLUSTER_HEFT_COMM_COEF=0.00000000008
export OMPCLUSTER_HEFT_COMP_COST=20000000000
##### OpenMP settings
export OMP_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=1
srun --mpi=pmi2 -n 3 singularity exec plasma-dev_latest.sif plasma/build/plasmatest spotrf --dim=1024 --nrhs=1024 --nb=256 --test=y
The OMPC configuration depends on how the user executes the program on the cluster. In this example, OMPC PLASMA runs on 2 worker nodes, and each node works with 4 threads (OMP_NUM_THREADS=4).