name: inverse class: center, middle, inverse # Using GPU nodes in rioc: OpenACC Mauricio DIAZ .affiliations[ ![Inria](imgs/inria-logo.png) ] --- layout: false .left-column[ ## GPUs on rioc ## Why use OpenACC? ## How to do it ? ] .right-column[ ## Developer Meetup: 5 minutes talk ] --- # GPUs on rioc .green[current configuration - march 2019] - 2 nodes with 2 Testla K20: .green[gpu001, gpu002] - 2 nodes with 4 GTX 1080 TI: .green[gpu003, gpu004] .green[Options for using GPU in rioc:] - CUDA application programming interface - Library Cuda friendly (PyToch, TensorFlow, etc…) - OpenCL - OpenACC .blue[Reserve using OAR (e.g.)] : In general, for submit a job: ```terminal oarsub -q gpu -l /nodes=1 ./launch_job.sh ``` For specific nodes: .small[ ```terminal oarsub -q gpu -p "not host like 'gpu001' AND not host like 'gpu002'" -l /nodes=1 ``` ] [Doc for rioc](https://sed-paris.gitlabpages.inria.fr/rioc/) --- # What is OpenACC? # .red[or better what is not OpenACC?] - It is not a library. - It is not a software. --- # .red[What is OpenACC?] # or better what is not OpenACC? - It is a standard. - Citing OpenACC website: .large[.center[_"...user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures..."_]] --- # How to do it ? ## Estimate _pi_ recursively .medium[ ```C #include
#include "timer.h" #define N 2000000000 #define vl 1024 int main(int argc, char **argv) { double pi = 0.0f; long long i; StartTimer(); #pragma acc parallel vector_length(vl) for (i=0; i
--- # How to do it ? ## Estimate _pi_ recursively Load the module for the PGI compiler: ```terminal module load pgi/18.4 ``` Compile a standard version: ```terminal pgcc -fast -o cpu pi.c ``` Load the cuda toolkit module: ```terminal module load cuda92/toolkit/9.2.88 ``` Compile the accelerated version: ```terminal pgcc -fast -Minfo=accel -ta=tesla:cc35 -o gpu pi.c ``` --- # Conclusion - Accelerated code with less programming effort. - Code already existing can be easily accelarated. - Not limited to GPUs but to other kind of parallelism. - More info: [OpenACC Programming and Best Practices Guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC_Programming_Guide_0.pdf) ---