Activités SED


name: inverse
class: center, middle, inverse

# Using GPU nodes in rioc: OpenACC 
Mauricio DIAZ

.affiliations[
  ![Inria](imgs/inria-logo.png)
 ]

---

layout: false
.left-column[
 ## GPUs on rioc
 
 ## Why use OpenACC?
 
 ## How to do it ?

  ]
.right-column[
## Developer Meetup: 5 minutes talk
]

---

# GPUs on rioc 

.green[current configuration - march 2019]

- 2 nodes with 2 Testla K20: .green[gpu001, gpu002]

- 2 nodes with 4 GTX 1080 TI: .green[gpu003, gpu004]


.green[Options for using GPU in rioc:]

- CUDA application programming interface
- Library Cuda friendly (PyToch, TensorFlow, etc…)
- OpenCL
- OpenACC

.blue[Reserve using OAR (e.g.)] : 
In general, for submit a job:

```terminal
oarsub -q gpu -l /nodes=1 ./launch_job.sh
```

For specific nodes:
.small[
```terminal
oarsub -q gpu -p "not host like 'gpu001' AND not host like 'gpu002'" -l /nodes=1
```
]

[Doc for rioc](https://sed-paris.gitlabpages.inria.fr/rioc/)
---
# What is OpenACC? 
# .red[or better what is not OpenACC?] 

- It is not a library.
- It is not a software.

---
# .red[What is OpenACC?] 
# or better what is not OpenACC? 

- It is a standard.
- Citing OpenACC website: 

.large[.center[_"...user-driven directive-based performance-portable
parallel programming model designed for scientists and engineers interested in
porting their codes to a wide-variety of heterogeneous HPC hardware platforms
and architectures..."_]]

---
# How to do it ?
## Estimate _pi_ recursively

.medium[
```C
#include <stdio.h>
#include "timer.h"
#define N 2000000000

#define vl 1024

int main(int argc, char **argv) {

  double pi = 0.0f;
  long long i;
  StartTimer();
  #pragma acc parallel vector_length(vl)
  for (i=0; i<N; i++) {
    double t = (double)((i+0.5)/N);
    pi += 4.0/(1.0+t*t);
  }
  double runtime = GetTimer();
  printf("pi = %11.10f\n", pi/N);
  printf("Total: %f s\n", runtime/1000);

  return 0;
}
```
]

/*              >
---
# How to do it ?
## Estimate _pi_ recursively

Load the module for the PGI compiler:

```terminal
module load pgi/18.4
```

Compile a standard version:

```terminal
pgcc -fast -o cpu pi.c
```

Load the cuda toolkit module:

```terminal
module load cuda92/toolkit/9.2.88
```

Compile the accelerated version:


```terminal
pgcc -fast -Minfo=accel -ta=tesla:cc35 -o gpu pi.c
```
---
# Conclusion

- Accelerated code with less programming effort.

- Code already existing can be easily accelarated.

- Not limited to GPUs but to other kind of parallelism.

- More info: [OpenACC Programming and Best Practices Guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC_Programming_Guide_0.pdf)
---