An R Introduction to Statistics

GPU Computing with R

Statistics is computationally intensive. Routine statistical tasks such as data extraction, graphical summary, and technical interpretation all require pervasive use of modern computing machinery. Obviously, these tasks can benefit greatly from a parallel computing environment where extensive calculations can be performed simultaneously.

Recent advances in consumer computer hardware make parallel computing capability widely available to most users. In fact, the mundane video graphics cards in many PCs nowadays support parallel computing operations in addition to their routine graphical functions. Applications that make effective use of these so-called graphics processing units (GPUs) have reported significant performance gains.

Although this site is dedicated to elementary statistics with R, it is evident that parallel computing will be of tremendous importance in the near future, and it is imperative for students to be acquainted with the new technology as soon as possible. Thus we are opening a new series of articles on the subject. Students will be able to tackle problems previously deemed impractical due to resource constraints.

We begin by selecting a GPU computing platform. One of the most affordable options available is NVIDIA’s CUDA. Even its most inexpensive entry-level equipment provides dozens of processing cores for parallel computing. Therefore, our GPU computing tutorials will be based on CUDA for now. Incidentally, the CUDA programming interface is vector-oriented, and fits well with the R language paradigm.

We will not work directly with CUDA or its advanced C/C++ interface. Instead, we will rely on rpud and other R packages for studying GPU computing. We will compare the performance of GPU functions with their regular R counterparts and verify the performance advantage.
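
As a rough sketch of what such a comparison might look like, the code below times a distance matrix computation with base R's dist function and, if the rpud package happens to be installed, with its GPU counterpart. The matrix size and random data are arbitrary choices made for illustration, and the call to rpuDist assumes that it accepts a numeric matrix in the same way dist does; consult the rpud documentation for your installed version before relying on it.

```r
# Sketch: time a CPU computation and, if available, its GPU counterpart.
set.seed(42)                                   # reproducible test data
m <- matrix(runif(500 * 50), nrow = 500, ncol = 50)

# Baseline: Euclidean distance matrix computed on the CPU with base R.
cpu_time <- system.time(d_cpu <- dist(m))
print(cpu_time)

# GPU version. Assumption: rpud is installed, a CUDA-capable device is
# present, and rpuDist takes a numeric matrix just as dist does.
if (requireNamespace("rpud", quietly = TRUE)) {
  gpu_time <- system.time(d_gpu <- rpud::rpuDist(m))
  print(gpu_time)
  # GPU arithmetic may be single precision, so compare with a tolerance.
  print(all.equal(as.vector(d_cpu), as.vector(d_gpu), tolerance = 1e-4))
} else {
  message("rpud is not installed; skipping the GPU timing.")
}
```

On a matrix this small the GPU may show little or no advantage, since the cost of moving data to the device can dominate; larger inputs give a fairer picture.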