Executive Summary
A team from King Abdullah University of Science and Technology (KAUST) attended the GPU training organized by the KAUST Supercomputing Lab (KSL), NVIDIA and Appentra to learn how to improve the performance of their “maxvol” software for 5G wireless networks using GPUs. In 5G networks the base station can send data to several phones or tablets in the same frequency-time window, and the “maxvol” software enables fast user selection, based on signal strength, that sacrifices little transfer rate while achieving fair performance across different devices. The GPU-enabled “maxvol” software also has the potential to be used in a wide range of applications in recommender systems and wireless communications.
About KAUST
The KAUST Supercomputing Lab (KSL)’s mission is to inspire and enable scientific, economic and social advances through the development and application of HPC solutions, through collaboration with KAUST researchers and partners, and through the provision of world-class computational systems and services. KAUST offers world-class HPC and data resources and the opportunity to work and partner with KAUST staff. Part of KAUST’s core mission includes training the users of its services from industry and academia. KAUST has provided GPU training for several years, using a mixture of in-house and commercial GPU industry materials with great success, and its training programs are open to incorporating new and innovative ways of improving GPU training.
The Challenge
5G wireless communications will be able to handle a thousand times more traffic than today’s networks and will be significantly faster than 4G networks. The “maxvol” algorithm has shown its superiority over traditional greedy search methods in the scope of massive multiple-input multiple-output (MIMO) communications, one of the technologies at the foundation of 5G networks. The KAUST team was highly motivated to learn how to make “maxvol” work efficiently on modern heterogeneous computers powered by GPUs, and to learn new tools for porting scientific code to GPUs. The team hoped to explore OpenACC technology, using as a guide the Parallelware Trainer tool available on KAUST’s Ibex system. The GPU training organized by KSL in collaboration with Appentra in March 2019 was a great opportunity to address this challenge.
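For readers unfamiliar with the method: given a tall n × r matrix, maxvol searches for an r × r submatrix of (locally) maximal volume, i.e. absolute determinant, by repeatedly swapping in the row that increases the volume the most. The C sketch below of this classic greedy iteration is our own illustrative code, not the KAUST implementation; the simple Gauss-Jordan helper and all names were chosen for readability, not performance.

```c
#include <math.h>
#include <stdlib.h>

/* Gauss-Jordan inverse of an r x r row-major matrix, in place.
   Returns 0 on success; used only to keep the sketch self-contained. */
static int invert(double *S, int r) {
    double *M = malloc(sizeof(double) * r * 2 * r);  /* augmented [S | I] */
    for (int i = 0; i < r; i++)
        for (int j = 0; j < 2 * r; j++)
            M[i * 2 * r + j] = (j < r) ? S[i * r + j] : (j - r == i);
    for (int c = 0; c < r; c++) {
        int p = c;                                   /* partial pivoting */
        for (int i = c + 1; i < r; i++)
            if (fabs(M[i * 2 * r + c]) > fabs(M[p * 2 * r + c])) p = i;
        if (fabs(M[p * 2 * r + c]) < 1e-12) { free(M); return -1; }
        if (p != c)
            for (int j = 0; j < 2 * r; j++) {
                double t = M[c * 2 * r + j];
                M[c * 2 * r + j] = M[p * 2 * r + j];
                M[p * 2 * r + j] = t;
            }
        double d = M[c * 2 * r + c];
        for (int j = 0; j < 2 * r; j++) M[c * 2 * r + j] /= d;
        for (int i = 0; i < r; i++) {
            if (i == c) continue;
            double f = M[i * 2 * r + c];
            for (int j = 0; j < 2 * r; j++) M[i * 2 * r + j] -= f * M[c * 2 * r + j];
        }
    }
    for (int i = 0; i < r; i++)                      /* right half = S^{-1} */
        for (int j = 0; j < r; j++)
            S[i * r + j] = M[i * 2 * r + r + j];
    free(M);
    return 0;
}

/* Greedy maxvol sweep: A is n x r row-major; perm is a permutation of
   0..n-1 whose first r entries index a nonsingular submatrix. Swap rows
   until no coefficient exceeds 1 + tol in magnitude. */
void maxvol(const double *A, int n, int r, int *perm, double tol, int max_iters) {
    double *S = malloc(sizeof(double) * r * r);
    double *C = malloc(sizeof(double) * n * r);
    for (int it = 0; it < max_iters; it++) {
        for (int i = 0; i < r; i++)                  /* S = selected rows */
            for (int j = 0; j < r; j++)
                S[i * r + j] = A[perm[i] * r + j];
        if (invert(S, r) != 0) break;                /* S now holds S^{-1} */
        for (int i = 0; i < n; i++)                  /* C = A * S^{-1} */
            for (int j = 0; j < r; j++) {
                double s = 0.0;
                for (int k = 0; k < r; k++)
                    s += A[perm[i] * r + k] * S[k * r + j];
                C[i * r + j] = s;
            }
        int bi = -1, bj = -1;                        /* largest off-block entry */
        double best = 1.0 + tol;
        for (int i = r; i < n; i++)
            for (int j = 0; j < r; j++)
                if (fabs(C[i * r + j]) > best) { best = fabs(C[i * r + j]); bi = i; bj = j; }
        if (bi < 0) break;                           /* volume locally maximal */
        int t = perm[bj]; perm[bj] = perm[bi]; perm[bi] = t;
    }
    free(S); free(C);
}
```

A production implementation would reuse the factorization and apply rank-one updates to the coefficient matrix after each swap, rather than re-inverting the submatrix on every sweep as this sketch does.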
Learning GPU programming with patterns

In March 2019 KAUST organized a one-day course on OpenACC programming for GPUs, followed by a GPU hackathon to help users of its systems port scientific code to GPUs. In order to innovate and improve training outcomes, participants had the opportunity to use Appentra’s unique approach to GPU training: first identifying patterns in the code, then introducing the concepts and directives needed to parallelize those patterns. Using the Parallelware Trainer tool, participants learned in a single day how to decompose their code, parallelize it loop by loop to minimise the introduction of bugs and facilitate regular testing, and finally optimise across multiple GPU-enabled regions of code. This approach gave attendees a clear framework for applying OpenACC to any future code, with actionable takeaways they could start using straight away.
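To illustrate that workflow on the simplest possible pattern, the sketch below (our own minimal example, not course material) annotates a single fully parallel loop with an OpenACC directive and explicit data clauses; the program is then rebuilt and re-tested before the next loop is touched.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Step 1: identify the pattern (a fully parallel "map" loop) and
       annotate it; copyin/copy make the data movement explicit. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] += 2.0f * x[i];

    /* Step 2: re-run the tests before touching the next loop, so any
       bug introduced by the directive is caught immediately. */
    printf("y[0] = %f (expected 4.0)\n", y[0]);
    return 0;
}
```

Compiled with an OpenACC compiler (for example, nvc -acc), the annotated loop is offloaded to the GPU; compiled without OpenACC support, the pragma is ignored and the same code still runs correctly on the CPU, which is what makes the loop-by-loop, test-as-you-go approach practical.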
Parallelware Trainer was very well received by the developers. They could quickly see how a simple code can be ported to GPUs with the OpenACC pragmas the tool suggests, and they showed particular interest in the features that let them learn more about each keyword, illustrated with examples. Notably, Aleksandr Mikhalev became the first-ever participant to win the hackathon as a single-person team.
The experience with Parallelware Trainer
Hackathon participant Aleksandr Mikhalev from KAUST used the hackathon to port his maxvol code to the GPUs available on KAUST’s Ibex service, a heterogeneous cluster of 864 nodes that combines AMD and Intel CPUs with NVIDIA GPUs of different architectures. The hackathon allowed him to achieve his goals and implement an OpenACC code that runs “maxvol” on many matrices in parallel and, in addition, can run on either GPU or CPU depending only on compilation flags.
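The structure he describes, a single OpenACC gang loop over a batch of independent matrices with the CPU/GPU choice deferred to the compiler, can be sketched as follows. This is an illustrative outline only: the per-matrix kernel here is a placeholder (a pivot-style search for the largest entry), not the actual maxvol update, and all names are ours.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define BATCH 4096   /* number of independent matrices  */
#define R     16     /* each matrix is R x R, row-major */

int main(void) {
    double *A = malloc(sizeof(double) * BATCH * R * R);
    double *maxabs = malloc(sizeof(double) * BATCH);
    for (long i = 0; i < (long)BATCH * R * R; i++)
        A[i] = (double)rand() / RAND_MAX - 0.5;

    /* Each gang processes one matrix; the inner loop is vectorized
       with a max reduction. The same source runs on GPU or CPU. */
    #pragma acc parallel loop gang copyin(A[0:BATCH*R*R]) copyout(maxabs[0:BATCH])
    for (int m = 0; m < BATCH; m++) {
        double best = 0.0;
        #pragma acc loop vector reduction(max:best)
        for (int e = 0; e < R * R; e++) {
            double v = fabs(A[(long)m * R * R + e]);
            if (v > best) best = v;
        }
        maxabs[m] = best;   /* stand-in for a maxvol pivot search */
    }

    printf("matrix 0: max |entry| = %f\n", maxabs[0]);
    free(A); free(maxabs);
    return 0;
}
```

With the NVIDIA/PGI compilers, the identical source then targets the GPU with -acc=gpu (formerly -ta=tesla) or multicore CPUs with -acc=multicore, matching the flag-only switch described above.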


“At the beginning of the hackathon I thought I would use CUDA to port my code, as I had no knowledge of OpenACC at that time. However, I changed my mind after I listened to the presentation of the Parallelware Trainer. With its help I learned OpenACC, with its advantages and current drawbacks, in just 2 days. What I loved the most: you do not have to rewrite your code entirely to see if you did it without mistakes; you do it step by step, simply by adding the proper pragmas, and you can instantly check whether your last change contains any mistakes.”
Aleksandr Mikhalev, KAUST