Last updated on September 15, 2022
In this blog post, we show how to get started with Codee using the Canny edge detection image processing algorithm. You will see how it supports the performance optimization roadmap by providing human-readable actionable items to enable the optimization of sequential code as well as the exploitation of the […]
News
Flexibility and Performance: turbocharging loop unswitching
In the previous post we talked about loop unswitching: a technique that compilers use to make loops faster. Loop unswitching consists of detecting conditional statements whose condition never changes during the execution of the loop (loop-invariant conditions) and then moving the condition outside of the loop. Loop unswitching opens the door for other […]
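As a minimal sketch of the transformation described in this excerpt, the C code below shows a loop that tests a loop-invariant condition on every iteration, followed by a hand-unswitched equivalent. The function names `scale_branch_inside` and `scale_unswitched` and the `use_offset` flag are hypothetical, chosen only for illustration; they are not taken from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: `use_offset` never changes inside the loop,
 * yet it is tested on every iteration. */
void scale_branch_inside(const float *in, float *out, size_t n,
                         float factor, float offset, int use_offset)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (use_offset)
            out[i] = in[i] * factor + offset;
        else
            out[i] = in[i] * factor;
    }
}

/* Unswitched form: the invariant test is hoisted out of the loop,
 * leaving two branch-free loop bodies. */
void scale_unswitched(const float *in, float *out, size_t n,
                      float factor, float offset, int use_offset)
{
    assert(in != NULL && out != NULL);
    if (use_offset) {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * factor + offset;
    } else {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * factor;
    }
}
```

Both functions compute the same result. The unswitched version duplicates the loop, trading code size for a simpler loop body.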
Flexibility and performance: introduction to loop unswitching
Flexibility is very important in the development of software. A great deal of research and education is invested into creating flexible software. However, flexibility and performance don’t often go hand in hand, as most developers have witnessed at some point in their careers. One of the ways to achieve flexibility is to parametrize the behavior of […]
Many ways to speed up your program
There are many ways to make your program run faster. Some rely on more efficient libraries; others rely on using the standard library and the language efficiently. Still others involve more efficient algorithms. But sometimes, even after we’ve applied all the possible optimizations, the code performance is not […]
Speed up non-vectorizable loops with loop fission
Vectorization is a powerful technique for achieving peak computational performance. However, not all code is easily vectorizable by all compilers. In this post we are going to talk about vectorization of complex non-vectorizable loops. The idea is to split the loop into two loops, one for the vectorizable part and the other for the non-vectorizable […]
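The splitting idea in this excerpt can be sketched in C as follows. This is an illustrative example only: the function names `update_fused` and `update_fissioned`, the histogram update, and the assumption that the arrays do not overlap are all hypothetical and not taken from the post. The indirect update `hist[idx[i]]++` can hit the same element more than once, which typically prevents compilers from vectorizing the fused loop.

```c
#include <assert.h>
#include <stddef.h>

/* Fused form: one loop mixes a simple element-wise computation with an
 * indirect update that may touch the same hist[] entry repeatedly. */
void update_fused(const float *x, const int *idx, float *y, int *hist, size_t n)
{
    assert(x != NULL && idx != NULL && y != NULL && hist != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = x[i] * 2.0f + 1.0f;  /* vectorizable part */
        hist[idx[i]]++;             /* hard-to-vectorize part */
    }
}

/* Fissioned form: the two statements are independent (assuming the arrays
 * do not overlap), so the loop can be split in two. */
void update_fissioned(const float *x, const int *idx, float *y, int *hist, size_t n)
{
    assert(x != NULL && idx != NULL && y != NULL && hist != NULL);
    for (size_t i = 0; i < n; i++)  /* candidate for vectorization */
        y[i] = x[i] * 2.0f + 1.0f;
    for (size_t i = 0; i < n; i++)  /* stays scalar */
        hist[idx[i]]++;
}
```

The split is only valid because neither statement reads what the other writes. Whether the first loop is actually vectorized depends on the compiler and its flags.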
Trade a bit of precision for performance on hotspots with compiler vectorization directives
Most applications work fine when compiled with the -ffast-math compiler flag, which makes floating-point computations faster at the expense of some precision loss. However, in the scientific domain, floating-point precision is important, so those codebases are usually compiled without this option. Still, the question of performance of such code remains […]
Run your floating-point calculations with both precision and speed
One of our customers has a mathematical simulation program where precision matters. We were given the code for evaluation and asked to make it run faster. They have a strict requirement that the source code be compiled without any optimization flags that can influence precision. Most compilers offer a fast-math compiler flag that significantly improves […]
Is your algorithm running at peak performance? The roofline model
There are worlds where performance matters. In the HPC world, faster software means less waiting time for scientists. In the embedded world, faster software means we can use cheaper silicon to build our product. In the game world, faster software means that our game will run on slower CPUs, thus making our game more interesting to people with […]
Case Study: How we made the Canny edge detector run faster? (part 2)
In the previous post we talked about the performance improvement we obtained for the Canny edge detection algorithm. Most of the changes we made there focused on hot loops: making memory accesses sequential and making the loops longer, which in turn allowed the compiler to autovectorize them. This made our program run significantly faster […]
A touch of parallelism: example of NPB CG Benchmark
The ultimate goal of the Codee software suite is to help users achieve the peak performance of their software. One way to do this is with a touch of parallelism. This post will talk about the NPB CG benchmark, a popular benchmark for comparing supercomputers, developed by NASA. We will talk about how […]