In the previous post we talked about the performance improvement we obtained for the Canny edge detection algorithm. Most of the changes we made there focused on hot loops: making memory accesses sequential and making the loops longer, which in turn allowed the compiler to autovectorize them. This made our program run significantly faster […]
Technical
A touch of parallelism: example of NPB CG Benchmark
The ultimate goal of the Codee software suite is to help users achieve the peak performance of their software. One way to do that is with a touch of parallelism. This post looks at NPB CG, a popular benchmark developed by NASA for comparing supercomputers. We will talk about how […]
Case Study: How we made Canny edge detector algorithm run faster? (part 1)
The Canny edge detector is a well-known algorithm in image processing. One of our customers asked about its performance, so our performance engineers took a look. As always when dealing with unfamiliar codebases, we didn’t know what to expect. Some algorithms have a lot of room for performance improvements, others don’t. […]