One of our customers has a mathematical simulation program where precision matters. We were given the code for evaluation and asked to make it run faster. They have a strict requirement that the source code is compiled without any optimization flags that can influence precision.
Most compilers offer a fast-math compiler flag that significantly improves performance at the expense of precision. It combines several types of optimizations that can also be enabled individually through separate compiler flags. You don’t want to enable fast-math if precision is important to you, but some of its constituent flags do not influence precision and can be enabled on their own for a performance benefit. This post introduces the floating-point precision problem and presents an important compiler flag that increases performance without impacting precision.
The problem with floating-point numbers and precision
Compilers are really good at optimizing code, but with floating-point calculations they need to be careful. Since floating-point numbers have finite precision, some rules of algebra do not apply to them. For example, the law of associativity doesn’t hold in floating-point arithmetic, i.e.
(a + b) + c ≠ a + (b + c)
If a is 10³⁰, b is -10³⁰ and c is 1, the expression (a + b) + c gives 1, in contrast to the expression a + (b + c), which gives 0 as the result [1].
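You can verify this with a minimal C snippet (a sketch; any C99 compiler will do):

#include <stdio.h>

int main(void) {
    double a = 1e30, b = -1e30, c = 1.0;
    /* a + b is exactly 0, so adding c afterwards yields 1 */
    printf("(a + b) + c = %g\n", (a + b) + c);
    /* b + c rounds back to -1e30 because 1 is below the precision
       of a number that large, so the final sum is 0 */
    printf("a + (b + c) = %g\n", a + (b + c));
    return 0;
}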
To make floating-point calculations predictable and as precise as possible, the IEEE 754 standard defines strict semantics. Strict IEEE 754 semantics disallow approximate instructions and optimizations based on algebraic transformations that can influence the precision of the result. The goal is to make the calculations precise and predictable: ideally, the same program compiled with different compilers should produce the same results when strict IEEE 754 semantics are enforced.
Compilers, when working with floating-point values under IEEE 754 semantics, must be careful when doing optimizations, since a small change intended to increase speed can make your program output a slightly different result.
Most of the time, the difference in results due to compiler optimizations will be small. Because of this, all compilers provide an optimization flag that enables many floating-point optimizations resulting in faster code, but with no guarantees about the precision of the result, typically called the fast-math compilation flag¹. This flag is -ffast-math for CLANG and GCC, -fp-model fast for Intel’s compiler, and /fp:fast for MSVC.

The compilation flag -ffast-math is not enabled by default at any optimization level. In most applications you will want to enable it, because it can yield a substantial performance boost. A gamer doesn’t care if the red component of the main character’s shirt color is off by one; what matters is a high frame rate.
However, there are some applications, mostly in the scientific domain, where small differences due to precision loss and rounding errors accumulate over time and can lead to incorrect results. Those programs will typically be compiled without the -ffast-math flag. The developers might decide to enable it if their program isn’t fast enough, but by doing so they take on the responsibility of verifying that the results produced by the program are still precise enough.
If you cannot enable -ffast-math because of the precision loss, there are other compiler flags that are part of -ffast-math but do not affect precision and can be enabled independently. In the next section we talk about one such optimization.
Optimizing mathematical error detection and error management
Some floating-point operations can produce invalid results. For example, division by zero is invalid, as is the square root of a negative number. The question is how to signal the bad values to the program so it can decide what to do next (stop the computation, or perform fixup operations)².
In C/C++, before IEEE 754 exceptions, floating-point errors were signaled through the errno variable defined in <errno.h>. This mechanism is still in use today: library mathematical functions, such as sqrt or log10, signal an error by writing an error code to the errno variable. Here is an example of code that exploits this fact:
errno = 0;
for (int i = 0; i < n; i++) {
    double result = sqrt(a[i]);
    if (errno == EDOM) {  /* sqrt received an argument outside its domain */
        errno = 0;        /* reset so the error doesn't leak into later iterations */
        continue;
    }
    b[i] = result / n;
}
In case sqrt cannot compute a correct value (e.g. because the argument is negative), the computation for that element is skipped.
From the performance perspective, setting errno costs time even when it is never read, since all library mathematical functions set it. Code that uses errno is also suboptimal, especially in loops: checking it requires a conditional statement, and loops with conditional statements are more difficult for the compiler to optimize and vectorize.
Recommendations on performance and floating-point error management
Don’t use errno to check for floating-point errors. You can sanitize your inputs to make sure no function receives bad arguments (e.g. check that no division uses a zero denominator). Alternatively, you can rely on the IEEE exceptions defined in fenv.h to tell you whether your program is doing calculations on correct inputs, or simply let the error propagate and check the output for NaNs and infinities.
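A minimal sketch of the fenv.h approach (assuming C99 floating-point environment support; link with -lm):

#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double x = -1.0;       /* volatile: keep the compiler from folding sqrt at compile time */
    feclearexcept(FE_ALL_EXCEPT);   /* start with a clean slate */
    double r = sqrt(x);             /* raises FE_INVALID and returns NaN */
    if (fetestexcept(FE_INVALID)) {
        printf("invalid operation occurred\n");
    }
    /* Or skip the exception flags entirely and test the output: */
    if (isnan(r) || isinf(r)) {
        printf("result is not a finite number\n");
    }
    return 0;
}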
With default compiler flags, calls to mathematical functions like sqrt are forwarded to slow library functions that set errno instead of being compiled down to fast hardware instructions. If your program doesn’t use errno to check for errors in the calculations, it is completely safe to disable this behavior. This is done using the -fno-math-errno flag for GCC and CLANG.
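To see what the flag changes, consider this one-line wrapper (illustrative; exact code generation varies by compiler, version and target):

#include <math.h>

/* With -fno-math-errno, on x86-64 this typically compiles to a single
   sqrtsd hardware instruction. Without the flag, the compiler must keep
   a branch that calls the library sqrt for negative inputs, so that
   errno gets set. */
double my_sqrt(double x) {
    return sqrt(x);
}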
We illustrate the performance difference with the following code snippet:
#include <math.h>

double rolling_average(double a[], int n) {
    double result = 0.0;
    for (int i = 0; i < n; i++) {
        result += sqrt(a[i]) / n;  /* each sqrt call may set errno */
    }
    return result;
}
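A minimal driver to reproduce the comparison might look like this (the array size and contents are our assumption, not the original benchmark; build the program twice, with and without -fno-math-errno, and time it):

#include <stdio.h>
#include <stdlib.h>

double rolling_average(double a[], int n);  /* defined in the snippet above */

int main(void) {
    int n = 10000000;  /* hypothetical size: large enough for the loop to dominate */
    double *a = malloc(n * sizeof *a);
    if (!a) return 1;
    for (int i = 0; i < n; i++) a[i] = (double)i;
    printf("%.17g\n", rolling_average(a, n));
    free(a);
    return 0;
}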
In our measurements, using GCC 9.3, when we compiled the above code with and without -fno-math-errno, it took 497 ms to execute without -fno-math-errno and 128 ms with it. The result returned by the function was exactly the same in both cases. In this particular example, the flag allowed the loop to be vectorized, which explains the performance improvement³.
This flag should be enabled whenever the performance of mathematical functions is important. The speedup will depend on the loop, the compiler, etc., but generally you should expect a performance increase.
Summary
Most code bases can increase their performance by enabling the -ffast-math (or equivalent) compiler switch, which allows faster computations at the expense of some precision loss. The flag -ffast-math automatically enables several compiler flags related to the optimization of mathematical operations. One of those flags is -fno-math-errno, which can also be enabled without -ffast-math. For code bases that require the maximum possible precision, using this flag is a good idea: enabling it brings a performance boost by allowing the compiler to use fast instructions instead of slow library calls, with zero impact on precision.
In the next post, we go deeper into precision and vectorization; specifically, how you can speed up parts of your code if you are willing to give up a bit of precision in your hot loops.

References
[1] What Every Computer Scientist Should Know About Floating-Point Arithmetic
[2] GCC Wiki – Floating Point Math
[3] Linux Programmer’s Manual – Math Error
[4] CLANG Vectorizer Documentation
[5] LIBC – Errors in Floating-Point Calculations
[6] Microsoft Visual C++ – errno, _doserrno, _sys_errlist, and _sys_nerr
¹ With -ffast-math, the compiler can do many optimizations, roughly divided into three groups: (1) optimizations arising from the rules of arithmetic (e.g. (-a) * (-b) = a * b); (2) optimizations using approximate instructions, e.g. rewriting a / b as a * (1 / b), where the expression (1 / b) is calculated with an approximate reciprocal instruction that is much faster than a regular division; and (3) optimizations related to floating-point error management. These optimizations open the door to others, like common subexpression elimination, loop invariant code motion, vectorization, etc.
² More information about floating-point error handling can be found in the LIBC manual [5].
³ Please note that this improvement was measured on a very short example using a specific compiler. The amount of improvement will depend on the compiler and the loop’s source code, but generally, in a large enough project, you should expect performance improvements with -fno-math-errno on all compilers.