Writing Efficient Code
December 19, 2010
Writing efficient code for scientific applications in languages such as Fortran and C is often a matter of understanding technical details such as storage layout and loop order. I address several common efficiency issues below, however, one should always keep in mind the following quote from Donald E. Knuth (1974) regarding premature optimization:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Thus, knowing when to spend time on optimization is just as important as the optimization itself.
Access Array Elements in Storage Order
The order in which memory is accessed can have performance implications. For example, it is easier to access elements of a two-dimensional array in the order in which they are stored, since this involves incrementing a memory address. It adds complexity to access them out of order, since more calculations must be done to determine the memory location. The storage order depends on the particular compiler or language standard being used.
To make this more concrete, consider the following matrix:
Array storage in Fortran is column-major. That is to say, when stored in memory as a linear array, the matrix will be arranged as . Thus, accessing the elements of column-wise is most efficient. A useful way to remember this is that in Fortran, the first index varies fastest.
On the other hand, array storage in C is row-major, so the second index varies fastest. The matrix above will be stored in memory as .
Accessing the memory in the order in which it is stored will be faster. In Fortran, this means writing loops like
do j = 1, n
do i = 1, n
! operate on A(i,j) ...
end do
end do
On the other hand, in C we want:
for (i = 0; i < n; i++ {
for (j = 0; j < n; j++) {
/* operate on A[i][j] ... */
}
}
Similarly indexed arrays can also be merged for more efficient access. Consider three vectors , , and each of length . Rather than using
real, dimension(n) :: a, b, c
it can be more efficient to use
real, dimension(3,n) :: abc
Pay careful attention to the order of the dimensions! In C we would want to switch the order:
double abc[n][3];
Avoid Conditionals in Loops
In the following example, due to Prof. Dr. Ulrich Rüde, sqrt
must be
called for each iteration and the conditional must be checked.
subroutine threshold(a, thresh, ic)
real, dimension(n), intent(in) :: a
real, intent(in) :: thresh
integer, intent(out) :: ic
real :: tt
ic = 0
tt = 0.d0
do j = 1, n
tt = tt + a(j) * a(j)
if (sqrt(tt) >= thresh) then
ic = j
return
end if
end if
end subroutine threshold
An alternative approach, which would allow for many optimizations
(loop unrolling, CPU pipelining, less time spent evaluating the
conditional) would involve adding tt
in blocks (e.g., blocks of size
128) and checking the conditional after each block. When it the
condition is met, the last block can be repeated to determine the
value of ic
.
Minimize Loop Overhead
There is overhead in setting up loops, for example, do
loops in
Fortran and for
loops in C, that can have significant performance
consequences for relatively short loops.
Consider the following two loops.
For ,
do i = 1, n
do j = 1, m
! ...
end do
end do
is less efficient than
do j = 1, m
do i = 1, n
! ...
end do
end do
The first variant requires setting up loops (once for the outer loop and times for the inner loop) while the second variant only requires setting up loops. Paying attention to the loop ordering may help improve performance, but locality of reference concerns (i.e., column- or row–major access) are of first-order importance.
Avoid Subroutine Calls in Inner Loops
There is also overhead associated with function and subroutine calls that, when nested in the innermost loops, can become significant. Avoid such subroutine calls by inlining or restructuring code.
References
- Knuth, Donald E. (1974). Structured Programming with go to Statements, ACM Journal Computing Surveys 6, 268–301.