Computing the Euclidean norm: Numerical aspects

The Euclidean norm (also known as the 2-norm) is arguably the most widely used norm on vector spaces. It is also very simple conceptually: The length of a vector \(\mathbf{x} = \left[ x_1, \ldots, x_n \right]^\mathrm{T} \in \mathbb{R}^n\) is given by its Euclidean norm \(\|\mathbf{x}\|_2\).

In this post, I want to talk about computing the Euclidean norm numerically, i.e. using floating-point arithmetic. As I recently found out, there is more to it than meets the eye. So let’s dive in…

The naïve approach

The formula for the Euclidean norm can be found in any textbook. It doesn’t look very complicated, right? $$ \|\mathbf{x}\|_2 = \sqrt{x_1^2 + \ldots + x_n^2} = \sqrt{\sum_{i = 1}^n x_i^2} $$ We just square all elements of \(\mathbf{x}\), add them up and take the square root. Sounds easy enough! One can instantly write a C function for it:

/* Uses ANSI C */
#include <stddef.h>
#include <math.h>

double euclidean_norm_naive(const double x[], const size_t n) {
    size_t i = 0;
    double acc = 0.0;
    for (i = 0; i < n; i++) { acc += x[i]*x[i]; }
    return sqrt(acc);
}

So far, so good. But we’ve all heard that floating-point math is hard. For instance, we should take rounding errors, under- and overflow into account. And indeed, if you use our naïve function to compute the Euclidean norm of a vector with very large elements, you might be unpleasantly surprised:

const double foo[2] = {1.0e200, 1.0e200};

printf("%f", euclidean_norm_naive(foo, 2)); /* inf */

Oops. This is what we call overflow. Both the input elements \(10^{200}\) and the correct result \(\sqrt{2} \cdot 10^{200}\) can be represented in double-precision floating-point arithmetic, but the intermediate value \(\left( 10^{200} \right)^2 = 10^{400}\) cannot. So we get an inf instead of the correct result.

Similarly, our naïve implementation will underflow if the vector \(\mathbf{x}\) has very small elements: the individual squares are rounded down to zero even though the norm itself is perfectly representable.
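A quick check, mirroring the overflow snippet above, makes this concrete:

const double bar[2] = {1.0e-200, 1.0e-200};

printf("%g", euclidean_norm_naive(bar, 2)); /* 0 instead of ~1.414e-200 */

Hence, this function is unusable for a considerable range of inputs. How can we solve this problem?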

What would LAPACK do?

Let’s take a look at how LAPACK does this; after all, it is the reference for doing numerical linear algebra. Here is the DNRM2 function:

      DOUBLE PRECISION FUNCTION DNRM2(N,X,INCX)
*     .. Scalar Arguments ..
      INTEGER INCX,N
*     ..
*     .. Array Arguments ..
      DOUBLE PRECISION X(*)
*     ..
*
*  Purpose
*  =======
*
*  DNRM2 returns the euclidean norm of a vector via the function
*  name, so that
*
*     DNRM2 := sqrt( x'*x )
*
*  Further Details
*  ===============
*
*  -- This version written on 25-October-1982.
*     Modified on 14-October-1993 to inline the call to DLASSQ.
*     Sven Hammarling, Nag Ltd.
*
*  =====================================================================
*
*     .. Parameters ..
      DOUBLE PRECISION ONE,ZERO
      PARAMETER (ONE=1.0D+0,ZERO=0.0D+0)
*     ..
*     .. Local Scalars ..
      DOUBLE PRECISION ABSXI,NORM,SCALE,SSQ
      INTEGER IX
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC ABS,SQRT
*     ..
      IF (N.LT.1 .OR. INCX.LT.1) THEN
          NORM = ZERO
      ELSE IF (N.EQ.1) THEN
          NORM = ABS(X(1))
      ELSE
          SCALE = ZERO
          SSQ = ONE
*        The following loop is equivalent to this call to the LAPACK
*        auxiliary routine:
*        CALL DLASSQ( N, X, INCX, SCALE, SSQ )
*
          DO 10 IX = 1,1 + (N-1)*INCX,INCX
              IF (X(IX).NE.ZERO) THEN
                  ABSXI = ABS(X(IX))
                  IF (SCALE.LT.ABSXI) THEN
                      SSQ = ONE + SSQ* (SCALE/ABSXI)**2
                      SCALE = ABSXI
                  ELSE
                      SSQ = SSQ + (ABSXI/SCALE)**2
                  END IF
              END IF
   10     CONTINUE
          NORM = SCALE*SQRT(SSQ)
      END IF
*
      DNRM2 = NORM
      RETURN
*
*     End of DNRM2.
*
      END

Wait, what? SCALE? Branching? Why?

Let’s try to understand what happens in DNRM2. First, I want to translate it into C because, frankly, I cannot read fixed-format FORTRAN without overheating my brain1). Here is a cleaned-up C version that omits some of the finer implementation details (such as the INCX stride argument):

double euclidean_norm_scaled(const double x[], const size_t n) {
    size_t i = 0;
    double absxi = 0.0;
    double scale = 0.0; /* largest |x[i]| encountered so far */
    double ssq = 1.0;   /* running sum of (x[i]/scale)^2 */

    if (n < 1) {
        return 0.0;
    } else if (n == 1) {
        return fabs(x[0]);
    } else {
        /* Inlined DLASSQ */
        for (i = 0; i < n; i++) {
            if (x[i] != 0.0) {
                absxi = fabs(x[i]);
                if (scale < absxi) {
                    /* New maximum: adopt it and rescale the running sum */
                    ssq = 1.0 + ssq*(scale/absxi)*(scale/absxi);
                    scale = absxi;
                } else {
                    ssq += (absxi/scale)*(absxi/scale);
                }
            }
        }
        return scale*sqrt(ssq);
    }
}

The most interesting part is the inlined DLASSQ subroutine. What does it do? It took me some time to understand, but it actually does one very simple thing: it iterates through the array once, keeping track of the largest absolute value encountered so far, while accumulating the sum of squares scaled by that current maximum. In other words, it computes $$ s = \max \left\{|x_1|, \ldots, |x_n|\right\} $$ and $$ a = \sum_{i = 1}^n \left(\frac{x_i}{s}\right)^2 $$ in a single pass through the array. That’s it.
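For intuition, here is a hypothetical two-pass sketch (the function scaled_sum_of_squares is my name, not a LAPACK routine) that computes \(s\) and \(a\) literally as written above:

/* Two-pass illustration of what DLASSQ computes; a sketch, not LAPACK code */
#include <stddef.h>
#include <math.h>

double scaled_sum_of_squares(const double x[], const size_t n, double *s) {
    size_t i = 0;
    double a = 0.0;

    *s = 0.0;
    for (i = 0; i < n; i++) {        /* pass 1: s = max |x_i| */
        if (fabs(x[i]) > *s) { *s = fabs(x[i]); }
    }
    if (*s == 0.0) { return 0.0; }   /* all-zero vector: avoid dividing by zero */
    for (i = 0; i < n; i++) {        /* pass 2: a = sum of (x_i/s)^2 */
        a += (x[i] / *s) * (x[i] / *s);
    }
    return a;
}

The norm is then \(s \sqrt{a}\). DLASSQ obtains the same \(s\) and \(a\) in a single sweep by rescaling the running sum whenever a new maximum appears, so the array has to be read only once.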

And now we see the trick used in DNRM2: we ‘extract’ the element that is largest in magnitude and pull it out in front of the square root, replacing the textbook formula with this one: $$ \|\mathbf{x}\|_2 = s \, \sqrt{\sum_{i = 1}^n \left(\frac{x_i}{s}\right)^2} $$ with \(s = \max\left\{|x_1|, \ldots, |x_n|\right\}\). Since every quotient \(x_i / s\) has magnitude at most 1, the sum under the square root can neither overflow nor be flushed to zero, and the under- and overflow problems we discussed earlier disappear.
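To see why this works, take \(\mathbf{x} = \big[ 3 \cdot 10^{200} \ \ {-4} \cdot 10^{200} \big]^\mathrm{T}\) (a vector we will meet again in the tests below): here \(s = 4 \cdot 10^{200}\) and $$ \|\mathbf{x}\|_2 = 4 \cdot 10^{200} \, \sqrt{\left(\tfrac{3}{4}\right)^2 + 1^2} = 4 \cdot 10^{200} \cdot \tfrac{5}{4} = 5 \cdot 10^{200}, $$ so no intermediate quantity ever comes close to the limits of the representable range.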

Some examples

Let’s test these functions and see how they perform on over- and underflow. We use the following inputs2): $$ \begin{aligned} \mathbf{x}_\mathrm{of} &= \big[ 3 \cdot 10^{200} \ \ {-4} \cdot 10^{200} \big]^\mathrm{T} & \|\mathbf{x}_\mathrm{of}\|_2 &= 5 \cdot 10^{200} \phantom{\frac{1}{1}} \\ \mathbf{x}_\mathrm{uf} &= \big[ 3 \cdot 10^{-200} \ \ 4 \cdot 10^{-200} \big]^\mathrm{T} & \|\mathbf{x}_\mathrm{uf}\|_2 &= 5 \cdot 10^{-200} \phantom{\frac{1}{1}} \\ \mathbf{x}_\mathrm{ac} &= \big[ 10^{-7} \quad 2 \quad 2 \cdot 10^7 \big]^\mathrm{T} & \|\mathbf{x}_\mathrm{ac}\|_2 &= 10^{-7} + 2 \cdot 10^7 \phantom{\frac{1}{1}} \end{aligned} $$
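The outputs below were produced with a small test driver; a minimal sketch, assuming the two C functions defined above sit in the same file (the exact reference values are computed separately), might look like this:

/* Hypothetical test driver; assumes euclidean_norm_naive() and
   euclidean_norm_scaled() from above are defined in the same file.
   The 'exact' reference values in the output were computed separately. */
#include <stdio.h>

int main(void) {
    const double x_of[2] = {3.0e200, -4.0e200};
    const double x_uf[2] = {3.0e-200, 4.0e-200};
    const double x_ac[3] = {1.0e-7, 2.0, 2.0e7};

    puts("Overflow test:");
    printf("naive:  %.17e\n", euclidean_norm_naive(x_of, 2));
    printf("scaled: %.17e\n\n", euclidean_norm_scaled(x_of, 2));

    puts("Underflow test:");
    printf("naive:  %.17e\n", euclidean_norm_naive(x_uf, 2));
    printf("scaled: %.17e\n\n", euclidean_norm_scaled(x_uf, 2));

    puts("Accuracy test:");
    printf("naive:  %.17e\n", euclidean_norm_naive(x_ac, 3));
    printf("scaled: %.17e\n", euclidean_norm_scaled(x_ac, 3));
    return 0;
}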

Overflow test:
exact:  5.00000000000000019e+200
naive:                       inf
scaled: 4.99999999999999951e+200

Underflow test:
exact:  4.99999999999999991e-200
naive:   0.00000000000000000e+00
scaled: 4.99999999999999991e-200

Accuracy test:
exact:   2.00000000000001006e+07
naive:   2.00000000000001006e+07
scaled:  2.00000000000000969e+07

We observe that the scaled version is much less prone to under- and overflow, allowing us to use the whole floating-point range. On the other hand, it requires more operations per element and is sometimes slightly less accurate, probably because of the rounding errors introduced by the divisions during scaling.

Conclusion

So, what did I learn from this simple example?

First, numerical linear algebra and floating-point arithmetic are hard. Even the simplest formulæ often have to be rearranged for better numerical stability or floating-point accuracy. Finding such a rearrangement is often the true art of numerical linear algebra.

Second, open source is important not only because it’s free. It is important because anyone can peek under the hood and see how things are done in the real world.

Footnotes

1) I really like free format FORTRAN, though.

2) Notice how some of these values cannot be represented exactly in floating-point arithmetic.