Fast Polynomial Multiplication (from Chapter 2.6)

Notes by Gene Cooperman, © 2009 (may be freely copied as long as this copyright notice remains)

We all know how to multiply polynomials by multiplying all the cross terms. Given a polynomial P of degree d (the largest power is d), and a polynomial Q of degree e (the largest power is e), the product, P Q, will have degree d e. To compute the product P Q, there will be up to (d+1) (e+1) cross terms to multiply.

In general, if two polynomials are both of degree d, then there will be (d+1)2 = O(d2) cross terms.

But using the idea of the fast Fourier transform, one can actually reduce the number of multiplications from O(d2) to O(d log d). multiply the polynomials in O(d log d) time. Let A(x) and B(x) be polynomials of degree d. For example, if d=3, we might have A(x) = 4 + 3x + 5x2 + 6x3.

In outline, the idea for polynomial multiplication for C(x) = A(x) B(x) of degree d is:

  1. Choose d points x0,…, xd/2 and x-1,…, x-d/2.
  2. Evaluation: A(x0),…,A(xd/2 and A(-x1),…,A(-xd/2); Do the same for B(x)
  3. Multiplication: Compute C(x)=A(x)B(x) at each of the points.
    So C(xi) = A(xi) B(xi) and C(x-i) = A(x-i) B(x-i). Let yi be the constants such that C(x-i) = yi and C(x-i) = y-i
  4. Interpolation: Compute the unknown coefficients of C(x) using C(x0) = y0, and C(x1) = y1,…,C(xd/2) = yd/2 and C(x-1) = y-1,…,C(x-d/2) = y-d/2.

Step 1 (Select Points):

We choose d+1 points x0, x1,…,xd at which we want to evaluate A(x), where

xd/2+i = -xi.

Step 2 (Evaluation):

Let's see how to do step 2 the fast way. The polynomial A(x) can be split into even and odd terms (for d even):
A(x) = (c0 + c2x2 +…+ cd/2xd/2) + (c1x + c3x3 +…+ cd/2-1xd/2-1)
We can write this as:
A(x) = Aeven(x) + Aodd(x)
where

Aeven(x) = c0 + c2x2 +…+ cd/2xd/2
Aodd(x) = c1x + c3x3 +…+ cd/2-1xd/2-1

(Note that our Aeven(x) and Aodd(x) are different from the textbook's Ae(x) and Ao(x). Specifically, Ae(x2) = Aeven(x) and xAo(x2) = Aodd(x).)

Next, we compute A(xi) = Aeven(x) + Aodd(x)
and A(-xi) = Aeven(-x) + Aodd(-x) = Aeven(x) - Aodd(x)

Recall that we chose the d+1 points x0, x1,…,xd at which we want to evaluate A(x), where

xd/2+i = -xi.

So, we do step 2 by evaluating Aeven(xi) and Aodd(xi) for x0, s1,…, xd/2. After this, we achieve evaluations for xd/2+i`free:

Aeven(xd/2+i) = Aeven(-xi) = Aeven(xi)
Aodd(xd/2+i) = Aodd(-xi) = -Aeven(xi)

Then, we have d+1 evaluations of A(x), since A(x) = Aeven(x) + Aeven(x)

Evaluating the d+1 terms of A() at all d+1 points would have required evaluating (d+1) (d+1) terms. By doing it this alternative way, we only need to evaluate about (d+1) (1+d/2) terms.

Do likewise for another polynomial B().

Step 3 (Multiplication):

Multiply at the d+1 points xi.

C(xi) = A(xi) B(xi)

When we are done, we have computed constants yi for which:

C(xi) = yi

This requires d+1 multiplications. Our goal is to find a fast multiplication in time O(d log d). This step is O(d) and so it's already fast enough.

Step 4 (Interpolation):

In general, interpolation requires a lot of steps. Given the unknown polynomial C(x) for which we only know:
C(x0) = y0,…, C(xd) = yd
we wish to find c0,…, cd such that
C(x) = c0 + c1x + c2x2 +…+ cdxd

As you would expect, one plugs in and solves. The ci are the unknowns that we have to solve for. After plugging into C(x) = y for x0, x1,…, xd, we have d+1 linear equations:

c0 + c1(x0) + c2(x0)2 +…+ cd(x0)d = y0
c0 + c1(x1) + c2(x1)2 +…+ cd(x1)d = y1
...

This means solving the linear simultaneous equations.

The advantage if we use x1,…, xd/2 and and x-1,…, x-d/2 is that we can replace the equations to solve:
C(xi)=yi and C(-xi)=y-i
by:
C(xi) + C(-xi) = yi + y-i
C(xi) - C(-xi) = yi - y-i
These last two equations are the same as:

2 Ceven(xi) = yi + y-i
2 Codd(xi) = yi + y-i

Now, we again plug in, but this time we have only half as many unknowns, ci, in each equation:
2c0 + 2c2(xi)2 +…+ 2cd-1(xi)d-1 = yi + y-i
2c1x + 2c3(xi)3 +…+ 2cd(xi)d = yi - y-i

So, we still have d+1 linear equations to solve, but each linear equation now has only half as many terms.

Generalizing to four polynomials instead of Aeven() and Aodd():

We have seen that the polynomial Aeven(x) has only half as many terms. So, we win by breaking up A(x) = Aodd(x) + Aodd(x). How can we extend this further?

The answer is to go to complex numbers. Recall that the imaginary number i satisfies i2 = -1. Some simple algebra then shows that i3 = -i and i4 = 1. (Note that we will also continue to use i for the subscript index. It will be clear from context whether we mean i the imaginary number, or i the subscript index.)

So, instead of using A(x) and A(-x) for A() and replacing them by A(x) + A(-x) and A(x) - A(-x) for A(), let's use P(x), P(ix), P(i2x), P(i3x) (or the equivalent P(x), P(ix), P(-x), P(-ix)) for P(). Let's solve for some polynomials below. From these, we won't construct 2Peven(x) = P(x) + P(-x) and 2Podd(x) = P(x) - P(-x), but four polynomials based on four linear combinations. We'll guess the right polynomials below.

  P(x)=1 P(x)=x P(x)=x2 P(x)=x3 P(x)=x4 Short Name for Expression
P(x) + P(ix) + P(-x) + P(-ix): 4 0 0 0 4x4 4P0 mod 4(x)
P(x) - P(ix) + P(-x) - P(-ix): 0 0 4x2 0 0 4P2 mod 4(x)
P(x) + P(ix) - P(-x) - P(-ix): 0 (2+2i)x 0 (2-2i)x3 0  
P(x) - P(ix) - P(-x) + P(-ix): 0 (2-2i)x 0 (2+2i)x3 0  
P(x) + iP(ix) - P(-x) - iP(-ix): 0 0 0 4x3 0 4P3 mod 4(x)
P(x) - iP(ix) - P(-x) + iP(-ix): 0 4x 0 0 0 4P1 mod 4(x)

The good polynomials are:

4P0 mod 4(x) = P(x) + P(ix) + P(-x) + P(-ix)
4P1 mod 4(x) = P(x) - iP(ix) - P(-x) + iP(-ix)
4P2 mod 4(x) = P(x) - P(ix) + P(-x) - P(-ix)
4P3 mod 4(x) = P(x) + iP(ix) - P(-x) - iP(-ix)

The subscript names were chosen to indicate which terms are non-zero. For example P1 mod 4() has terms identical to P() for those terms of degree 1 mod 4. The other terms of P1 mod 4() are all zero. These are the polynomials for which 3/4 of the terms are zero. So, these polynomials can be evaluated in 1/4 of the time.

Note that

P(x) = P0 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)

If we're going to make this work, we need an efficient way to convert between P(x), P(ix), P(-x), P(-ix) and the four polynomials above. This is the essence of the Fast Fourier Transform in the text. It allows you to convert from from an ordinary polynomial P() into the four transformed polynomials above. (This is analogous to the previous case in which we transformed into Aeven() and Aodd().) Using the transformed polynomials, we can evaluate (Step 2) fast. Step 3 is always fast. And we can interpolate (Step 4) fast.

Step 1 (Select Points):

We choose d+1 points x0, x1,…,xd at which we want to evaluate A(x), where

xd/4+i = i xi   for 1≤i<d/2
x2d/4+i = - xi for 1≤i<d/2
x3d/4+i = -i xi for 1≤i<d/2

Note that the coefficients i, -1, and -i are the powers of the imaginary number i.

Step 2 (Evaluation):

We can write this as:
P(x) = P0 mod 4(x) + P1 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)
where

P0 mod 4(x) = c0 + c4x4 + …
P1 mod 4(x) = c1x + c5x5 + …
...

So, we do step 2 by evaluating P0 mod 4(xi), P1 mod 4(xi), P2 mod 4(xi) and P3 mod 4(xi), for x0, s1,…, xd/4. After this, we achieve evaluations of P0 mod 4(xj) with j>d/4 for free:

P0 mod 4(xd/4+i) = P0 mod 4(ixi) = P0 mod 4(xi)
P0 mod 4(x2d/4+i) = P0 mod 4(-xi) = P0 mod 4(xi)
P0 mod 4(x3d/4+i) = P0 mod 4(-ixi) = P0 mod 4(xi)

and evaluations of P1 mod 4(xj) with j>d/4 for free:

P1 mod 4(xd/4+i) = P1 mod 4(ixi) = i P1 mod 4(xi)
P1 mod 4(x2d/4+i) = P1 mod 4(-xi) = - P1 mod 4(xi)
P1 mod 4(x3d/4+i) = P1 mod 4(-ixi) = -i P1 mod 4(xi)

and evaluations of P2 mod 4(xj) with j>d/4 for free:

P2 mod 4(xd/4+i) = P2 mod 4(ixi) = - P2 mod 4(xi)
P2 mod 4(x2d/4+i) = P2 mod 4(-xi) = P2 mod 4(xi)
P2 mod 4(x3d/4+i) = P2 mod 4(-ixi) = - P2 mod 4(xi)

and evaluations of P3 mod 4(xj) with j>d/4 for free:

P3 mod 4(xd/4+i) = P3 mod 4(ixi) = -i P3 mod 4(xi)
P3 mod 4(x2d/4+i) = P3 mod 4(-xi) = - P3 mod 4(xi)
P3 mod 4(x3d/4+i) = P3 mod 4(-ixi) = i P3 mod 4(xi)

Then P(x) is computed as:

P(x) = P0 mod 4(x) + P1 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)

P0 mod 4(), P1 mod 4(), P2 mod 4() and P3 mod 4() can each be evaluated fast, since each has only 1/4 as many terms. Each of the four functions is applied to each of x0, x1,…, xd/4. So, this requires an evaluation of (d+1) (d/4) terms.

Do likewise for another polynomial, Q(x).

Step 3 (Multiplication):

Multiply at the d+1 points xi.

R(xi) = P(xi) Q(xi)

When we are done, we have computed constants yi for which:

R(xi) = yi

This requires d+1 multiplications. Our goal is to find a fast multiplication in time O(d log d). This step is O(d) and so it's already fast enough.

Step 4 (Interpolation):

Recall that with the interpolation phase of Aeven() and Aodd(), we needed to determine the unknown coefficients c0, c1,…,cd for the polynomial C(x), given that:

C(xi) = yi

We again need to do again find unknown coefficients c0, c1,…,cd, but now for:

R(xi) = yi

We use the good polynomials found from the table at the beginning of this subsection:

4R0 mod 4(x) = R(x) + R(ix) + R(-x) + R(-ix)
4R1 mod 4(x) = R(x) - iR(ix) - R(-x) + iR(-ix)
4R2 mod 4(x) = R(x) - R(ix) + R(-x) - R(-ix)
4R3 mod 4(x) = R(x) + iR(ix) - R(-x) - iR(-ix)

We expand:

 
4R0 mod 4(xi)     = R(xi) + R(ixi) + R(-xi) + R(-ixi) = R(xi) + R(xd/4+i) + R(x2d/4+i) + R(x3d/4+i) = yi + yd/4+i + y2d/4+i + y3d/4+i
4R1 mod 4(x)     = R(xi) - iR(ixi) - R(-xi) + iR(-ixi)   = yi - iyd/4+i - y2d/4+i + iy3d/4+i
4R2 mod 4(xi)     = R(xi) - R(ixi) + R(-xi) - R(-ixi)   = yi - yd/4+i + y2d/4+i - y3d/4+i
4R3 mod 4(xi)     = R(xi) + iR(ixi) - R(-xi) - iR(-ixi)   = yi + iyd/4+i - y2d/4+i - iy3d/4+i

So, we have to find the unknowns, ci:

c0 + c4xi4 + … = yi + yd/4+i + y2d/4+i + y3d/4+i
c1xi + c5xi5 + … = yi - iyd/4+i - y2d/4+i + iy3d/4+i
c2xi + c6xi6 + … = yi - yd/4+i + y2d/4+i - y3d/4+i
c3xi + c7xi7 + … = yi + iyd/4+i - y2d/4+i - iy3d/4+i

But when we do this step, each equation already has 3/4 of the coefficients set to zero, and so each of the d+1 linear equations will have only 1/4 as many terms.

Furthermore, there are savings in computing the sums of the yi on the right hand side, since many of the partial sums are repeated. In the general case, there will be only O(d log d) sums of the yi.

Generalizing Further:

From the previous two subsections (case of Aeven(), Aodd(), and case of P0 mod 4(), P1 mod 4(), P2 mod 4(), P3 mod 4()), we can derive the general pattern.

The textbook does this in a somewhat more abstract (and more compact) manner, by using recursion. Now read the general case there. In converting between our notation and theirs, it's important to recognize that the textbook uses Ae() and Ao(), where:

Aeven(x) = Ae(x2), and
Aodd(x) = xAo(x2).

In general, if the polynomial is of degree d = n-1 for some n = 2k, then we let ω = e2πin/k and choose xi = ωi:

x0 = 1
x1 = ω = e2πi(n/k)
x2 = ω2 = e2πi(2n/k)
x3 = ω3 = e2πi(3n/k)
...

For further reading about fast multiplication (both for integers and polynomials), try http://en.wikipedia.org/wiki/Multiplication_algorithm.