共计 9924 个字符,预计需要花费 25 分钟才能阅读完成。
This page describes a couple of algorithms for computing the elementary mathematical functions log(x) (logarithm to the base e) and exp(x) (e to the power x). The algorithms avoid multiplication and division operations, and are thus suitable for implementation in software on processors that lack such instructions (or where the instructions are slow) or in hardware on a programmable logic device or dedicated chip.
The methods are particularly suitable for use when shifting is cheap: for example in ARM assembler code, where a shift can often be had for free as part of another instruction. We can calculate these functions in just a couple of cycles per bit of result using ARM assembler code. For clarity, we shall give code examples in straightforward C.
The ideas explained here can be extended to implement other elementary functions such as sin(x) or arctan(x); the resulting algorithms are similar to CORDIC (COordinate Rotation DIgital Computer – yes, really) methods, descriptions of which can be found in many places.
Principles
There are some constants by which it is easy to multiply. For example, multiplying by 2n, where n is a positive or a negative integer, can be achieved by simply shifting a number by n places. The shift will be to the left if n is positive, to the right if n is negative.
It is nearly as easy to multiply by numbers of the form ±2n±1. These simply involve an add or a subtract and a shift. For example, in C a = a + (a << 1) will (ignoring overflow etc.) multiply a by 3. Similarly, a = a + (a >> 4) will multiply a by 1+2-4=17/16.
We will call numbers which are easy to multiply by nice numbers.
In contrast, adding or subtracting some arbitrary number like 41256845 is usually no slower than adding a special number, such as 1. Generally, adding arbitrary numbers together is one of the fastest operations a CPU can do.
We will now look at how we can use this distinction between addition and multiplication to calculate the exp() and log() functions efficiently.
Calculating exp()
Let us suppose we want to calculate y=exp(x). As an example we will use x=4. The algorithm is going to generate a sequence of values for x and y, and we are going to make sure that for each pair
y·exp(x)=exp(4)
or
y=exp(4)·exp(-x).
An expression like this, whose value remains constant through an algorithm despite changes in the variables involved, is called an invariant. We shall write each pair in a row in a table, which will start like this:
x | y |
4 | 1 |
Note that y·exp(x)=1·exp(4)=exp(4) as required. If we can get x to 0 while maintaining the invariant, then y would be given by
y=exp(4)·exp(0)=exp(4)·1=exp(4),
and so we will have calculated the desired result in y.
Suppose we subtract some value k from x. Then, to maintain the invariant, the new y value y′ will have to satisfy
y′=exp(4)·exp(-(x–k))
=exp(4)·exp(-x)·exp(k)
=y·exp(k).
In other words, if we subtract k from x, we have to multiply y by exp(k). All we have to do now is make sure that exp(k) is a nice number, so we can multiply by it easily, and the rest is straightforward. Note that k itself does not have to be nice, as we are only subtracting that, not multiplying by it. Here are some nice values of exp(k) and the corresponding (not necessarily nice) values of k.
k | exp(k) |
5.5452 | 256 |
2.7726 | 16 |
1.3863 | 4 |
0.6931 | 2 |
0.4055 | 3/2 |
0.2231 | 5/4 |
0.1178 | 9/8 |
0.0606 | 17/16 |
0.0308 | 33/32 |
0.0155 | 65/64 |
0.0078 | 129/128 |
Now let’s try it out. At each step in the algorithm we shall subtract from x the biggest k in the above table that we can without sending x negative, and then multiply y by the corresponding exp(k).
Step 0. x=4, the biggest k we can subtract is 2.7726, and we will have to multiply y by 16. Results so far:
x | y |
4 | 1 |
4-2.7726=1.2274 | 1·16=16 |
Step 1. x=1.2274, the biggest k we can subtract is 0.6931, and we will have to multiply y by 2. Results so far:
x | y |
4 | 1 |
1.2274 | 16 |
1.2274-0.6931=0.5343 | 16·2=32 |
Step 2. x=0.5343, the biggest k we can subtract is 0.4055, and we will have to multiply y by 3/2. Results so far:
x | y |
4 | 1 |
1.2274 | 16 |
0.5343 | 32 |
0.5343-0.4055=0.1288 | 32·3/2=48 |
Step 3. x=0.1288, the biggest k we can subtract is 0.1178, and we will have to multiply y by 9/8. Results so far:
x | y |
4 | 1 |
1.2274 | 16 |
0.5343 | 32 |
0.1288 | 48 |
0.1288-0.1178=0.0110 | 48·9/8=54 |
Step 4. x=0.0110, the biggest k we can subtract is 0.0078, and we will have to multiply y by 129/128. Results so far:
x | y |
4 | 1 |
1.2274 | 16 |
0.5343 | 32 |
0.1288 | 48 |
0.0110 | 54 |
0.0110-0.0078=0.0032 | 54·129/128=54.42 |
With more entries in our table of k we could continue; but the result is already pretty accurate: the correct value of exp(4) is 54.598.
Example C code
Here is a sample C function to compute exp() using the above algorithm. The code assumes integers are at least 32 bits long. The (non-negative) argument and the result of the function are both expressed as fixed-point values with 16 fractional bits. Notice that after 11 steps of the algorithm the constants involved become such that the code is simply doing a multiplication: this is explained in the note below.
The extension to negative arguments is left as an exercise.
int fxexp(int x) {
int t,y;
y=0x00010000;
t=x-0x58b91; if(t>=0) x=t,y<<=8;
t=x-0x2c5c8; if(t>=0) x=t,y<<=4;
t=x-0x162e4; if(t>=0) x=t,y<<=2;
t=x-0x0b172; if(t>=0) x=t,y<<=1;
t=x-0x067cd; if(t>=0) x=t,y+=y>>1;
t=x-0x03920; if(t>=0) x=t,y+=y>>2;
t=x-0x01e27; if(t>=0) x=t,y+=y>>3;
t=x-0x00f85; if(t>=0) x=t,y+=y>>4;
t=x-0x007e1; if(t>=0) x=t,y+=y>>5;
t=x-0x003f8; if(t>=0) x=t,y+=y>>6;
t=x-0x001fe; if(t>=0) x=t,y+=y>>7;
if(x&0x100) y+=y>>8;
if(x&0x080) y+=y>>9;
if(x&0x040) y+=y>>10;
if(x&0x020) y+=y>>11;
if(x&0x010) y+=y>>12;
if(x&0x008) y+=y>>13;
if(x&0x004) y+=y>>14;
if(x&0x002) y+=y>>15;
if(x&0x001) y+=y>>16;
return y;
}
Note that the constants involved in the trial subtractions reduce by a factor of less than or equal to 2 at each step. This means that it is never necessary to test the same constant twice.
A note on the residual error
The error in the final answer depends on the residual value in x; in fact, the relative error is exp(x). Since for small x, exp(x) is approximately 1+x, it is possible to correct the final answer by multiplying it by 1+x. In the example above, 54.42·(1+0.0032)=54.594, which is about as accurate as can be expected given the rounding of our intermediate results to 4 decimal places. The advantage of applying the correction is that it roughly doubles the number of digits of accuracy in the answer; the disadvantage is that it requires a general multiplication. In a software implementation, whether this is worthwhile will depend on the relative speed of this algorithm and the multiply instruction. It is unlikely to be worthwhile in a hardware implementation.
Calculating log()
Now let us try calculating y=log(x). As an example we will use x=54. (Since we know from above that exp(4)=54.598, the answer we are expecting is very slightly less than 4.) As for exp(), the algorithm generates a sequence of values for xand y. This time our invariant is
exp(y)·x=54
or
y=log(54/x).
In this case our table starts like this:
x | y |
54 | 0 |
Note that y=log(54/x)=log(1)=0 as required. Our aim is to get x to 1 while maintaining the invariant. Then y will be given by
y=log(54/1)=log(54),
the answer we want.
Suppose we multiply x by some number k. Then to maintain the invariant, the new the invariant, the new y value y′ will have to satisfy
y′=log(54/kx)
=log(54/x)+log(1/k)
=y-log(k).
In other words, if we multiply x by k, we have to add -log(k) to y. We make sure that k is a nice number, so we can multiply by it easily. log(1/k) does not have to be nice as it is only involved in an addition. Here are some nice values of k and the corresponding, not necessarily nice, values of log(k); this is simply the same table as before, but with the columns exchanged.
k | log(k) |
16 | 2.7726 |
4 | 1.3863 |
2 | 0.6931 |
3/2 | 0.4055 |
5/4 | 0.2231 |
9/8 | 0.1178 |
17/16 | 0.0606 |
33/32 | 0.0308 |
65/64 | 0.0155 |
129/128 | 0.0078 |
All the k values in this table are greater than 1. We will therefore have to start with x less than 1, so we begin by multiplying x by 1/256 (other nice numbers would work too) and, to maintain the invariant, adding -log(1/256)=5.5452 to y; this is really just a scaling of the input and an initialisation of y. After this preparatory step our table looks like this:
x | y |
54 | 0 |
54/256=0.2109 | 5.5452 |
Step 0. x=0.2109, the biggest k we can multiply by is 4 (keeping x<1), and we will have to add -1.3863 to y. Results so far:
x | y |
0.2109 | 5.5452 |
0.2109*4=0.8436 | 5.5452-1.3863=4.1589 |
Step 1. x=0.8436, the biggest k we can multiply by is 9/8, and we will have to add -0.1178 to y. Results so far:
x | y |
0.2109 | 5.5452 |
0.8436 | 4.1589 |
0.8436*9/8=0.9491 | 4.1589-0.1178=4.0411 |
Step 2. x=0.9491, the biggest k we can multiply by is 33/32, and we will have to add -0.0308 to y. Results so far:
x | y |
0.2109 | 5.5452 |
0.8436 | 4.1589 |
0.9491 | 4.0411 |
0.9491*33/32=0.9788 | 4.0411-0.0308=4.0103 |
Step 3. x=0.9788, the biggest k we can multiply by is 65/64, and we will have to add -0.0155 to y. Results so far:
x | y |
0.2109 | 5.5452 |
0.8436 | 4.1589 |
0.9491 | 4.0411 |
0.9788 | 4.0103 |
0.9788*65/64=0.9941 | 4.0103-0.0155=3.9948 |
The true answer is 3.9890. The absolute error in the answer is the log of the residual in x, in this case log(0.9941). For small x, log(1-x) is approximately equal to –x, and so the absolute error in this case is approximately 1-0.9941=0.0059. Subtract this from the final y value, and we get 3.9889: pretty good!
Notice that because we obtained the absolute error in the answer, this final correction, which roughly doubles the number of digits of accuracy in the result, does not involve a multiplication.
Example C code
Here is a sample C function to compute log() using the above algorithm. The code assumes integers are at least 32 bits long. The (positive) argument and the result of the function are both expressed as fixed-point values with 16 fractional bits, although intermediates are kept with 31 bits of precision to avoid loss of accuracy during shifts. After 12 steps of the algorithm the correction described above is applied.
int fxlog(int x) {
int t,y;
y=0xa65af;
if(x<0x00008000) x<<=16, y-=0xb1721;
if(x<0x00800000) x<<= 8, y-=0x58b91;
if(x<0x08000000) x<<= 4, y-=0x2c5c8;
if(x<0x20000000) x<<= 2, y-=0x162e4;
if(x<0x40000000) x<<= 1, y-=0x0b172;
t=x+(x>>1); if((t&0x80000000)==0) x=t,y-=0x067cd;
t=x+(x>>2); if((t&0x80000000)==0) x=t,y-=0x03920;
t=x+(x>>3); if((t&0x80000000)==0) x=t,y-=0x01e27;
t=x+(x>>4); if((t&0x80000000)==0) x=t,y-=0x00f85;
t=x+(x>>5); if((t&0x80000000)==0) x=t,y-=0x007e1;
t=x+(x>>6); if((t&0x80000000)==0) x=t,y-=0x003f8;
t=x+(x>>7); if((t&0x80000000)==0) x=t,y-=0x001fe;
x=0x80000000-x;
y-=x>>15;
return y;
}
Implementation issues
The C code examples here are given for any use with no warranty whatsoever. They will need to be modified to suit your application. You may need to extend them if you want greater accuracy; conversely, if you want greater speed at the expense of accuracy, you may wish to remove some of the steps. You should also check that the function covers the full range of possible input values you will encounter; the examples do not include any checking of this kind at all.
You will probably find that as a result of rounding, implementations of these algorithms tend to exhibit systematic error. You may be able to get better overall accuracy by adding a small positive or negative constant to the argument of the exp() function or the result of the log() function, though possibly at the cost of no longer getting exact results for log(1) and exp(0).
ARM assembler implementations of these algorithms are particularly elegant. Each line in the C code above translates into about 3 or 4 instructions; this means that the log() algorithm produces one result bit about every two cycles. Please contact me at the e-mail address on the home page if you are interested in tested ARM assembler implementations of these algorithms for a particular application.