cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

README (21619B)


      1 +---------------------------------------------------------------------------+
      2 |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
      3 |                                                                           |
      4 | Copyright (C) 1992,1993,1994,1995,1996,1997,1999                          |
      5 |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
      6 |                       Australia.  E-mail billm@melbpc.org.au              |
      7 |                                                                           |
      8 |    This program is free software; you can redistribute it and/or modify   |
      9 |    it under the terms of the GNU General Public License version 2 as      |
     10 |    published by the Free Software Foundation.                             |
     11 |                                                                           |
     12 |    This program is distributed in the hope that it will be useful,        |
     13 |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
     14 |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
     15 |    GNU General Public License for more details.                           |
     16 |                                                                           |
     17 |    You should have received a copy of the GNU General Public License      |
     18 |    along with this program; if not, write to the Free Software            |
     19 |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
     20 |                                                                           |
     21 +---------------------------------------------------------------------------+
     22
     23
     24
     25wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
     26which was my 80387 emulator for early versions of djgpp (gcc under
     27msdos); wm-emu387 was in turn based upon emu387 which was written by
     28DJ Delorie for djgpp.  The interface to the Linux kernel is based upon
     29the original Linux math emulator by Linus Torvalds.
     30
     31My target FPU for wm-FPU-emu is that described in the Intel486
     32Programmer's Reference Manual (1992 edition). Unfortunately, numerous
     33facets of the functioning of the FPU are not well covered in the
     34Reference Manual. The information in the manual has been supplemented
     35with measurements on real 80486's. Unfortunately, it is simply not
     36possible to be sure that all of the peculiarities of the 80486 have
     37been discovered, so there are always likely to be obscure differences
     38in the detailed behaviour of the emulator and a real 80486.
     39
     40wm-FPU-emu does not implement all of the behaviour of the 80486 FPU,
     41but is very close.  See "Limitations" later in this file for a list of
     42some differences.
     43
     44Please report bugs, etc to me at:
     45       billm@melbpc.org.au
     46or     b.metzenthen@medoto.unimelb.edu.au
     47
     48For more information on the emulator and on floating point topics, see
     49my web pages, currently at  http://www.suburbia.net/~billm/
     50
     51
     52--Bill Metzenthen
     53  December 1999
     54
     55
     56----------------------- Internals of wm-FPU-emu -----------------------
     57
     58Numeric algorithms:
     59(1) Add, subtract, and multiply. Nothing remarkable in these.
     60(2) Divide has been tuned to get reasonable performance. The algorithm
     61    is not the obvious one which most people seem to use, but is designed
     62    to take advantage of the characteristics of the 80386. I expect that
     63    it has been invented many times before I discovered it, but I have not
     64    seen it. It is based upon one of those ideas which one carries around
     65    for years without ever bothering to check it out.
     66(3) The sqrt function has been tuned to get good performance. It is based
     67    upon Newton's classic method. Performance was improved by capitalizing
     68    upon the properties of Newton's method, and the code is once again
     69    structured taking account of the 80386 characteristics.
     70(4) The trig, log, and exp functions are based in each case upon quasi-
     71    "optimal" polynomial approximations. My definition of "optimal" was
     72    based upon getting good accuracy with reasonable speed.
     73(5) The argument-reduction code for the trig functions effectively uses
     74    a value of pi which is accurate to more than 128 bits. As a consequence,
     75    the reduced argument is accurate to more than 64 bits for arguments up
     76    to a few pi, and remains accurate to more than 64 bits for most
     77    arguments, even those approaching 2^63. This is far superior to an
     78    80486, which uses a value of pi which is accurate to 66 bits.
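
Item (3) above names Newton's method as the basis of the sqrt routine. The
following is a minimal Python sketch of the integer form of that iteration.
It is not the emulator's code: the function names, the 64-bit significand
layout and the fixed-point scaling are assumptions made for illustration.

```python
def isqrt_newton(n):
    """Return floor(sqrt(n)) for a non-negative integer n using Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is guaranteed to be >= sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        # Newton step for f(x) = x*x - n, carried out in integer arithmetic.
        y = (x + n // x) // 2
        if y >= x:
            return x  # the sequence has stopped decreasing: x is the floor root
        x = y


def sqrt_significand(sig):
    """Square root of the value sig / 2**63, where sig is a 64-bit significand
    representing a number in [1, 2). The result has 63 fractional bits and is
    truncated, not rounded."""
    if not (1 << 63) <= sig < (1 << 64):
        raise ValueError("sig must be a normalized 64-bit significand")
    # sqrt(sig / 2**63) * 2**63 == sqrt(sig * 2**63)
    return isqrt_newton(sig << 63)


if __name__ == "__main__":
    one = 1 << 63                      # significand representing 1.0
    print(sqrt_significand(one) == one)
```

Each Newton step roughly doubles the number of correct bits, which is the
property a tuned implementation can exploit by starting from a good initial
estimate and running a fixed, small number of steps.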
     79
     80The code of the emulator is complicated slightly by the need to
     81account for a limited form of re-entrancy. Normally, the emulator will
     82emulate each FPU instruction to completion without interruption.
     83However, it may happen that when the emulator is accessing the user
     84memory space, swapping may be needed. In this case the emulator may be
     85temporarily suspended while disk I/O takes place. During this time
     86another process may use the emulator, thereby perhaps changing static
     87variables. The code which accesses user memory is confined to five
     88files:
     89    fpu_entry.c
     90    reg_ld_str.c
     91    load_store.c
     92    get_address.c
     93    errors.c
     94As of version 1.12 of the emulator, no static variables are used
     95(apart from those in the kernel's per-process tables). The emulator is
     96therefore now fully re-entrant, rather than having just the restricted
     97form of re-entrancy which is required by the Linux kernel.
     98
     99----------------------- Limitations of wm-FPU-emu -----------------------
    100
    101There are a number of differences between the current wm-FPU-emu
    102(version 2.01) and the 80486 FPU (apart from bugs).  The differences
    103are fewer than those which applied to the 1.xx series of the emulator.
    104Some of the more important differences are listed below:
    105
    106The Roundup flag does not have much meaning for the transcendental
    107functions and its 80486 value with these functions is likely to differ
    108from its emulator value.
    109
    110In a few rare cases the Underflow flag obtained with the emulator will
    111be different from that obtained with an 80486. This occurs when the
    112following conditions apply simultaneously:
    113(a) the operands have a higher precision than the current setting of the
    114    precision control (PC) flags.
    115(b) the underflow exception is masked.
    116(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
    117(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
    118(e) the magnitude of the exact result would be exactly 2^-16382 if the
    119    operands were rounded to the current precision before the arithmetic
    120    operation was performed.
    121If all of these apply, the emulator will set the Underflow flag but a real
    12280486 will not.
    123
    124NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
    125unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
    126and Unnormals. None of these will be generated by an 80486 or by the
    127emulator. Do not use them. The emulator treats them differently in
    128detail from the way an 80486 does.
    129
    130Self modifying code can cause the emulator to fail. An example of such
    131code is:
    132          movl %esp,(%ebx)
    133          fld1
    134The FPU instruction may be (usually will be) loaded into the pre-fetch
    135queue of the CPU before the mov instruction is executed. If the
    136destination of the 'movl' overlaps the FPU instruction then the bytes
    137in the prefetch queue and memory will be inconsistent when the FPU
    138instruction is executed. The emulator will be invoked but will not be
    139able to find the instruction which caused the device-not-present
    140exception. For this case, the emulator cannot emulate the behaviour of
    141an 80486DX.
    142
    143Handling of the address size override prefix byte (0x67) has not been
    144extensively tested yet. A major problem exists because using it in
    145vm86 mode can cause a general protection fault. Address offsets
    146greater than 0xffff appear to be illegal in vm86 mode but are quite
    147acceptable (and work) in real mode. A small test program developed to
    148check the addressing, and which runs successfully in real mode,
    149crashes dosemu under Linux and also brings Windows down with a general
    150protection fault message when run under the MS-DOS prompt of Windows
    1513.1. (The program simply reads data from a valid address).
    152
    153The emulator supports 16-bit protected mode, with one difference from
    154an 80486DX.  An 80486DX will allow some floating point instructions to
    155write a few bytes below the lowest address of the stack.  The emulator
    156will not allow this in 16-bit protected mode: no instructions are
    157allowed to write outside the bounds set by the protection.
    158
    159----------------------- Performance of wm-FPU-emu -----------------------
    160
    161Speed.
    162-----
    163
    164The speed of floating point computation with the emulator will depend
    165upon instruction mix. Relative performance is best for the instructions
    166which require most computation. The simple instructions are adversely
    167affected by the FPU instruction trap overhead.
    168
    169
    170Timing: Some simple timing tests have been made on the emulator functions.
    171The times include load/store instructions. All times are in microseconds
    172measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
    173MS-DOS; the next two columns are for emulators running with the djgpp
    174MS-DOS extender. The final column is for wm-FPU-emu in Linux 0.97,
    175using libm4.0 (hard).
    176
    177function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu
    178
    179   +          60.5           154.8              76.5          139.4
    180   -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
    181   *          71.0           190.8              79.6          146.6
    182   /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1
    183
    184 sin()        310.8          4692.0            319.0          398.5
    185 cos()        284.4          4855.2            308.0          388.7
    186 tan()        495.0          8807.1            394.9          504.7
    187 atan()       328.9          4866.4            601.1          419.5-491.9
    188
    189 sqrt()       128.7          crashed           145.2          227.0
    190 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
    191 exp()        479.1          6619.2            469.1          850.8
    192
    193
    194The performance under Linux is improved by the use of look-ahead code.
    195The following results show the improvement which is obtained under
    196Linux due to the look-ahead code. Also given are the times for the
    197original Linux emulator with the 4.1 'soft' lib.
    198
    199 [ Linus' note: I changed look-ahead to be the default under linux, as
    200   there was no reason not to use it after I had edited it to be
    201   disabled during tracing ]
    202
    203            wm-FPU-emu with  original with
    204            look-ahead       'soft' lib
    205   +         106.4             190.2
    206   -         108.6-111.6      192.4-216.2
    207   *         113.4             193.1
    208   /         108.8-124.4      700.1-706.2
    209
    210 sin()       390.5            2642.0
    211 cos()       381.5            2767.4
    212 tan()       496.5            3153.3
    213 atan()      367.2-435.5     2439.4-3396.8
    214
    215 sqrt()      195.1            4732.5
    216 log()       358.0-387.5     3359.2-3390.3
    217 exp()       619.3            4046.4
    218
    219
    220These figures are now somewhat out-of-date. The emulator has become
    221progressively slower for most functions as more of the 80486 features
    222have been implemented.
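
As a worked reading of the two timing tables above, the sketch below
subtracts the look-ahead times from the earlier wm-FPU-emu times for a few
functions. The figures are copied from the tables (single-valued rows only).
It assumes, as the text implies, that the wm-FPU-emu column of the first
table was measured without look-ahead; the dictionary and function names are
only illustrative.

```python
# Times in microseconds, copied from the two tables above.
without_lookahead = {"+": 139.4, "*": 146.6, "sin": 398.5, "sqrt": 227.0, "exp": 850.8}
with_lookahead = {"+": 106.4, "*": 113.4, "sin": 390.5, "sqrt": 195.1, "exp": 619.3}


def saving(op):
    """Return (microseconds saved, per cent saved) by look-ahead for one function."""
    before = without_lookahead[op]
    after = with_lookahead[op]
    saved = before - after
    return saved, 100.0 * saved / before


if __name__ == "__main__":
    for op in without_lookahead:
        saved, percent = saving(op)
        print(f"{op:>5}: {saved:6.1f} us saved ({percent:4.1f}%)")
```

For "+" the saving is 33.0 microseconds, close to a quarter of the original
time, while for sin() it is 8.0 microseconds, about 2 per cent. This fits the
remark under "Speed" that the simple instructions are the ones most affected
by per-instruction overhead.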
    223
    224
    225----------------------- Accuracy of wm-FPU-emu -----------------------
    226
    227
    228The accuracy of the emulator is in almost all cases equal to or better
    229than that of an Intel 80486 FPU.
    230
    231The results of the basic arithmetic functions (+,-,*,/), and fsqrt
    232match those of an 80486 FPU. They are the best possible; the error for
    233these never exceeds 1/2 an lsb. The fprem and fprem1 instructions
    234return exact results; they have no error.
    235
    236
    237The following table compares the emulator accuracy for the sqrt(),
    238trig and log functions against the Turbo C "emulator". For this table,
    239each function was tested at about 400 points. Ideal worst-case results
    240would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for
    241arguments greater than pi/4 can be thought of as being related to the
    242precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
    243accurate to 64 bits can result in a relative accuracy in cos() of
    244about 64 + log2(cos(x)) = 31 bits.
    245
    246
    247Function      Tested x range            Worst result                Turbo C
    248                                        (relative bits)
    249
    250sqrt(x)       1 .. 2                    64.1                         63.2
    251atan(x)       1e-10 .. 200              64.2                         62.8
    252cos(x)        0 .. pi/2-(1e-10)         64.4 (x <= pi/4)             62.4
    253                                        64.1 (x = pi/2-(1e-10))      31.9
    254sin(x)        1e-10 .. pi/2             64.0                         62.8
    255tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (x <= pi/4)             62.1
    256                                        64.1 (x = pi/2-(1e-10))      31.9
    257exp(x)        0 .. 1                    63.1 **                      62.9
    258log(x)        1+1e-6 .. 2               63.8 **                      62.1
    259
    260** The accuracy for exp() and log() is low because the FPU (emulator)
    261does not compute them directly; two operations are required.
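
The estimate quoted before the table, 64 + log2(cos(x)) = 31 bits for
x = pi/2-(1e-10), can be checked with a few lines of Python. This is only a
sketch of the arithmetic; the function name is hypothetical, and it uses the
identity cos(pi/2 - d) = sin(d) so that the small value is not lost to
rounding in double precision.

```python
import math


def relative_bits(delta, argument_bits=64):
    """Estimated relative accuracy, in bits, of cos(pi/2 - delta) when the
    argument itself is accurate to argument_bits bits."""
    cos_x = math.sin(delta)  # cos(pi/2 - delta) == sin(delta)
    return argument_bits + math.log2(cos_x)


if __name__ == "__main__":
    print(round(relative_bits(1e-10), 1))
```

With delta = 1e-10 the logarithm is about -33.2, so roughly 33 of the 64 bits
are lost and about 31 remain, which matches the figure in the text.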
    262
    263
    264The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or
    265later) for 'float' variables (24 bit precision numbers) when precision
    266control is set to 24, 53 or 64 bits, and for 'double' variables (53
    267bit precision numbers) when precision control is set to 53 bits (a
    268properly performing FPU cannot pass the 'paranoia' tests for 'double'
    269variables when precision control is set to 64 bits).
    270
    271The code for reducing the argument for the trig functions (fsin, fcos,
    272fptan and fsincos) has been improved and now effectively uses a value
    273for pi which is accurate to more than 128 bits precision. As a
    274consequence, the accuracy of these functions for large arguments has
    275been dramatically improved (and is now very much better than an 80486
    276FPU). There is also now no degradation of accuracy for fcos and fptan
    277for operands close to pi/2. Measured results are (note that the
    278definition of accuracy has changed slightly from that used for the
    279above table):
    280
    281Function      Tested x range          Worst result
    282                                     (absolute bits)
    283
    284cos(x)        0 .. 9.22e+18              62.0
    285sin(x)        1e-16 .. 9.22e+18          62.1
    286tan(x)        1e-16 .. 9.22e+18          61.8
    287
    288It is possible with some effort to find very large arguments which
    289give much degraded precision. For example, the integer number
    290           8227740058411162616.0
    291is within about 10e-7 of a multiple of pi. To find the tan (for
    292example) of this number to 64 bits precision it would be necessary to
    293have a value of pi which had about 150 bits precision. The FPU
    294emulator computes the result to about 42.6 bits precision (the correct
    295result is about -9.739715e-8). On the other hand, an 80486 FPU returns
    2960.01059, which in relative terms is hopelessly inaccurate.
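
The effect of the precision of pi in this example can be reproduced with
exact rational arithmetic. The sketch below is not the emulator's reduction
code. It assumes a 60-digit decimal value of pi as the "accurate" constant,
models a 66-bit constant by truncating that value to 64 fractional bits, and
uses hypothetical helper names.

```python
import math
from fractions import Fraction

# pi to 60 decimal places, used here as the high-precision reference value.
PI_HI = Fraction("3.141592653589793238462643383279502884197169399375105820974944")

# pi truncated to 66 significant bits: 2 integer bits plus 64 fractional bits.
PI_66 = Fraction(math.floor(PI_HI * 2**64), 2**64)


def reduce_mod_pi(x, pi):
    """Return x - k*pi, where k is the integer nearest to x/pi, computed exactly."""
    x = Fraction(x)
    k = round(x / pi)
    return x - k * pi


X = 8227740058411162616  # the argument discussed above, kept as an exact integer

tan_hi = math.tan(float(reduce_mod_pi(X, PI_HI)))
tan_66 = math.tan(float(reduce_mod_pi(X, PI_66)))

if __name__ == "__main__":
    print("tan with 60-digit pi:", tan_hi)
    print("tan with 66-bit pi:  ", tan_66)
```

With the 60-digit constant the result has a magnitude below 1e-6, as expected
for an argument this close to a multiple of pi. With the 66-bit constant the
error in pi is multiplied by k, which is about 2.6e18, and the result lands
near the 0.01059 quoted above for the 80486.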
    297
    298For arguments close to critical angles (which occur at multiples of
    299pi/2) the emulator is more accurate than an 80486 FPU. For very large
    300arguments, the emulator is far more accurate.
    301
    302
    303Prior to version 1.20 of the emulator, the accuracy of the results for
    304the transcendental functions (in their principal range) was not as
    305good as the results from an 80486 FPU. From version 1.20, the accuracy
    306has been considerably improved and these functions now give measured
    307worst-case results which are better than the worst-case results given
    308by an 80486 FPU.
    309
    310The following table gives the measured results for the emulator. The
    311number of randomly selected arguments in each case is about half a
    312million.  The group of three columns gives the frequency of the given
    313accuracy in number of times per million, thus the second of these
    314columns shows that an accuracy of between 63.80 and 63.89 bits was
    315found at a rate of 133 times per one million measurements for fsin.
    316The results show that the fsin, fcos and fptan instructions return
    317results which are in error (i.e. less accurate than the best possible
    318result (which is 64 bits)) for about one per cent of all arguments
    319between -pi/2 and +pi/2.  The other instructions have a lower
    320frequency of results which are in error.  The last two columns give
    321the worst accuracy which was found (in bits) and the approximate value
    322of the argument which produced it.
    323
    324                                frequency (per M)
    325                               -------------------   ---------------
    326instr   arg range    # tests   63.7   63.8    63.9   worst   at arg
    327                               bits   bits    bits    bits
    328-----  ------------  -------   ----   ----   -----   -----  --------
    329fsin     (0,pi/2)     547756      0    133   10673   63.89  0.451317
    330fcos     (0,pi/2)     547563      0    126   10532   63.85  0.700801
    331fptan    (0,pi/2)     536274     11    267   10059   63.74  0.784876
    332fpatan  4 quadrants   517087      0      8    1855   63.88  0.435121 (4q)
    333fyl2x     (0,20)      541861      0      0    1323   63.94  1.40923  (x)
    334fyl2xp1 (-.293,.414)  520256      0      0    5678   63.93  0.408542 (x)
    335f2xm1     (-1,1)      538847      4    481    6488   63.79  0.167709
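
The "about one per cent" figure for fsin, fcos and fptan follows from the
table by summing the three frequency columns, since no worst-case value is
below 63.7 bits and so every result worse than 64 bits falls in one of them.
A small sketch of that arithmetic, with the figures copied from the table and
an illustrative function name:

```python
# Frequencies per million results in the 63.7, 63.8 and 63.9 bit columns.
per_million = {
    "fsin": (0, 133, 10673),
    "fcos": (0, 126, 10532),
    "fptan": (11, 267, 10059),
    "fpatan": (0, 8, 1855),
    "fyl2x": (0, 0, 1323),
    "fyl2xp1": (0, 0, 5678),
    "f2xm1": (4, 481, 6488),
}


def percent_in_error(instr):
    """Per cent of results that were less accurate than 64 bits."""
    return sum(per_million[instr]) / 1_000_000 * 100.0


if __name__ == "__main__":
    for instr in per_million:
        print(f"{instr:8s} {percent_in_error(instr):.2f}%")
```

For fsin this gives 10806 per million, or about 1.08 per cent; fyl2x, at
1323 per million, is in error for roughly 0.13 per cent of arguments.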
    336
    337
    338Tests performed on an 80486 FPU showed results of lower accuracy. The
    339following table gives the results which were obtained with an AMD
    340486DX2/66 (other tests indicate that an Intel 486DX produces
    341identical results).  The tests were basically the same as those used
    342to measure the emulator (the values, being random, were in general not
    343the same).  The total number of tests for each instruction is given
    344at the end of the table; in each case about 100k tests were performed.
    345Another line of figures at the end of the table shows that most of the
    346instructions return results which are in error for more than 10
    347percent of the arguments tested.
    348
    349The numbers in the body of the table give the approx number of times a
    350result of the given accuracy in bits (given in the left-most column)
    351was obtained per one million arguments. For three of the instructions,
    352two columns of results are given: * The second column for f2xm1 gives
    353the number of cases where the results of the first column were for a
    354positive argument; this shows that this instruction gives better
    355results for positive arguments than it does for negative.  * In the
    356cases of fcos and fptan, the first column gives the results when all
    357cases where arguments greater than 1.5 were removed from the results
    358given in the second column. Unlike the emulator, an 80486 FPU returns
    359results of relatively poor accuracy for these instructions when the
    360argument approaches pi/2. The table does not show those cases when the
    361accuracy of the results was less than 62 bits, which occurs quite
    362often for fsin and fptan when the argument approaches pi/2. This poor
    363accuracy is discussed above in relation to the Turbo C "emulator", and
    364the accuracy of the value of pi.
    365
    366
    367bits   f2xm1  f2xm1 fpatan   fcos   fcos  fyl2x fyl2xp1  fsin  fptan  fptan
    36862.0       0      0      0      0    437      0      0      0      0    925
    36962.1       0      0     10      0    894      0      0      0      0   1023
    37062.2      14      0      0      0   1033      0      0      0      0    945
    37162.3      57      0      0      0   1202      0      0      0      0   1023
    37262.4     385      0      0     10   1292      0     23      0      0   1178
    37362.5    1140      0      0    119   1649      0     39      0      0   1149
    37462.6    2037      0      0    189   1620      0     16      0      0   1169
    37562.7    5086     14      0    646   2315     10    101     35     39   1402
    37662.8    8818     86      0    984   3050     59    287    131    224   2036
    37762.9   11340   1355      0   2126   4153     79    605    357    321   1948
    37863.0   15557   4750      0   3319   5376    246   1281    862    808   2688
    37963.1   20016   8288      0   4620   6628    511   2569   1723   1510   3302
    38063.2   24945  11127     10   6588   8098   1120   4470   2968   2990   4724
    38163.3   25686  12382     69   8774  10682   1906   6775   4482   5474   7236
    38263.4   29219  14722     79  11109  12311   3094   9414   7259   8912  10587
    38363.5   30458  14936    393  13802  15014   5874  12666   9609  13762  15262
    38463.6   32439  16448   1277  17945  19028  10226  15537  14657  19158  20346
    38563.7   35031  16805   4067  23003  23947  18910  20116  21333  25001  26209
    38663.8   33251  15820   7673  24781  25675  24617  25354  24440  29433  30329
    38763.9   33293  16833  18529  28318  29233  31267  31470  27748  29676  30601
    388
    389Per cent with error:
    390        30.9           3.2          18.5    9.8   13.1   11.6          17.4
    391Total arguments tested:
    392       70194  70099 101784 100641 100641 101799 128853 114893 102675 102675
    393
    394
    395------------------------- Contributors -------------------------------
    396
    397A number of people have contributed to the development of the
    398emulator, often by just reporting bugs, sometimes with suggested
    399fixes, and a few kind people have provided me with access in one way
    400or another to an 80486 machine. Contributors include (to those people
    401whom I may have forgotten, please forgive me):
    402
    403Linus Torvalds
    404Tommy.Thorn@daimi.aau.dk
    405Andrew.Tridgell@anu.edu.au
    406Nick Holloway, alfie@dcs.warwick.ac.uk
    407Hermano Moura, moura@dcs.gla.ac.uk
    408Jon Jagger, J.Jagger@scp.ac.uk
    409Lennart Benschop
    410Brian Gallew, geek+@CMU.EDU
    411Thomas Staniszewski, ts3v+@andrew.cmu.edu
    412Martin Howell, mph@plasma.apana.org.au
    413M Saggaf, alsaggaf@athena.mit.edu
    414Peter Barker, PETER@socpsy.sci.fau.edu
    415tom@vlsivie.tuwien.ac.at
    416Dan Russel, russed@rpi.edu
    417Daniel Carosone, danielce@ee.mu.oz.au
    418cae@jpmorgan.com
    419Hamish Coleman, t933093@minyos.xx.rmit.oz.au
    420Bruce Evans, bde@kralizec.zeta.org.au
    421Timo Korvola, Timo.Korvola@hut.fi
    422Rick Lyons, rick@razorback.brisnet.org.au
    423Rick, jrs@world.std.com
    424 
    425...and numerous others who responded to my request for help with
    426a real 80486.
    427