279 lines
12 KiB
Plaintext
279 lines
12 KiB
Plaintext
/*
|
|
* wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors.
|
|
*
|
|
*
|
|
* Copyright (C) 1992,1993,1994
|
|
* W. Metzenthen, 22 Parker St, Ormond, Vic 3163,
|
|
* Australia. E-mail billm@vaxc.cc.monash.edu.au
|
|
* All rights reserved.
|
|
*
|
|
* This copyright notice covers the redistribution and use of the
|
|
* FPU emulator developed by W. Metzenthen. It covers only its use
|
|
* in the 386BSD, FreeBSD and NetBSD operating systems. Any other
|
|
* use is not permitted under this copyright.
|
|
*
|
|
* Redistribution and use in source and binary forms, with or without
|
|
* modification, are permitted provided that the following conditions
|
|
* are met:
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
* notice, this list of conditions and the following disclaimer.
|
|
* 2. Redistributions in binary form must include information specifying
|
|
* that source code for the emulator is freely available and include
|
|
* either:
|
|
* a) an offer to provide the source code for a nominal distribution
|
|
* fee, or
|
|
* b) list at least two alternative methods whereby the source
|
|
* can be obtained, e.g. a publically accessible bulletin board
|
|
* and an anonymous ftp site from which the software can be
|
|
* downloaded.
|
|
* 3. All advertising materials specifically mentioning features or use of
|
|
* this emulator must acknowledge that it was developed by W. Metzenthen.
|
|
* 4. The name of W. Metzenthen may not be used to endorse or promote
|
|
* products derived from this software without specific prior written
|
|
* permission.
|
|
*
|
|
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
|
|
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
|
|
* AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
|
|
* W. METZENTHEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
|
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
|
* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
|
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
*
|
|
*
|
|
* The purpose of this copyright, based upon the Berkeley copyright, is to
|
|
* ensure that the covered software remains freely available to everyone.
|
|
*
|
|
* The software (with necessary differences) is also available, but under
|
|
* the terms of the GNU copyleft, for the Linux operating system and for
|
|
* the djgpp ms-dos extender.
|
|
*
|
|
* W. Metzenthen June 1994.
|
|
*
|
|
* $FreeBSD$
|
|
*/
|
|
|
|
wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
|
|
which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
|
|
in turn based upon emu387 which was written by DJ Delorie for djgpp.
|
|
The interface to the Linux kernel is based upon the original Linux
|
|
math emulator by Linus Torvalds.
|
|
|
|
My target FPU for wm-FPU-emu is that described in the Intel486
|
|
Programmer's Reference Manual (1992 edition). Numerous facets of the
|
|
functioning of the FPU are not well covered in the Reference Manual;
|
|
in the absence of clear details I have made guesses about the most
|
|
reasonable behaviour. Recently, this situation has improved because
|
|
I now have some access to the results produced by a real 80486 FPU.
|
|
|
|
wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
|
|
See "Limitations" later in this file for a partial list of some
|
|
differences. I believe that the missing features are never used by
|
|
normal C or FORTRAN programs.
|
|
|
|
|
|
Please report bugs, etc to me at:
|
|
apm233m@vaxc.cc.monash.edu.au
|
|
|
|
|
|
--Bill Metzenthen
|
|
May 1993
|
|
|
|
|
|
----------------------- Internals of wm-FPU-emu -----------------------
|
|
|
|
Numeric algorithms:
|
|
(1) Add, subtract, and multiply. Nothing remarkable in these.
|
|
(2) Divide has been tuned to get reasonable performance. The algorithm
|
|
is not the obvious one which most people seem to use, but is designed
|
|
to take advantage of the characteristics of the 80386. I expect that
|
|
it has been invented many times before I discovered it, but I have not
|
|
seen it. It is based upon one of those ideas which one carries around
|
|
for years without ever bothering to check it out.
|
|
(3) The sqrt function has been tuned to get good performance. It is based
|
|
upon Newton's classic method. Performance was improved by capitalizing
|
|
upon the properties of Newton's method, and the code is once again
|
|
structured taking account of the 80386 characteristics.
|
|
(4) The trig, log, and exp functions are based in each case upon quasi-
|
|
"optimal" polynomial approximations. My definition of "optimal" was
|
|
based upon getting good accuracy with reasonable speed.
|
|
|
|
The code of the emulator is complicated slightly by the need to
|
|
account for a limited form of re-entrancy. Normally, the emulator will
|
|
emulate each FPU instruction to completion without interruption.
|
|
However, it may happen that when the emulator is accessing the user
|
|
memory space, swapping may be needed. In this case the emulator may be
|
|
temporarily suspended while disk i/o takes place. During this time
|
|
another process may use the emulator, thereby changing some static
|
|
variables (eg FPU_st0_ptr, etc). The code which accesses user memory
|
|
is confined to five files:
|
|
fpu_entry.c
|
|
reg_ld_str.c
|
|
load_store.c
|
|
get_address.c
|
|
errors.c
|
|
|
|
----------------------- Limitations of wm-FPU-emu -----------------------
|
|
|
|
There are a number of differences between the current wm-FPU-emu
|
|
(version beta 1.4) and the 80486 FPU (apart from bugs). Some of the
|
|
more important differences are listed below:
|
|
|
|
All internal computations are performed at 64 bit or higher precision
|
|
and rounded etc as required by the PC bits of the FPU control word.
|
|
Under the crt0 version for Linux current at March 1993, the FPU PC
|
|
bits specify 53 bits precision.
|
|
|
|
The precision flag (PE of the FPU status word) and the Roundup flag
|
|
(C1 of the status word) are now partially implemented. Does anyone
|
|
write code which uses these features?
|
|
|
|
The functions which load/store the FPU state are partially implemented,
|
|
but the implementation should be sufficient for handling FPU errors etc
|
|
in 32 bit protected mode.
|
|
|
|
The implementation of the exception mechanism is flawed for unmasked
|
|
interrupts.
|
|
|
|
Detection of certain conditions, such as denormal operands, is not yet
|
|
complete.
|
|
|
|
----------------------- Performance of wm-FPU-emu -----------------------
|
|
|
|
Speed.
|
|
-----
|
|
|
|
The speed of floating point computation with the emulator will depend
|
|
upon instruction mix. Relative performance is best for the instructions
|
|
which require most computation. The simple instructions are adversely
|
|
affected by the fpu instruction trap overhead.
|
|
|
|
|
|
Timing: Some simple timing tests have been made on the emulator functions.
|
|
The times include load/store instructions. All times are in microseconds
|
|
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
|
|
ms-dos, the next two columns are for emulators running with the djgpp
|
|
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
|
|
using libm4.0 (hard).
|
|
|
|
function Turbo C djgpp 1.06 WM-emu387 wm-FPU-emu
|
|
|
|
+ 60.5 154.8 76.5 139.4
|
|
- 61.1-65.5 157.3-160.8 76.2-79.5 142.9-144.7
|
|
* 71.0 190.8 79.6 146.6
|
|
/ 61.2-75.0 261.4-266.9 75.3-91.6 142.2-158.1
|
|
|
|
sin() 310.8 4692.0 319.0 398.5
|
|
cos() 284.4 4855.2 308.0 388.7
|
|
tan() 495.0 8807.1 394.9 504.7
|
|
atan() 328.9 4866.4 601.1 419.5-491.9
|
|
|
|
sqrt() 128.7 crashed 145.2 227.0
|
|
log() 413.1-419.1 5103.4-5354.21 254.7-282.2 409.4-437.1
|
|
exp() 479.1 6619.2 469.1 850.8
|
|
|
|
|
|
The performance under Linux is improved by the use of look-ahead code.
|
|
The following results show the improvement which is obtained under
|
|
Linux due to the look-ahead code. Also given are the times for the
|
|
original Linux emulator with the 4.1 'soft' lib.
|
|
|
|
[ Linus' note: I changed look-ahead to be the default under linux, as
|
|
there was no reason not to use it after I had edited it to be
|
|
disabled during tracing ]
|
|
|
|
wm-FPU-emu w original w
|
|
look-ahead 'soft' lib
|
|
+ 106.4 190.2
|
|
- 108.6-111.6 192.4-216.2
|
|
* 113.4 193.1
|
|
/ 108.8-124.4 700.1-706.2
|
|
|
|
sin() 390.5 2642.0
|
|
cos() 381.5 2767.4
|
|
tan() 496.5 3153.3
|
|
atan() 367.2-435.5 2439.4-3396.8
|
|
|
|
sqrt() 195.1 4732.5
|
|
log() 358.0-387.5 3359.2-3390.3
|
|
exp() 619.3 4046.4
|
|
|
|
|
|
These figures are now somewhat out-of-date. The emulator has become
|
|
progressively slower for most functions as more of the 80486 features
|
|
have been implemented.
|
|
|
|
|
|
----------------------- Accuracy of wm-FPU-emu -----------------------
|
|
|
|
|
|
Accuracy: The following table gives the accuracy of the sqrt(), trig
|
|
and log functions. Each function was tested at about 400 points. Ideal
|
|
results would be 64 bits. The reduced accuracy of cos() and tan() for
|
|
arguments greater than pi/4 can be thought of as being due to the
|
|
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
|
|
accurate to 64 bits can result in a relative accuracy in cos() of about
|
|
64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
|
|
in the last column.
|
|
|
|
|
|
Function Tested x range Worst result (bits) Turbo C
|
|
|
|
sqrt(x) 1 .. 2 64.1 63.2
|
|
atan(x) 1e-10 .. 200 62.6 62.8
|
|
cos(x) 0 .. pi/2-(1e-10) 63.2 (x <= pi/4) 62.4
|
|
35.2 (x = pi/2-(1e-10)) 31.9
|
|
sin(x) 1e-10 .. pi/2 63.0 62.8
|
|
tan(x) 1e-10 .. pi/2-(1e-10) 62.4 (x <= pi/4) 62.1
|
|
35.2 (x = pi/2-(1e-10)) 31.9
|
|
exp(x) 0 .. 1 63.1 62.9
|
|
log(x) 1+1e-6 .. 2 62.4 62.1
|
|
|
|
|
|
As of version 1.3 of the emulator, the accuracy of the basic
|
|
arithmetic has been improved (by a small fraction of a bit). Care has
|
|
been taken to ensure full accuracy of the rounding of the basic
|
|
arithmetic functions (+,-,*,/,and fsqrt), and they all now produce
|
|
results which are exact to the 64th bit (unless there are any bugs
|
|
left). To ensure this, it was necessary to effectively get information
|
|
of up to about 128 bits precision. The emulator now passes the
|
|
"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24
|
|
bit precision numbers) when precision control is set to 24, 53 or 64
|
|
bits, and for 'double' variables (53 bit precision numbers) when
|
|
precision control is set to 53 bits (a properly performing FPU cannot
|
|
pass the 'paranoia' tests for 'double' variables when precision
|
|
control is set to 64 bits).
|
|
|
|
------------------------- Contributors -------------------------------
|
|
|
|
A number of people have contributed to the development of the
|
|
emulator, often by just reporting bugs, sometimes with a suggested
|
|
fix, and a few kind people have provided me with access in one way or
|
|
another to an 80486 machine. Contributors include (to those people who
|
|
I have forgotten, please excuse me):
|
|
|
|
Linus Torvalds
|
|
Tommy.Thorn@daimi.aau.dk
|
|
Andrew.Tridgell@anu.edu.au
|
|
Nick Holloway alfie@dcs.warwick.ac.uk
|
|
Hermano Moura moura@dcs.gla.ac.uk
|
|
Jon Jagger J.Jagger@scp.ac.uk
|
|
Lennart Benschop
|
|
Brian Gallew geek+@CMU.EDU
|
|
Thomas Staniszewski ts3v+@andrew.cmu.edu
|
|
Martin Howell mph@plasma.apana.org.au
|
|
M Saggaf alsaggaf@athena.mit.edu
|
|
Peter Barker PETER@socpsy.sci.fau.edu
|
|
tom@vlsivie.tuwien.ac.at
|
|
Dan Russel russed@rpi.edu
|
|
Daniel Carosone danielce@ee.mu.oz.au
|
|
cae@jpmorgan.com
|
|
Hamish Coleman t933093@minyos.xx.rmit.oz.au
|
|
|
|
...and numerous others who responded to my request for help with
|
|
a real 80486.
|
|
|