120286 Commits

Author SHA1 Message Date
ru
7aa2a06ebf Fix prototype. 2005-11-24 14:17:35 +00:00
bde
4417000483 Minor cleanups and optimizations:
- Remove dead code that I forgot to remove in the previous commit.

- Calculate the sum of the lower terms of the polynomial (divided by
  x**5) in a single expression (sum of odd terms) + (sum of even terms)
  with parentheses to control grouping.  This is clearer and happens to
  give better instruction scheduling for a tiny optimization (an
  average of ~0.5 cycles/call on Athlons).

- Calculate the final sum in a single expression with parentheses to
  control grouping too.  Change the grouping from
  first_term + (second_term + sum_of_lower_terms) to
  (first_term + second_term) + sum_of_lower_terms.  Normally the first
  grouping must be used for accuracy, but extra precision makes any
  grouping give a correct result so we can group for efficiency.  This
  is a larger optimization (average 3-4 cycles/call or 5%).

- Use parentheses to indicate that the C order of left to right evaluation
  is what is wanted (for efficiency) in a multiplication too.

The old fdlibm code has several optimizations related to these.  2
involve doing an extra operation that can be done almost in parallel
on some superscalar machines but are pessimizations on sequential
machines.  Others involve statement ordering or expression grouping.
All of these except the ordering for combining the sums of the odd
and even terms seem to be ideal for Athlons, but parallelism is still
limited so all of these optimizations combined together with the ones
in this commit save only ~6-8 cycles (~10%).

On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 39-59 cycles.  I don't know of any more optimizations for tanf()
short of writing it all in asm with very MD instruction scheduling.
Hardware fsin takes 122-138 cycles.  Most of the optimizations for
tanf() don't work very well for tan[l]().  fdlibm tan() now takes
145-365 cycles.
2005-11-24 13:48:40 +00:00
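A minimal sketch of the grouping described in the commit above, not the committed code. The name tan_poly_sketch and the coefficient table are hypothetical: Taylor coefficients of tan(x) stand in for fdlibm's minimax coefficients, so the result is only accurate for small |x|. The point is the parenthesization: the lower terms (divided by x**5) are summed as two independent groups, and the final sum is grouped as (first_term + second_term) + sum_of_lower_terms, which is safe here only because the arithmetic is done in double and rounded to float once at the end.

```c
/* Taylor coefficients of tan(x), used as stand-ins (assumption). */
static const double T[] = {
    1.0 / 3.0,          /* x**3  */
    2.0 / 15.0,         /* x**5  */
    17.0 / 315.0,       /* x**7  */
    62.0 / 2835.0,      /* x**9  */
    1382.0 / 155925.0   /* x**11 */
};

/* Hypothetical helper: evaluate the polynomial in double, return float. */
static float
tan_poly_sketch(double x)
{
    double z = x * x;   /* x**2 */
    double w = z * z;   /* x**4 */
    double s = z * x;   /* x**3 */

    /*
     * Lower terms divided by x**5, written as one expression:
     * (terms in even powers of z) + (terms in odd powers of z).
     * The two groups have no data dependency on each other.
     */
    double r = (T[1] + w * T[3]) + z * (T[2] + w * T[4]);

    /* (first_term + second_term) + sum_of_lower_terms */
    return (float)((x + s * T[0]) + (s * z) * r);
}
```

With real minimax coefficients only the table would change; the grouping of the expressions is what the commit message is about.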
ru
f6e0fe2653 Improve the documentation of "proxyall" knob, somewhat: we do not
proxy for hosts that are reachable through the same interface the
request came in from.  This feature is mainly for hosts reachable
through some P2P link, e.g. the gif(4) tunnel.
2005-11-24 13:44:42 +00:00
ru
5bd42d0a34 Fix prototype. 2005-11-24 11:29:11 +00:00
ru
af47fb2f88 Fix prototypes. 2005-11-24 11:26:36 +00:00
ru
5633435ae3 Fix prototypes. 2005-11-24 11:14:06 +00:00
ru
46b5b6bcde Fix prototypes. 2005-11-24 10:54:47 +00:00
ru
3cf38aeba7 Fix prototype. 2005-11-24 10:43:35 +00:00
ru
f815813dd1 Fix prototype. 2005-11-24 10:32:39 +00:00
ru
ae11cb5ef9 Fix prototypes. 2005-11-24 10:30:44 +00:00
ru
7b90f188c4 Fix prototypes. 2005-11-24 10:06:05 +00:00
ru
6968f8c5bd Fix prototype to match the code and documentation. 2005-11-24 09:51:59 +00:00
joel
7eed0b9958 s/5.5/6.0/ in HISTORY section.
Discussed with:	ru
2005-11-24 09:25:10 +00:00
ru
0d06027644 Revert last revision, strmode() should be moved to <unistd.h> to be
properly fixed.
2005-11-24 08:30:44 +00:00
ru
2a0206e03e Add missing "struct" in i386/i386/machdep.c,v 1.497 by deischen@. 2005-11-24 08:16:18 +00:00
ru
d9eedd9185 Make SYNOPSIS compile.
Attn peter@: this manpage wasn't synced with your code changes.
2005-11-24 07:48:19 +00:00
ru
a615d0b31e Fix prototypes.
Attn davidxu@: most likely, the description should also be tweaked
after your undocumented changes that changed these prototypes.
2005-11-24 07:33:35 +00:00
ru
ac0207ffd9 Fix prototype to match the code and documentation. 2005-11-24 07:20:26 +00:00
ru
bf558bda27 Fix prototypes. 2005-11-24 07:12:01 +00:00
ru
e82db33c27 Keep up with const poisoning in uuid.h,v 1.3. 2005-11-24 07:04:20 +00:00
ru
f76625156c Fix prototype of strmode() to match the code and documentation. 2005-11-24 06:59:35 +00:00
ru
07d744857c Fix prototype. 2005-11-24 06:56:21 +00:00
njl
b4b8d998c5 Only copy out the battery status/info if there was no error. 2005-11-24 05:23:56 +00:00
cognet
792c8e2f49 Use a magic number to know we were started from the elf wrapper.
Add a dummy _start function to make the non-elf version of the wrapper work.
2005-11-24 02:27:55 +00:00
cognet
2a2203ec20 Create a non-elf pure binary version of the kernel as well. 2005-11-24 02:25:49 +00:00
bde
caae9bf081 Optimized by eliminating the special case for 0.67434 <= |x| < pi/4.
A single polynomial approximation for tan(x) works in infinite precision
up to |x| < pi/2, but in finite precision, to restrict the accumulated
roundoff error to < 1 ulp, |x| must be restricted to less than about
sqrt(0.5/((1.5+1.5)/3)) ~= 0.707.  We restricted it a bit more to
give a safety margin including some slop for optimizations.  Now that
we use double precision for the calculations, the accumulated roundoff
error is in double-precision ulps so it can easily be made almost 2**29
times smaller than a single-precision ulp.  Near x = pi/4 its maximum
is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.

The minimax polynomial needs to be different to work for the larger
interval.  I didn't increase its degree; the old degree is just large
enough to keep the final error less than 1 ulp and increasing the
degree would be a pessimization.  The maximum error is now ~0.80
ulps instead of ~0.53 ulps.

The speedup from this optimization for uniformly distributed args in
[-2pi, 2pi] is 28-43% on athlons, depending on how badly gcc selected
and scheduled the instructions in the old version.  The old version
has some int-to-float conversions that are apparently difficult to schedule
well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
gcc-3.4, with the difference especially large on AXPs.  On A64s, the
problem seems to be related to documented penalties for moving single
precision data to undead xmm registers.  With this version, the speed
in cycles is almost independent of the athlon and gcc version despite
the large differences in instruction selection to use the FPU on AXPs
and SSE on A64s.
2005-11-24 02:04:26 +00:00
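The arithmetic behind the error bound quoted in the commit above can be checked with a short sketch. The function names are hypothetical; the formula 0.5 + (1.5 + 1.5) * x**2 / 3 and the factor 2**29 are taken from the commit message (2**29 matches the difference between the 53-bit and 24-bit significands of double and float).

```c
/*
 * Accumulated roundoff bound from the commit message, in ulps of the
 * precision used for the calculation: 0.5 + (1.5 + 1.5) * x**2 / 3.
 */
static double
roundoff_bound_ulps(double x)
{
    return 0.5 + (1.5 + 1.5) * x * x / 3.0;
}

/*
 * The same bound expressed in single-precision ulps when the
 * calculation is done in double precision: divide by 2**29.
 */
static double
roundoff_bound_float_ulps(double x)
{
    return roundoff_bound_ulps(x) / 536870912.0;    /* 2**29 */
}
```

Setting the bound to 1 ulp gives x**2 = 0.5, which is the ~0.707 limit mentioned for single-precision arithmetic. At x = pi/4 the bound is about 1.117 double-precision ulps, or roughly 2e-9 single-precision ulps, which is why the special case near pi/4 is no longer needed.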
glebius
f9340d9884 Merge in new driver version from Intel - 3.2.18.
The most important change is support for adapters based on
82571 and 82572 chips.

Tested on:	82547EI on i386
Tested on:	82540EM on sparc64
2005-11-24 01:44:49 +00:00
kris
0dcc27d29e Correct division by zero error in comment. 2005-11-24 00:53:14 +00:00
rodrigc
4808c023d9 Remove UFS-specific parts from mount(8).
For mounting UFS, all mount options are passed directly to nmount(),
without any UFS-specific logic.
2005-11-23 23:22:56 +00:00
rodrigc
6e9653b075 These files were never hooked into the build, and were the start
of an nmount()-based mount program for UFS.
Now that mount(8) calls nmount() directly for mounting UFS filesystems,
they are unnecessary.
2005-11-23 23:06:33 +00:00
rodrigc
dc0fe47898 In nmount() and vfs_donmount(), do not strcmp() the options in the iovec
directly.  We need to copyin() the strings in the iovec before
we can strcmp() them.  Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.

Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found.  This allows us to locate an option in
the iovec without actually manipulating the iovec members directly via
strcmp().

Noticed by:	kris on sparc64
2005-11-23 20:51:15 +00:00
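A userspace sketch of the lookup idea from the commit above, under stated assumptions. It is not the kernel's vfs_getopt_pos(): the name find_opt_pos, its signature, and the layout (name/value pairs, so names sit at even indices) are assumptions for illustration. In the kernel the strings must first be copied in from userspace before any comparison, which this sketch does not model.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: return the array index of the option name in
 * an iovec of name/value pairs, or -1 if the name is not present.
 * The iovec itself is only read, never modified.
 */
static int
find_opt_pos(const struct iovec *iov, int iovcnt, const char *name)
{
    int i;

    for (i = 0; i + 1 < iovcnt; i += 2) {
        if (iov[i].iov_base != NULL &&
            strcmp((const char *)iov[i].iov_base, name) == 0)
            return (i);
    }
    return (-1);
}
```

A caller that wants to return an error string would look up "errmsg" this way and write into the value slot at the following index, rather than comparing and editing iovec members in place.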
ru
11d4f09966 Fix prototype. 2005-11-23 20:34:37 +00:00
ru
642fd4337d Fix prototype. 2005-11-23 20:26:58 +00:00
rodrigc
675a0bcf59 Do not pass userquota and groupquota mount options to nmount().
These options are read from fstab by quotacheck(8), but are not
valid mount options that need to be passed down to the filesystem.

Noticed by:	maxim
2005-11-23 20:17:27 +00:00
avatar
d45ac37ecd - Adding the missing 'W' option back which was accidentally removed
in rev1.37.
- Fixing a core dump inside build_iovec_argf by providing a !NULL format
  string to vsnprintf(3).

Reviewed by:	rodrigc
2005-11-23 19:52:14 +00:00
jhb
10794dc0a1 Add locking and mark MPSAFE:
- Add locked variants of start, init, and ifmedia_upd.
- Add a mutex to the softc and remove spl calls.
- Use callout(9) rather than timeout(9).
- Setup interrupt handler last in attach.
- Use M_ZERO rather than bzero.

MFC after:	1 week
Tested by:	wpaul
2005-11-23 18:51:34 +00:00
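The "locked variants" pattern from the commit above can be sketched in userspace. This is an analogy only: pthread mutexes stand in for the kernel's mutex(9), and the foo_ names and softc fields are hypothetical, not taken from the driver.

```c
#include <pthread.h>

/* Hypothetical per-device state with its own mutex. */
struct foo_softc {
    pthread_mutex_t sc_mtx;
    int             sc_running;
    int             sc_starts;
};

/* Caller must hold sc_mtx. */
static void
foo_start_locked(struct foo_softc *sc)
{
    sc->sc_starts++;
}

/* Unlocked entry point: take the lock, then call the locked variant. */
static void
foo_start(struct foo_softc *sc)
{
    pthread_mutex_lock(&sc->sc_mtx);
    foo_start_locked(sc);
    pthread_mutex_unlock(&sc->sc_mtx);
}

/* Caller must hold sc_mtx; reuses foo_start_locked() without relocking. */
static void
foo_init_locked(struct foo_softc *sc)
{
    sc->sc_running = 1;
    foo_start_locked(sc);
}

static void
foo_init(struct foo_softc *sc)
{
    pthread_mutex_lock(&sc->sc_mtx);
    foo_init_locked(sc);
    pthread_mutex_unlock(&sc->sc_mtx);
}
```

Splitting each routine this way lets code that already holds the lock call the _locked variant directly, avoiding recursion on the mutex.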
jhb
bc2f4ae553 MFi386: Sort and add COUNT_{IPIS,XINVLTLB_HITS}.
Pointy hat to:	jhb (2)
2005-11-23 18:12:05 +00:00
jhb
300d90fc68 Sort. 2005-11-23 18:11:24 +00:00
cognet
2c70dd955a MFP4: Bring in arm9 cache-related functions
Obtained from:	NetBSD
2005-11-23 18:02:40 +00:00
damien
bb8a6c0cae Optimize PLCP length field computation for 802.11b rates. 2005-11-23 17:32:57 +00:00
wpaul
3d571e2f28 Somehow memmove() got mapped to memset() in the patch table. Create a
real memmove() implementation and use that instead.
2005-11-23 17:10:46 +00:00
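For reference, a minimal sketch of what distinguishes memmove() from a fill or a naive forward copy: the regions may overlap, so the copy direction has to depend on their relative position. The name sketch_memmove is hypothetical and this is not the implementation added by the commit above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise memmove: safe for overlapping regions. */
static void *
sketch_memmove(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || len == 0)
        return (dst);

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination is lower: copy forward. */
        while (len-- > 0)
            *d++ = *s++;
    } else {
        /* Destination is higher: copy backward from the end. */
        d += len;
        s += len;
        while (len-- > 0)
            *--d = *--s;
    }
    return (dst);
}
```

Mapping memmove() to memset() would instead fill the destination with a single byte value, which is why the commit treats it as a bug rather than a slow path.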
ru
869e65f881 Fix prototypes. 2005-11-23 16:44:23 +00:00
jhb
1d59f819d8 - Quiet the pci_link(4) devices so that they don't show up in dmesg now.
- Improve panic message if we fail to read the PCI bus number from a bridge
  device.
- Don't try to lookup a BIOS IRQ for a link unless the link is routed via
  an ISA IRQ since BIOSen currently only route PCI link devices via ISA
  IRQs.

Tested by:	Mathieu Prevot bsdhack at club-internet dot fr
MFC after:	1 week
2005-11-23 16:36:13 +00:00
ru
5e1264a066 There's no longer^Wyet <sys/capability.h>. 2005-11-23 16:24:39 +00:00
ru
f0442273f1 Fix inet6_opt_get_val() prototype. 2005-11-23 16:07:54 +00:00
ru
07eeed1e1c Make SYNOPSIS compile. 2005-11-23 15:55:38 +00:00
ru
906caa442c Make SYNOPSIS compile after imp@'s changes. 2005-11-23 15:44:42 +00:00
ru
baae9ec455 Make SYNOPSIS compile. 2005-11-23 15:41:36 +00:00
bde
1e3150891d Use only double precision for "kernel" tanf (except for returning float).
This is a minor interface change.  The function is renamed from
__kernel_tanf() to __kernel_tandf() so that misuses of it will cause
link errors and not crashes.

This version is a routine translation with no special optimizations
for accuracy or efficiency.  It gives an unimportant increase in
accuracy, from ~0.9 ulps to 0.5285 ulps.  Almost all of the error is
from the minimax polynomial (~0.03 ulps) and the final rounding step
(< 0.5 ulps).  It gives strange differences in efficiency in the -5
to +10% range, with -O1 fairly consistently becoming faster and -O2
slower on AXP and A64 with gcc-3.3 and gcc-3.4.
2005-11-23 14:27:56 +00:00
ru
11e07dda30 Add missing includes. 2005-11-23 10:49:07 +00:00