120027 Commits

Author SHA1 Message Date
Bruce Evans
833f0e1a4a Minor cleanups and optimizations:
- Remove dead code that I forgot to remove in the previous commit.

- Calculate the sum of the lower terms of the polynomial (divided by
  x**5) in a single expression (sum of odd terms) + (sum of even terms)
  with parentheses to control grouping.  This is clearer and happens to
  give better instruction scheduling for a tiny optimization (an
  average of about 0.5 cycles/call on Athlons).

- Calculate the final sum in a single expression with parentheses to
  control grouping too.  Change the grouping from
  first_term + (second_term + sum_of_lower_terms) to
  (first_term + second_term) + sum_of_lower_terms.  Normally the first
  grouping must be used for accuracy, but extra precision makes any
  grouping give a correct result so we can group for efficiency.  This
  is a larger optimization (average 3-4 cycles/call or 5%).

- Use parentheses to indicate that the C order of left to right evaluation
  is what is wanted (for efficiency) in a multiplication too.

The old fdlibm code has several optimizations related to these.  Two
involve doing an extra operation that can be done almost in parallel
on some superscalar machines but are pessimizations on sequential
machines.  Others involve statement ordering or expression grouping.
All of these except the ordering for combining the sums of the odd
and even terms seem to be ideal for Athlons, but parallelism is still
limited so all of these optimizations combined together with the ones
in this commit save only ~6-8 cycles (~10%).

On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 39-59 cycles.  I don't know of any more optimizations for tanf()
short of writing it all in asm with very MD instruction scheduling.
Hardware fsin takes 122-138 cycles.  Most of the optimizations for
tanf() don't work very well for tan[l]().  fdlibm tan() now takes
145-365 cycles.
2005-11-24 13:48:40 +00:00
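The grouping described in 833f0e1a4a can be sketched roughly as follows. This is an illustration only, not the committed code: the function name sketch_tandf() is hypothetical, and the coefficients are the plain Taylor coefficients of tan(x), used as stand-ins for the minimax coefficients of the real implementation.

```c
/*
 * Taylor coefficients of tan(x) for the x^3 .. x^13 terms.
 * ASSUMPTION: illustrative stand-ins, not the minimax coefficients
 * used by the real code.
 */
static const double T[] = {
	1.0 / 3.0,		/* x^3 */
	2.0 / 15.0,		/* x^5 */
	17.0 / 315.0,		/* x^7 */
	62.0 / 2835.0,		/* x^9 */
	1382.0 / 155925.0,	/* x^11 */
	21844.0 / 6081075.0,	/* x^13 */
};

/* Hypothetical name; approximates tan(x) for small |x| only. */
float
sketch_tandf(double x)
{
	double z = x * x;	/* x^2 */
	double w = z * z;	/* x^4 */
	double s = z * x;	/* x^3 */
	double lower;

	/*
	 * Lower terms divided by x^5, written as one expression:
	 * (sum of odd terms) + (sum of even terms).  The two Horner
	 * chains in w do not depend on each other, so a superscalar
	 * CPU may overlap them.
	 */
	lower = (T[1] + w * (T[3] + w * T[5])) + z * (T[2] + w * T[4]);

	/*
	 * Final sum grouped as (first_term + second_term) + lower terms.
	 * Everything is computed in double, so the grouping can be
	 * chosen for speed; the only rounding to float is the cast.
	 */
	return (float)((x + s * T[0]) + (s * z) * lower);
}
```

With these stand-in coefficients, sketch_tandf(0.5) agrees with tan(0.5) to well under 1e-6; the parentheses only fix the evaluation order, they do not change the mathematical value.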
Ruslan Ermilov
34c96b563e Improve the documentation of "proxyall" knob, somewhat: we do not
proxy for hosts that are reachable through the same interface the
request came in from.  This feature is mainly for hosts reachable
through some P2P link, e.g. the gif(4) tunnel.
2005-11-24 13:44:42 +00:00
Ruslan Ermilov
877205d1d4 Fix prototype. 2005-11-24 11:29:11 +00:00
Ruslan Ermilov
4226a8bf6f Fix prototypes. 2005-11-24 11:26:36 +00:00
Ruslan Ermilov
94f5f5df3d Fix prototypes. 2005-11-24 11:14:06 +00:00
Ruslan Ermilov
3a14548604 Fix prototypes. 2005-11-24 10:54:47 +00:00
Ruslan Ermilov
70b0774919 Fix prototype. 2005-11-24 10:43:35 +00:00
Ruslan Ermilov
41792fb59f Fix prototype. 2005-11-24 10:32:39 +00:00
Ruslan Ermilov
639d850061 Fix prototypes. 2005-11-24 10:30:44 +00:00
Ruslan Ermilov
de599f05ea Fix prototypes. 2005-11-24 10:06:05 +00:00
Ruslan Ermilov
b484d9f687 Fix prototype to match the code and documentation. 2005-11-24 09:51:59 +00:00
Joel Dahl
19797b2256 s/5.5/6.0/ in HISTORY section.
Discussed with:	ru
2005-11-24 09:25:10 +00:00
Ruslan Ermilov
52fbcc15a0 Revert last revision, strmode() should be moved to <unistd.h> to be
properly fixed.
2005-11-24 08:30:44 +00:00
Ruslan Ermilov
1a581012df Add missing "struct" in i386/i386/machdep.c,v 1.497 by deischen@. 2005-11-24 08:16:18 +00:00
Ruslan Ermilov
47be132478 Make SYNOPSIS compile.
Attn peter@: this manpage wasn't synced with your code changes.
2005-11-24 07:48:19 +00:00
Ruslan Ermilov
93f0f0427b Fix prototypes.
Attn davidxu@: most likely, the description should also be tweaked
after your undocumented changes that changed these prototypes.
2005-11-24 07:33:35 +00:00
Ruslan Ermilov
20d91f5626 Fix prototype to match the code and documentation. 2005-11-24 07:20:26 +00:00
Ruslan Ermilov
7062693e56 Fix prototypes. 2005-11-24 07:12:01 +00:00
Ruslan Ermilov
6eee826901 Keep up with const poisoning in uuid.h,v 1.3. 2005-11-24 07:04:20 +00:00
Ruslan Ermilov
fee45c0c38 Fix prototype of strmode() to match the code and documentation. 2005-11-24 06:59:35 +00:00
Ruslan Ermilov
36c71f6ac1 Fix prototype. 2005-11-24 06:56:21 +00:00
Nate Lawson
1f6e47a324 Only copy out the battery status/info if there was no error. 2005-11-24 05:23:56 +00:00
Olivier Houchard
ce4210d673 Use a magic number to know we were started from the elf wrapper.
Add a dummy _start function to make the non-elf version of the wrapper work.
2005-11-24 02:27:55 +00:00
Olivier Houchard
f5a9ac9ca4 Create a non-elf pure binary version of the kernel as well. 2005-11-24 02:25:49 +00:00
Bruce Evans
16638b5585 Optimized by eliminating the special case for 0.67434 <= |x| < pi/4.
A single polynomial approximation for tan(x) works in infinite precision
up to |x| < pi/2, but in finite precision, to restrict the accumulated
roundoff error to < 1 ulp, |x| must be restricted to less than about
sqrt(0.5/((1.5+1.5)/3)) ~= 0.707.  We restricted it a bit more to
give a safety margin including some slop for optimizations.  Now that
we use double precision for the calculations, the accumulated roundoff
error is in double-precision ulps so it can easily be made almost 2**29
times smaller than a single-precision ulp.  Near x = pi/4 its maximum
is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.

The minimax polynomial needs to be different to work for the larger
interval.  I didn't increase its degree; the old degree is just large
enough to keep the final error less than 1 ulp and increasing the
degree would be a pessimization.  The maximum error is now ~0.80
ulps instead of ~0.53 ulps.

The speedup from this optimization for uniformly distributed args in
[-2pi, 2pi] is 28-43% on Athlons, depending on how badly gcc selected
and scheduled the instructions in the old version.  The old version
has some int-to-float conversions that are apparently difficult to schedule
well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
gcc-3.4, with the difference especially large on AXPs.  On A64s, the
problem seems to be related to documented penalties for moving single
precision data to undead xmm registers.  With this version, the speed
in cycles is almost independent of the Athlon and gcc version despite
the large differences in instruction selection to use the FPU on AXPs
and SSE on A64s.
2005-11-24 02:04:26 +00:00
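The roundoff arithmetic quoted in 16638b5585 can be checked with a small sketch like the one below. The constants 0.5 and 1.5 are taken from the message as-is; the function names are hypothetical, and the ulp conversion assumes IEEE 754 single (24-bit) and double (53-bit) significands.

```c
/*
 * Accumulated roundoff bound from the commit message, in ulps of the
 * working precision: 0.5 + (1.5 + 1.5) * x**2 / 3.
 */
double
roundoff_bound_ulps(double x)
{
	return 0.5 + (1.5 + 1.5) * x * x / 3;
}

/*
 * Convert double-precision ulps to single-precision ulps.
 * ASSUMPTION: IEEE 754 formats, so a double ulp is 2**(53 - 24) =
 * 2**29 times smaller than a float ulp for values in the same binade.
 */
double
double_ulps_to_float_ulps(double ulps)
{
	return ulps / (double)(1 << 29);
}
```

The bound stays below 1 ulp only while x**2 < 0.5, that is |x| < ~0.707, which is the single-precision restriction the message mentions. At x = pi/4 it evaluates to about 1.117, and in double precision that corresponds to roughly 2e-9 single-precision ulps.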
Gleb Smirnoff
7ddd4e4126 Merge in new driver version from Intel - 3.2.18.
The most important change is support for adapters based on
82571 and 82572 chips.

Tested on:	82547EI on i386
Tested on:	82540EM on sparc64
2005-11-24 01:44:49 +00:00
Kris Kennaway
12f6e63d15 Correct division by zero error in comment. 2005-11-24 00:53:14 +00:00
Craig Rodrigues
7f2444598a Remove UFS-specific parts from mount(8).
For mounting UFS, all mount options are passed directly to nmount(),
without any UFS-specific logic.
2005-11-23 23:22:56 +00:00
Craig Rodrigues
722705c6b2 These files were never hooked into the build, and were the start
of an nmount()-based mount program for UFS.
Now that mount(8) calls nmount() directly for mounting UFS filesystems,
they are unnecessary.
2005-11-23 23:06:33 +00:00
Craig Rodrigues
5e6b93a014 In nmount() and vfs_donmount(), do not strcmp() the options in the iovec
directly.  We need to copyin() the strings in the iovec before
we can strcmp() them.  Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.

Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found.  This allows us to locate an option in
the iovec without actually manipulating the iovec members directly via
strcmp().

Noticed by:	kris on sparc64
2005-11-23 20:51:15 +00:00
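A user-space sketch of the kind of lookup helper described in 5e6b93a014 might look like this. It is not the kernel's vfs_getopt_pos(): the struct, the function name and the signature are hypothetical, and the sketch assumes the option names have already been copied into local memory, which is the point of the fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair, standing in for one copied-in option. */
struct sketch_opt {
	const char *name;
	const char *value;
};

/*
 * Return the array index of the option called `name`, or -1 if it is
 * not present.  ASSUMPTION: every opts[i].name is a valid local string,
 * so calling strcmp() on it is safe.
 */
int
sketch_getopt_pos(const struct sketch_opt *opts, int nopts, const char *name)
{
	int i;

	for (i = 0; i < nopts; i++) {
		if (strcmp(opts[i].name, name) == 0)
			return (i);
	}
	return (-1);
}
```

A caller can use the returned index to find the matching slot in the original vector (for example, to decide where an error string should be copied out) without comparing against the user-supplied pointers themselves.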
Ruslan Ermilov
4ca0505435 Fix prototype. 2005-11-23 20:34:37 +00:00
Ruslan Ermilov
8b79908889 Fix prototype. 2005-11-23 20:26:58 +00:00
Craig Rodrigues
35d6c7f50e Do not pass userquota and groupquota mount options to nmount().
These options are read from fstab by quotacheck(8), but are not
valid mount options that need to be passed down to the filesystem.

Noticed by:	maxim
2005-11-23 20:17:27 +00:00
Tai-hwa Liang
62c7cb517a - Adding the missing 'W' option back which was accidentally removed
in rev1.37.
- Fixing a core dump inside build_iovec_argf by providing a !NULL format
  string to vsnprintf(3).

Reviewed by:	rodrigc
2005-11-23 19:52:14 +00:00
John Baldwin
b05223a327 Add locking and mark MPSAFE:
- Add locked variants of start, init, and ifmedia_upd.
- Add a mutex to the softc and remove spl calls.
- Use callout(9) rather than timeout(9).
- Set up interrupt handler last in attach.
- Use M_ZERO rather than bzero.

MFC after:	1 week
Tested by:	wpaul
2005-11-23 18:51:34 +00:00
John Baldwin
addfb88d47 MFi386: Sort and add COUNT_{IPIS,XINVLTLB_HITS}.
Pointy hat to:	jhb (2)
2005-11-23 18:12:05 +00:00
John Baldwin
0c43612a35 Sort. 2005-11-23 18:11:24 +00:00
Olivier Houchard
2ec9d05328 MFP4: Bring in arm9 cache-related functions
Obtained from:	NetBSD
2005-11-23 18:02:40 +00:00
Damien Bergamini
8ea9781eed Optimize PLCP length field computation for 802.11b rates. 2005-11-23 17:32:57 +00:00
Bill Paul
f1b78ee016 Somehow memmove() got mapped to memset() in the patch table. Create a
real memmove() implementation and use that instead.
2005-11-23 17:10:46 +00:00
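For reference, a memmove()-style routine differs from memset() and from a naive copy in that it must handle overlapping regions. The sketch below shows one common way to do that; it is not the code from f1b78ee016, and the name sketch_memmove() is hypothetical.

```c
#include <stddef.h>

/*
 * Copy len bytes from src to dst, allowing the regions to overlap.
 * ASSUMPTION: both pointers refer to the same flat address space, so
 * comparing them with < is meaningful (true on the usual platforms,
 * though not guaranteed by ISO C for unrelated objects).
 */
void *
sketch_memmove(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (d == s || len == 0)
		return (dst);

	if (d < s) {
		/* Destination starts first: copy forwards. */
		while (len--)
			*d++ = *s++;
	} else {
		/* Destination starts later: copy backwards from the end. */
		d += len;
		s += len;
		while (len--)
			*--d = *--s;
	}
	return (dst);
}
```

Copying backwards when the destination lies above the source keeps bytes from being overwritten before they are read.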
Ruslan Ermilov
79be508c8f Fix prototypes. 2005-11-23 16:44:23 +00:00
John Baldwin
ef2cda76c0 - Quiet the pci_link(4) devices so that they don't show up in dmesg now.
- Improve panic message if we fail to read the PCI bus number from a bridge
  device.
- Don't try to look up a BIOS IRQ for a link unless the link is routed via
  an ISA IRQ since BIOSen currently only route PCI link devices via ISA
  IRQs.

Tested by:	Mathieu Prevot bsdhack at club-internet dot fr
MFC after:	1 week
2005-11-23 16:36:13 +00:00
Ruslan Ermilov
8ae7a845d5 There's no longer^Wyet <sys/capability.h>. 2005-11-23 16:24:39 +00:00
Ruslan Ermilov
49e5b98f5a Fix inet6_opt_get_val() prototype. 2005-11-23 16:07:54 +00:00
Ruslan Ermilov
5306fb2d0c Make SYNOPSIS compile. 2005-11-23 15:55:38 +00:00
Ruslan Ermilov
b0faeb2d42 Make SYNOPSIS compile after imp@'s changes. 2005-11-23 15:44:42 +00:00
Ruslan Ermilov
16a97b8591 Make SYNOPSIS compile. 2005-11-23 15:41:36 +00:00
Bruce Evans
94a5f9be99 Use only double precision for "kernel" tanf (except for returning float).
This is a minor interface change.  The function is renamed from
__kernel_tanf() to __kernel_tandf() so that misuses of it will cause
link errors and not crashes.

This version is a routine translation with no special optimizations
for accuracy or efficiency.  It gives an unimportant increase in
accuracy, from ~0.9 ulps to 0.5285 ulps.  Almost all of the error is
from the minimax polynomial (~0.03 ulps) and the final rounding step
(< 0.5 ulps).  It gives strange differences in efficiency in the -5
to +10% range, with -O1 fairly consistently becoming faster and -O2
slower on AXP and A64 with gcc-3.3 and gcc-3.4.
2005-11-23 14:27:56 +00:00
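The ulp figures in 94a5f9be99 measure the error of a float result against a more precise reference, in units of the spacing between adjacent floats. One way to compute such a figure is sketched below; the helper names are hypothetical, the sketch assumes IEEE 754 single precision, and it measures the spacing at the computed result rather than at the true value, which is a common simplification.

```c
#include <stdint.h>
#include <string.h>

/*
 * Distance from f to the next float of larger magnitude.
 * ASSUMPTION: f is finite and not the largest representable float.
 */
static double
float_ulp(float f)
{
	uint32_t bits;
	float next;
	double d;

	memcpy(&bits, &f, sizeof(bits));
	bits++;			/* next representable value away from zero */
	memcpy(&next, &bits, sizeof(next));
	d = (double)next - (double)f;
	return (d < 0 ? -d : d);
}

/* Error of `got` relative to the reference `want`, in float ulps. */
double
error_in_ulps(float got, double want)
{
	double err = (double)got - want;

	if (err < 0)
		err = -err;
	return (err / float_ulp(got));
}
```

A correctly rounded result always scores at most 0.5 on this scale, which is why a total of 0.5285 ulps leaves only about 0.03 ulps attributable to the polynomial itself.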
Ruslan Ermilov
c48648d2c1 Add missing includes. 2005-11-23 10:49:07 +00:00
Kirill Ponomarev
c94ab799af Document PKG_PATH environment variable.
Prodded by:	Mark Andrews <Mark_Andrews AT isc DOT org>
MFC after:	2 days
2005-11-23 10:31:59 +00:00