11634 Commits

Author SHA1 Message Date
jasone
c614695539 Clean up manipulation of chunk page map elements to remove some tenuous
assumptions about whether bits are set at various times.  This makes
adding other flags safe.

Reorganize functions in order to inline i{m,c,p,s,re}alloc().  This
allows the entire fast-path call chains for malloc() and free() to be
inlined. [1]

Suggested by:	[1] Stuart Parmenter <stuart@mozilla.com>
2008-02-08 00:35:56 +00:00
bde
efcf10f47b Use a better method of scaling by 2**k. Instead of adding to the
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply by
it.  This tends to be much faster if the construction of 2**k is
actually in parallel, and might be faster even with no parallelism
since adjustment of the exponent requires a read-modify-wrtite at an
unfortunate time for pipelines.

In some cases involving exp2* on amd64 (A64), this change saves about
40 cycles or 30%.  I think it is inherently only about 12 cycles faster
in these cases and the rest of the speedup is from partly-accidentally
avoiding compiler pessimizations (the construction of 2**k is now
manually scheduled for good results, and -O2 doesn't always mess this
up).  In most cases on amd64 (A64) and i386 (A64) the speedup is about
20 cycles.  The worst case that I found is expf on ia64 where this
change is a pessimization of about 10 cycles or 5%.  The manual
scheduling for plain exp[f] is harder and not as tuned.

Details specific to expm1*:
- the saving is closer to 12 cycles than to 40 for expm1* on i386 (A64).
  For some reason it is much larger for negative args.
- also convert to __FBSDID().
2008-02-07 09:42:19 +00:00
bde
22e608f1ce Use a better method of scaling by 2**k. Instead of adding to the
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply by
it.  This tends to be much faster if the construction of 2**k is
actually in parallel, and might be faster even with no parallelism
since adjustment of the exponent requires a read-modify-wrtite at an
unfortunate time for pipelines.

In some cases involving exp2* on amd64 (A64), this change saves about
40 cycles or 30%.  I think it is inherently only about 12 cycles faster
in these cases and the rest of the speedup is from partly-accidentally
avoiding compiler pessimizations (the construction of 2**k is now
manually scheduled for good results, and -O2 doesn't always mess this
up).  In most cases on amd64 (A64) and i386 (A64) the speedup is about
20 cycles.  The worst case that I found is expf on ia64 where this
change is a pessimization of about 10 cycles or 5%.  The manual
scheduling for plain exp[f] is harder and not as tuned.

This change ld128/s_exp2l.c has not been tested.
2008-02-07 03:17:05 +00:00
des
67c8e0948c Add missing #include
Spotted by:	tinderbox
Submitted by:	Pietro Cerutti <gahr@gahr.ch>
Pointy hat to:	des
2008-02-06 23:25:29 +00:00
des
ddda03a2e0 Yet another pointy hat: when I zapped FBSDprivate_1.1, I forgot to move
its contents to FBSDprivate_1.0.
2008-02-06 20:45:46 +00:00
des
b4e1ea3e1c Add pthread_mutex_isowned_np() here as well; libthr and libkse are supposed
to have identical functionality.

MFC after:	2 weeks
2008-02-06 20:44:29 +00:00
des
f006d1f25a Remove unnecessary prototype. 2008-02-06 20:43:19 +00:00
des
0cd1685caf Add pthread_mutex_isowned_np() so there is no need for an additional
prototype next to the implementation.

MFC after:	2 weeks
2008-02-06 20:42:35 +00:00
des
ee907d59af Previous commit had a typo that resulted in symbol versioning being
(silently) disabled for libkse...

Pointy hat to:	des
2008-02-06 20:33:59 +00:00
des
03dc3be400 Give libkse the same treatment as libthr re. symbol versioning.
MFC after:	2 weeks
2008-02-06 20:30:48 +00:00
des
3086491df6 Convert pthread.map to the format expected by version_gen.awk, and modify
the Makefile accordingly; libthr now explicitly uses libc's Versions.def.

MFC after:	2 weeks
2008-02-06 20:25:00 +00:00
des
85a226c62d Remove incorrectly added FBSDprivate_1.1 namespace, and move symbols which
are new in FreeBSD 8 to the appropriate namespace.
2008-02-06 20:20:29 +00:00
des
053e111aba Per discussion on -threads, rename _islocked_np() to _isowned_np(). 2008-02-06 19:34:31 +00:00
des
d129ae8c34 Add necessary cast for tolower() argument.
Submitted by:	Joerg Sonnenberger <joerg@britannica.bec.de>
MFC after:	1 week
2008-02-06 11:39:55 +00:00
bde
bd06cb56ab As for the float trig functions and logf, use a minimax polynomial
that is specialized for float precision.  The new polynomial has degree
5 instead of 11, and a maximum error of 2**-27.74 ulps instead
of 2**-30.64.  This doesn't affect the final error significantly; the
maximum error was and is about 0.9101 ulps on amd64 -01 and the number
of cases with an error of > 0.5 ulps is actually reduced by epsilon
despite the larger error in the polynomial.

This is about 15% faster on amd64 (A64), i386 (A64) and ia64.  The asm
version is still used instead of this on i386 since it is faster and
more accurate.
2008-02-06 06:35:21 +00:00
jasone
44c343f8fa Track dirty unused pages so that they can be purged if they exceed a
threshold, according to the 'F' MALLOC_OPTIONS flag.  This obsoletes the
'H' flag.

Try to realloc() large objects in place.  This substantially speeds up
incremental large reallocations in the common case.

Fix a bug in arena_ralloc() that caused relocation of sub-page objects
even if the old and new sizes were in the same size class.

Maintain trees of runs and simplify the per-chunk page map.  This allows
logarithmic-time searching for sufficiently large runs in
arena_run_alloc(), whereas the previous algorithm required linear time
in the worst case.

Break various large functions into smaller sub-functions, and inline
only the functions that are in the fast path for small object
allocation/deallocation.

Remove an unnecessary check in base_pages_alloc_mmap().

Avoid integer division in choose_arena() for the NO_TLS case on
single-CPU systems.
2008-02-06 02:59:54 +00:00
matteo
da1547d460 set WARNS to 1: with WARNS=2 an aliasing error in a file generated by
rpcgen from include/rpcsvc/rex.x is exposed and I really don't know
how to fix it.

MFC after:	1 week
2008-02-05 20:03:45 +00:00
jkoshy
5c5b24c7a8 Document the return type for gelf_fsize(3).
Submitted by:		kaiw
2008-02-04 18:50:28 +00:00
des
c089534891 After careful consideration (and a brief discussion with attilio@), change
the semantics of pthread_mutex_islocked_np() to return true if and only if
the mutex is held by the current thread.

Obviously, change the regression test to match.

MFC after:	2 weeks
2008-02-04 12:35:23 +00:00
matteo
5f3ef8ab8c Fix incorrect handling of malloc failures
PR:		bin/83369
MFC after:	1 week
2008-02-04 07:56:36 +00:00
des
72d185548f Add pthread_mutex_islocked_np(), a cheap way to verify that a mutex is
locked.  This is intended primarily to support the userland equivalent
of the various *_ASSERT_LOCKED() macros we have in the kernel.

MFC after:	2 weeks
2008-02-03 22:38:10 +00:00
ume
97fd4b42a1 Remove incomplete support of AI_ALL and AI_V4MAPPED.
Reported by:	"Heiko Wundram (Beenic)" <wundram__at__beenic.net>
2008-02-03 19:07:55 +00:00
phk
13132840a1 Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs
referencing the files VM pages are returned from the network stack,
making changes to the file safe.

This flag does not guarantee that the data has been transmitted to the
other end.
2008-02-03 15:54:41 +00:00
jkoshy
234cf2fb65 Correct a typo. 2008-02-03 06:04:38 +00:00
deischen
9dcd92a867 When reinitializing a lockuser, don't assume that the lock is in
use.  If it is in use, use the watched request, otherwise use the
lockuser's own request.  Only allocate a lockuser request if both
requests are null.

PR:	119920
Tested by (6.x):	Landon Fuller <landonf -at- bikemonkey -dot- org>
2008-01-31 19:38:26 +00:00
jhb
d8dba28f83 The devstat(3) manpage claims that only <devstat.h> is needed as a
prerequisite for using this interface.  However, the 'statinfo' struct
actually references CPUSTATES from <sys/resource.h>, so in fact it
requires <sys/resource.h> to compile.  Use a nested include of
<sys/resource.h> to make the code match the docs.

Reported by:	Pietro Cerutti  gahr | gahr.ch
2008-01-31 16:55:12 +00:00
kaiw
e57ce2b6e6 Add hook routine archive_write_ar_finish() which writes the 'ar'
global header if nothing else has been written before the closing of
the archive. This will change the behaviour when creating archives
without members, i.e., instead of generating a 0-size archive file, an
archive with just the global header (8 bytes in total) will be created
and it is indeed a valid archive by the definition of libarchive, thus
subsequent operation on this archive will be accepted. This especially
solves the failure caused by following sequence: (several ports do)

% ar cru libfoo.a  	    # without specifying obj files
% ranlib libfoo.a

Reviewed by:	kientzle, jkoshy
Approved by:	kientzle
Approved by:	jkoshy	(mentor)
Reported by:	erwin
MFC after:	1 month
2008-01-31 08:11:01 +00:00
kientzle
213915cb49 Add a test to verify compatibility with archives with
odd hardlinks.  I need to extend this to test pax extended
archives with bodies attached to hardlinks and other less-common cases.
2008-01-31 07:47:38 +00:00
kientzle
512df3b205 Tighten up the heuristic that decides whether or not we should
obey or ignore the size field on a hardlink entry.  In particular,
if we're reading a non-POSIX archive, we should always ignore
the size field.

This should fix both the audio/xmcd port and the math/unixstat port.

Thanks to: Pav Lucistnik for pointing these two ports out to me.
MFC after: 7 days
2008-01-31 07:41:45 +00:00
trhodes
46c986723b Update this manual page to describe the extattr_list_file() and the
extattr_list_fd() functions.

PR:		108142
Submitted by:	Richard Dawe <rich@phekda.gotadsl.co.uk>
Reviewed by:	kientzle
2008-01-29 18:15:38 +00:00
das
b2c068251b Adjust the exponent before converting the result from double to
float precision. This fixes some double rounding problems for
subnormals and simplifies things a bit.
2008-01-28 01:19:07 +00:00
yar
ac1e4103b9 Our fts(3) API, as inherited from 4.4BSD, suffers from integer
fields in FTS and FTSENT structs being too narrow.  In addition,
the narrow types creep from there into fts.c.  As a result, fts(3)
consumers, e.g., find(1) or rm(1), can't handle file trees an ordinary
user can create, which can have security implications.

To fix the historic implementation of fts(3), OpenBSD and NetBSD
have already changed <fts.h> in somewhat incompatible ways, so we
are free to do so, too.  This change is a superset of changes from
the other BSDs with a few more improvements.  It doesn't touch
fts(3) functionality; it just extends integer types used by it to
match modern reality and the C standard.

Here are its points:

o For C object sizes, use size_t unless it's 100% certain that
  the object will be really small.  (Note that fts(3) can construct
  pathnames _much_ longer than PATH_MAX for its consumers.)

o Avoid the short types because on modern platforms using them
  results in larger and slower code.  Change shorts to ints as
  follows:

	- For variables than count simple, limited things like states,
	  use plain vanilla `int' as it's the type of choice in C.

	- For a limited number of bit flags use `unsigned' because signed
	  bit-wise operations are implementation-defined, i.e., unportable,
	  in C.

o For things that should be at least 64 bits wide, use long long
  and not int64_t, as the latter is an optional type.  See
  FTSENT.fts_number aka FTS.fts_bignum.  Extending fts_number `to
  satisfy future needs' is pointless because there is fts_pointer,
  which can be used to link to arbitrary data from an FTSENT.
  However, there already are fts(3) consumers that require fts_number,
  or fts_bignum, have at least 64 bits in it, so we must allow for them.

o For the tree depth, use `long'.  This is a trade-off between making
  this field too wide and allowing for 64-bit inode numbers and/or
  chain-mounted filesystems.  On the one hand, `long' is almost
  enough for 32-bit filesystems on a 32-bit platform (our ino_t is
  uint32_t now).  On the other hand, platforms with a 64-bit (or
  wider) `long' will be ready for 64-bit inode numbers, as well as
  for several 32-bit filesystems mounted one under another.  Note
  that fts_level has to be signed because -1 is a magic value for it,
  FTS_ROOTPARENTLEVEL.

o For the `nlinks' local var in fts_build(), use `long'.  The logic
  in fts_build() requires that `nlinks' be signed, but our nlink_t
  currently is uint16_t.  Therefore let's make the signed var wide
  enough to be able to represent 2^16-1 in pure C99, and even 2^32-1
  on a 64-bit platform.  Perhaps the logic should be changed just
  to use nlink_t, but it can be done later w/o breaking fts(3) ABI
  any more because `nlinks' is just a local var.

This commit also inludes supporting stuff for the fts change:

o Preserve the old versions of fts(3) functions through libc symbol
versioning because the old versions appeared in all our former releases.

o Bump __FreeBSD_version just in case.  There is a small chance that
some ill-written 3-rd party apps may fail to build or work correctly
if compiled after this change.

o Update the fts(3) manpage accordingly.  In particular, remove
references to fts_bignum, which was a FreeBSD-specific hack to work
around the too narrow types of FTSENT members.  Now fts_number is
at least 64 bits wide (long long) and fts_bignum is an undocumented
alias for fts_number kept around for compatibility reasons.  According
to Google Code Search, the only big consumers of fts_bignum are in
our own source tree, so they can be fixed easily to use fts_number.

o Mention the change in src/UPDATING.

PR:		bin/104458
Approved by:	re (quite a while ago)
Discussed with:	deischen (the symbol versioning part)
Reviewed by:	-arch (mostly silence); das (generally OK, but we didn't
		agree on some types used; assuming that no objections on
		-arch let me to stick to my opinion)
2008-01-26 17:09:40 +00:00
bde
997d2d26fb Fix a harmless type error in 1.9. 2008-01-25 21:09:21 +00:00
des
a0d60019af Fix a regression introduced in rev 1.99: replace fclose(f) with a comment
explaining why f cannot possibly be a valid FILE * at this point.

MFC after:	1 day
2008-01-23 20:57:59 +00:00
kientzle
417364f2db Track version # from the portable release. 2008-01-23 05:48:07 +00:00
kientzle
d164e15296 Explain a subtle API change that was made recently.
Even though I believe this is a good change, it does
have the potential to break certain clients, so it's
good to document the reasoning behind the change.
2008-01-23 05:47:08 +00:00
kientzle
5c89a8c35a Properly pad symlinks when writing cpio "newc" format.
Thanks to: Jesse Barker for reporting this.
MFC after: 7 days
2008-01-23 05:43:26 +00:00
ache
061b803830 Fix longstanding mb/wc functions segfault if error occurse
inside _<encoding>_init().
Currently _EUC_init() only was affected.
2008-01-23 03:05:35 +00:00
ache
28095b28d0 Better fix for longstanding segfault. Don't touch current locale at all
on unknown encoding. Previous fix resets it to POSIX.
2008-01-23 02:17:27 +00:00
ache
76c6a978cc 1) Add (void) cast to _none_init() (while I am here)
2) Fix longstanding segfault in mb/wc code when unknown encoding is specified
in the locale file (mb/wc functions becomes NULL in that case).
2008-01-23 01:57:26 +00:00
trhodes
3c543fe5ae Xref flopen.3 which references this manual page.
PR:	112650
2008-01-22 15:56:48 +00:00
ache
c52b8566b4 Introduce new encoding: "ASCII"
It differs from default C/POSIX "NONE" mainly by stricter 8bit check
for mb*towc*/wc*tomb* family, returning EILSEQ
2008-01-21 23:48:12 +00:00
bde
babf3acb0e Fix cutoffs. This is just a cleanup and an optimization for unusual
cases which are used mainly by regression tests.

As usual, the cutoff for tiny args was not correctly translated to
float precision.  It was 2**-54 but 2**-24 works.  It must be about
2**-precision, since the error from approximating log(1+x) by x is
about the same as |x|.  Exhaustive testing shows that 2**-24 gives
perfect rounding in round-to-nearest mode.

Similarly for the cutoff for being small, except this is not used by
so many other functions.  It was 2**-29 but 2**-15 works.  It must be
a bit smaller than sqrt(2**-precision), since the error from
approximating log(1+x) by x-x*x/2 is about the same as x*x.  Exhaustive
testing shows that 2**-15 gives a maximum error of 0.5052 ulps in
round-to-nearest-mode.  The algorithm for the general case is only good
for 0.8388 ulps, so this is sufficient (but it loses slightly on i386 --
then extra precision gives 0.5032 ulps for the general case).

While investigating this, I noticed that optimizing the usual case by
falling into a middle case involving a simple polynomial evaluation
(return x-x*x/2 instead of x here) is not such a good idea since it
gives an enormous pessimization of tinier args on machines for which
denormals are slow.  Float x*x/2 is denormal when |x| ~< 2**-64 and
x*x/2 is evaluated in float precision, so it can easily be denormal
for normal x.  This is even more interesting for general polynomial
evaluations.  Multiplying out large powers of x is normally a good
optimization since it reduces dependencies, but it creates denormals
starting with quite large x.
2008-01-21 13:46:21 +00:00
bde
ef5ed15ee4 Oops, when merging from the float version to the double versions, don't
forget to translate "float" to "double".

ucbtest didn't detect the bug, but exhaustive testing of the float
case relative to the double case eventually did.  The bug only affects
args x with |x| ~> 2**19*(pi/2) on non-i386 (i386 is broken in a
different way for large args).
2008-01-20 04:09:44 +00:00
bde
b3048e4365 Remove the float version of the kernel of arg reduction for pi/2, since
it should never have existed and it has not been used for many years
(floats are reduced faster using doubles).  All relevant changes (just
the workaround for broken assignment) have been merged to the double
version.
2008-01-19 22:50:50 +00:00
bde
2005bbb395 Do an ordinary assignment in STRICT_ASSIGN() except for floats until
there is a problem with non-floats (when i386 defaults to extra
precision).  This essentially restores yesterday's behaviour for doubles
on i386 (since generic rint() isn't used and everywhere else assumed
working assignment), but for arches that use the generic rint() it
finishes restoring some of 1995's behaviour (don't waste time doing
unnecessary store/load).
2008-01-19 22:05:14 +00:00
bde
8499893373 Use STRICT_ASSIGN() for exp2f() and exp2() instead of a volatile
variable hack for exp2f() only.

The volatile variable had a surprisingly large cost for exp2f() -- 19
cycles or 15% on i386 in the worst case observed.  This is only partly
explained by there being several references to the variable, only one
of which benefited from it being volatile.  Arches that have working
assignment are likely to benefit even more from not having any volatile
variable.

exp2() now has a chance of working with extra precision on i386.

exp2() has even more references to the variable, so it would have been
pessimized more by simply declaring the variable as volatile.  Even
the temporary volatile variable for STRICT_ASSIGN costs 5-10% on i386,
(A64) so I will change STRICT_ASSIGN() to do an ordinary assignment
until i386 defaults to extra precision.
2008-01-19 21:37:14 +00:00
bde
3b6da800e8 Use STRICT_ASSIGN() for _kernel_rem_pio2f() and _kernel_rem_pio2f()
instead of a volatile cast hack for the float version only.  The cast
hack broke with gcc-4, but this was harmless since the float version
hasn't been used for a few years.  Merge from the float version so
that the double version has a chance of working on i386 with extra
precision.

See k_rem_pio2f.c rev.1.8 for the original hack.

Convert to _FBSDID().
2008-01-19 20:02:55 +00:00
bde
6d3cabaca6 Use STRICT_ASSIGN() for log1pf() and log1p() instead of a volatile cast
hack for log1pf() only.  The cast hack broke with gcc-4, resulting in
~1 million errors of more than 1 ulp, with a maximum error of ~1.5 ulps.
Now the maximum error for log1pf() on i386 is 0.5034 ulps again (this
depends on extra precision), and log1p() has a chance of working with
extra precision.

See s_log1pf.c 1.8 for the original hack.  (It claims only 62343 large
errors).

Convert to _FBSDID().  Another thing broken with gcc-4 is the static
const hack used for rcsids.
2008-01-19 18:13:21 +00:00
bde
aaee2ef564 Use STRICT_ASSIGN() instead of assorted direct volatile hacks to work
around assignments not working for gcc on i386.  Now volatile hacks
for rint() and rintf() don't needlessly pessimize so many arches
and the remaining pessimizations (for arm and powerpc) can be avoided
centrally.

This cleans up after s_rint.c 1.3 and 1.13 and s_rintf.c 1.3 and 1.9:
- s_rint.c 1.13 broke 1.3 by only using a volatile cast hack in 1 place
  when it was needed in 2 places, and the volatile cast hack stopped
  working with gcc-4.  These bugs only affected correctness tests on
  i386 since i386 normally uses asm rint() and doesn't support the
  extra precision mode that would break assignments of doubles.
- s_rintf.c 1.9 improved(?) on 1.3 by using a volatile variable hack
  instead of an extra-precision variable hack, but it declared 2
  variables as volatile when only 1 variable needed to be volatile.
  This only affected speed tests on i386 since i386 uses asm rintf().
2008-01-19 16:37:57 +00:00