uudecode into the main test driver and invoking it just-in-time
within the various tests.
Also, incorporate a number of improvements to the main test support
code that have proven useful on other projects where I've used this
framework.
(left over from when the unified read/write structure was copied
to form separate read and write structures) and eliminate the
pointless initialization of a couple of the unused fields.
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionnaly equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
PR: 120233
Submitted by: Jukka Ukkonen
Approved by: rwaston (mentor)
MFC after: 1 month
Significant changes:
- rev. 1.11: Use PRId64 instead of a cast to long long and %lld to print
an int64_t.
- rev. 1.12: Fix a bug that humanize_number() produces "1000" where it
should be "1.0G" or "1.0M". The bug reported by Greg Troxel.
PR: 118461
PR: 102694
Approved by: rwatson (mentor)
Obtained from: NetBSD
MFC after: 1 month
that there might be starvations, but because we have already locked the
thread, the cpuset settings will always be done before the new thread
does real-world work.
we set scheduling parameters and cpu binding fully in userland, and
because default scheduling policy is SCHED_RR (time-sharing), we set
default sched_inherit to PTHREAD_SCHED_INHERIT, this saves a system
call.
however if current thread is executing cancellation handler, signal
SIGCANCEL may have already been blocked, this is unexpected, unblock the
signal in new thread if this happens.
MFC after: 1 week
and assignment.
- Add a reference to a struct cpuset in each thread that is inherited from
the thread that created it.
- Release the reference when the thread is destroyed.
- Add prototypes for syscalls and macros for manipulating cpusets in
sys/cpuset.h
- Add syscalls to create, get, and set new numbered cpusets:
cpuset(), cpuset_{get,set}id()
- Add syscalls for getting and setting affinity masks for cpusets or
individual threads: cpuid_{get,set}affinity()
- Add types for the 'level' and 'which' parameters for the cpuset. This
will permit expansion of the api to cover cpu masks for other objects
identifiable with an id_t integer. For example, IRQs and Jails may be
coming soon.
- The root set 0 contains all valid cpus. All thread initially belong to
cpuset 1. This permits migrating all threads off of certain cpus to
reserve them for special applications.
Sponsored by: Nokia
Discussed with: arch, rwatson, brooks, davidxu, deischen
Reviewed by: antoine
e_rem_pio2.c:
This case goes up to about 2**20pi/2, but the comment about it said that
it goes up to about 2**19pi/2.
It went too far above 2**pi/2, giving a multiplier fn with 21 significant
bits in some cases. This would be harmful except for a numerical
accident. It happens that the terms of the approximation to pi/2,
when rounded to 33 bits so that multiplications by 20-bit fn's are
exact, happen to be rounded to 32 bits so multiplications by 21-bit
fn's are exact too, so the bug only complicates the error analysis (we
might lose a bit of accuracy but have bits to spare).
e_rem_pio2f.c:
The bogus comment in e_rem_pio2.c was copied and the code was changed
to be bug-for-bug compatible with it, except the limit was made 90
ulps smaller than necessary. The approximation to pi/2 was not
modified except for discarding some of it.
The same rough error analysis that justifies the limit of 2**20pi/2
for double precision only justifies a limit of 2**18pi/2 for float
precision. We depended on exhaustive testing to check the magic numbers
for float precision. More exaustive testing shows that we can go up
to 2**28pi/2 using a 53+25 bit approximation to pi/2 for float precision,
with a the maximum error for cosf() and sinf() unchanged at 0.5009
ulps despite the maximum error in rem_pio2f being ~0.25 ulps. Implement
this.
This reduces the size of a statically-linked binary by approximately 100KB
in a trivial "return (0)" test application. readelf -S was used to verify
that the .text section was reduced and that using strlen() saved a few
more bytes over using sizeof(). Since the section of code is only called
when environ is corrupt (program bug), I went with fewer bytes over fewer
cycles.
I made minor edits to the submitted patch to make the output resemble
warnx().
Submitted by: kib bz
Approved by: wes (mentor)
MFC after: 5 days
them. Thus, any fd whose value is greater than SHRT_MAX is handled
incorrectly (the short value is sign-extended when converted to an int).
An unpleasant side effect is that if fopen() opens a file and gets a
backing fd that is greater than SHRT_MAX, fclose() will fail and the file
descriptor will be leaked. Better handle this by fixing fopen(), fdopen(),
and freopen() to fail attempts to use a fd greater than SHRT_MAX with
EMFILE.
At some point in the future we should look at expanding the file descriptor
in FILE to an int, but that is a bit complicated due to ABI issues.
MFC after: 1 week
Discussed on: arch
Reviewed by: wollman
{SHRT_MAX}, so {STREAM_MAX} should be no greater than that. (This
does not exactly meet the letter of POSIX but comes reasonably close
to it in spirit.)
MFC after: 14 days
gives an average speedup of about 12 cycles or 17% for
9pi/4 < |x| <= 2**19pi/2 and a smaller speedup for larger x, and a
small speeddown for |x| <= 9pi/4 (only 1-2 cycles average, but that
is 4%).
Inlining this is less likely to bust caches than inlining the float
version since it is much smaller (about 220 bytes text and rodata) and
has many fewer branches. However, the float version was already large
due to its manual inlining of the branches and also the polynomial
evaluations.
__kernel_rem_pio2(). This simplifies analysis of aliasing and thus
results in better code for the usual case where __kernel_rem_pio2()
is not called. In particular, when __ieee854_rem_pio2[f]() is inlined,
it normally results in y[] being returned in registers. I couldn't
get this to work using the restrict qualifier.
In float precision, this saves 2-3% in most cases on amd64 and i386
(A64) despite it not being inlined in float precision yet. In double
precision, this has high variance, with an average gain of 2% for
amd64 and 0.7% for i386 (but a much larger gain for usual cases) and
some losses.
this function and its callers cosf(), sinf() and tanf() don't waste time
converting values from doubles to floats and back for |x| > 9pi/4.
All these functions were optimized a few years ago to mostly use doubles
internally and across the __kernel*() interfaces but not across the
__ieee754_rem_pio2f() interface.
This saves about 40 cycles in cosf(), sinf() and tanf() for |x| > 9pi/4
on amd64 (A64), and about 20 cycles on i386 (A64) (except for cosf()
and sinf() in the upper range). 40 cycles is about 35% for |x| < 9pi/4
<= 2**19pi/2 and about 5% for |x| > 2**19pi/2. The saving is much
larger on amd64 than on i386 since the conversions are not easy to
optimize except on i386 where some of them are automatic and others
are optimized invalidly. amd64 is still about 10% slower in cosf()
and tanf() in the lower range due to conversion overhead.
This also gives a tiny speedup for |x| <= 9pi/4 on amd64 (by simplifying
the code). It also avoids compiler bugs and/or additional slowness
in the conversions on (not yet supported) machines where double_t !=
double.
e_rem_pio2.c:
Float and double precision didn't work because init_jk[] was 1 too small.
It needs to be 2 larger than you might expect, and 1 larger than it was
for these precisions, since its test for recomputing needs a margin of
47 bits (almost 2 24-bit units).
init_jk[] seems to be barely enough for extended and quad precisions.
This hasn't been completely verified. Callers now get about 24 bits
of extra precision for float, and about 19 for double, but only about
8 for extended and quad. 8 is not enough for callers that want to
produce extra-precision results, but current callers have rounding
errors of at least 0.8 ulps, so another 1/2**8 ulps of error from the
reduction won't affect them much.
Add a comment about some of the magic for init_jk[].
e_rem_pio2.c:
Double precision worked in practice because of a compensating off-by-1
error here. Extended precision was asked for, and it executed exactly
the same code as the unbroken double precision.
e_rem_pio2f.c:
Float precision worked in practice because of a compensating off-by-1
error here. Double precision was asked for, and was almost needed,
since the cosf() and sinf() callers want to produce extra-precision
results, at least internally so that their error is only 0.5009 ulps.
However, the extra precision provided by unbroken float precision is
enough, and the double-precision code has extra overheads, so the
off-by-1 error cost about 5% in efficiency on amd64 and i386.