freebsd-dev

Author	SHA1	Message	Date
Tim Kientzle	a26e9253f6	Optimize skipping over Zip entries. Thanks to: Dan Nelson, who sent me the patch MFC after: 7 days	2008-02-27 06:05:59 +00:00
Garrett Wollman	6ca61b39bb	stdio is currently limited to file descriptors not greater than {SHRT_MAX}, so {STREAM_MAX} should be no greater than that. (This does not exactly meet the letter of POSIX but comes reasonably close to it in spirit.) MFC after: 14 days	2008-02-27 05:56:57 +00:00
Ruslan Ermilov	a059c409c2	Added the "restrict" type-qualifier to the readlink() prototype.	2008-02-26 20:33:52 +00:00
Tim Kientzle	35f4ae0981	Rename the archive_endian.h functions to avoid name clashes with NetBSD's sys/endian.h file. Pointed out by: Joerg Sonnenberger	2008-02-26 07:17:47 +00:00
Bruce Evans	e822ea5b2a	Inline __ieee754__rem_pio2f(). On amd64 (A64) and i386 (A64), this gives an average speedup of about 12 cycles or 17% for 9pi/4 < |x| <= 2**19pi/2 and a smaller speedup for larger x, and a small speeddown for |x| <= 9pi/4 (only 1-2 cycles average, but that is 4%). Inlining this is less likely to bust caches than inlining the float version since it is much smaller (about 220 bytes text and rodata) and has many fewer branches. However, the float version was already large due to its manual inlining of the branches and also the polynomial evaluations.	2008-02-25 22:19:17 +00:00
Bruce Evans	c32951b16e	Use a temporary array instead of the arg array y[] for calling __kernel_rem_pio2(). This simplifies analysis of aliasing and thus results in better code for the usual case where __kernel_rem_pio2() is not called. In particular, when __ieee754_rem_pio2[f]() is inlined, it normally results in y[] being returned in registers. I couldn't get this to work using the restrict qualifier. In float precision, this saves 2-3% in most cases on amd64 and i386 (A64) despite it not being inlined in float precision yet. In double precision, this has high variance, with an average gain of 2% for amd64 and 0.7% for i386 (but a much larger gain for usual cases) and some losses.	2008-02-25 18:28:58 +00:00
Bruce Evans	70d818a20e	Change __ieee754_rem_pio2f() to return double instead of float so that this function and its callers cosf(), sinf() and tanf() don't waste time converting values from doubles to floats and back for |x| > 9pi/4. All these functions were optimized a few years ago to mostly use doubles internally and across the __kernel() interfaces but not across the __ieee754_rem_pio2f() interface. This saves about 40 cycles in cosf(), sinf() and tanf() for |x| > 9pi/4 on amd64 (A64), and about 20 cycles on i386 (A64) (except for cosf() and sinf() in the upper range). 40 cycles is about 35% for 9pi/4 < |x| <= 2**19pi/2 and about 5% for |x| > 2**19pi/2. The saving is much larger on amd64 than on i386 since the conversions are not easy to optimize except on i386 where some of them are automatic and others are optimized invalidly. amd64 is still about 10% slower in cosf() and tanf() in the lower range due to conversion overhead. This also gives a tiny speedup for |x| <= 9pi/4 on amd64 (by simplifying the code). It also avoids compiler bugs and/or additional slowness in the conversions on (not yet supported) machines where double_t != double.	2008-02-25 13:33:20 +00:00
Christian Brueffer	636133e3dd	Add missing words. MFC after: 3 days	2008-02-25 13:03:18 +00:00
Bruce Evans	0d1564b6c7	Fix some off-by-1 errors. e_rem_pio2.c: Float and double precision didn't work because init_jk[] was 1 too small. It needs to be 2 larger than you might expect, and 1 larger than it was for these precisions, since its test for recomputing needs a margin of 47 bits (almost 2 24-bit units). init_jk[] seems to be barely enough for extended and quad precisions. This hasn't been completely verified. Callers now get about 24 bits of extra precision for float, and about 19 for double, but only about 8 for extended and quad. 8 is not enough for callers that want to produce extra-precision results, but current callers have rounding errors of at least 0.8 ulps, so another 1/2**8 ulps of error from the reduction won't affect them much. Add a comment about some of the magic for init_jk[]. e_rem_pio2.c: Double precision worked in practice because of a compensating off-by-1 error here. Extended precision was asked for, and it executed exactly the same code as the unbroken double precision. e_rem_pio2f.c: Float precision worked in practice because of a compensating off-by-1 error here. Double precision was asked for, and was almost needed, since the cosf() and sinf() callers want to produce extra-precision results, at least internally so that their error is only 0.5009 ulps. However, the extra precision provided by unbroken float precision is enough, and the double-precision code has extra overheads, so the off-by-1 error cost about 5% in efficiency on amd64 and i386.	2008-02-25 11:43:20 +00:00
Rafal Jaworowski	56ae1bed48	Let PowerPC world optionally build with -msoft-float. For FPU-less PowerPC variations (e500 currently), this provides a gcc-level FPU emulation and is an alternative approach to the recently introduced kernel-level emulation (FPU_EMU). Approved by: cognet (mentor) MFp4: e500	2008-02-24 19:22:53 +00:00
Bruce Evans	60a50c2585	Optimize the 9pi/2 < |x| <= 2**19pi/2 case some more by avoiding an fabs(), a conditional branch, and sign adjustments of 3 variables for x < 0 when the branch is taken. In double precision, even when the branch is perfectly predicted, this saves about 10 cycles or 10% on amd64 (A64) and i386 (A64) for the negative half of the range, but makes little difference for the positive half of the range. In float precision, it also saves about 4 cycles for the positive half of the range on i386, and many more cycles in both halves on amd64 (28 in the negative half and 11 in the positive half for tanf), but the amd64 times for float precision are anomalously slow so the larger improvement is only a side effect. Previous commits arranged for the x < 0 case to be handled simply: - one part of the rounding method uses the magic number 0x1.8p52 instead of the usual 0x1.0p52. The latter is required for large |x|, but it doesn't work for negative x and we don't need it for large |x|. - another part of the rounding method no longer needs to add `half'. It would have needed to add -half for negative x. - removing the "quick check no cancellation" in the double precision case removed the need to take the absolute value of the quadrant number. Add my noncopyright in e_rem_pio2.c	2008-02-23 12:53:21 +00:00
Bruce Evans	dbf10e45c4	Avoid using FP-to-integer conversion for !(amd64 || i386) too. Use the FP-to-FP method to round to an integer on all arches, and convert this to an int using FP-to-integer conversion iff irint() is not available. This is cleaner and works well on at least ia64, where it saves 20-30 cycles or about 10% on average for 9Pi/4 < |x| <= 32pi/2 (should be similar up to 2**19pi/2, but I only tested the smaller range). After the previous commit to e_rem_pio2.c removed the "quick check no cancellation" non-optimization, the result of the FP-to-integer conversion is not needed so early, so using irint() became a much smaller optimization than when it was committed. An earlier commit message said that cos, cosf, sin and sinf were equally fast on amd64 and i386 except for cos and sin on i386. Actually, cos and sin on amd64 are equally fast to cosf and sinf on i386 (~88 cycles), while cosf and sinf on amd64 are not quite equally slow to cos and sin on i386 (average 115 cycles with more variance).	2008-02-22 18:43:23 +00:00
Bruce Evans	7c1b5e7953	Remove the "quick check no cancellation" optimization for 9pi/2 < |x| < 32pi/2 since it is only a small or negative optimization and it gets in the way of further optimizations. It did one more branch to avoid some integer operations and to use a different dependency on previous results. The branches are fairly predictable so they are usually not a problem, so whether this is a good optimization depends mainly on the timing for the previous results, which is very machine-dependent. On amd64 (A64), this "optimization" is a pessimization of about 1 cycle or 1%; on ia64, it is an optimization of about 2 cycles or 1%; on i386 (A64), it is an optimization of about 5 cycles or 4%; on i386 (Celeron P2) it is an optimization of about 4 cycles or 3% for cos but a pessimization of about 5 cycles for sin and 1 cycle for tan. I think the new i386 (A64) slowness is due to a pipeline stall due to an avoidable load-store mismatch (so the old timing was better), and the i386 (Celeron) variance is due to its branch predictor not being too good.	2008-02-22 17:26:24 +00:00
Bruce Evans	43590b1517	Optimize the 9pi/2 < |x| <= 2**19pi/2 case on amd64 and i386 by avoiding the double to int conversion operation which is very slow on these arches. Assume that the current rounding mode is the default of round-to-nearest and use rounding operations in this mode instead of faking this mode using the round-towards-zero mode for conversion to int. Round the double to an integer as a double first and as an int second since the double result is needed much earlier. Double rounding isn't a problem since we only need a rough approximation. We didn't support other current rounding modes and produce much larger errors than before if called in a non-default mode. This saves an average of about 10 cycles on amd64 (A64) and about 25 on i386 (A64) for x in the above range. In some cases the saving is over 25%. Most cases with |x| < 1000pi now take about 88 cycles for cos and sin (with certain CFLAGS, etc.), except on i386 where cos and sin (but not cosf and sinf) are much slower at 111 and 121 cycles respectively due to the compiler only optimizing well for float precision. A64 hardware cos and sin are slower at 105 cycles on i386 and 110 cycles on amd64.	2008-02-22 15:55:14 +00:00
Bruce Evans	0ddfa46b44	Add an irint() function in inline asm for amd64 and i386. irint() is the same as lrint() except it returns int instead of long. Though the extern lrint() is fairly fast on these arches, it still takes about 12 cycles longer than the inline version, and 12 cycles is a lot in applications where [li]rint() is used to avoid slow conversions that are only a couple of times slower. This is only for internal use. The libm versions of rint() should also be inline, but that would take more header engineering. Implementing irint() instead of lrint() also avoids a conflict with the extern declaration of the latter.	2008-02-22 14:11:03 +00:00
Bruce Evans	f839bac29c	Optimize the conversion to bits a little (by about 11 cycles or 16% on i386 (A64), 5 cycles on amd64 (A64), and 3 cycles on ia64). gcc tends to generate very bad code for accessing floating point values as bits except when the integer accesses have the same width as the floating point values, and direct accesses to bit-fields (as is common only for long double precision) always gives such accesses. Use the expsign access method, which is good for 80-bit long doubles and hopefully no worse for 128-bit long doubles. Now the generated code is less bad. There is still unnecessary copying of the arg on amd64 and i386 and mysterious extra slowness on amd64.	2008-02-22 11:59:05 +00:00
Bruce Evans	a7aa8cc980	Optimize the fixup for +-0 by using better classification for this case and by using a table lookup to avoid a branch when this case occurs. On i386, this saves 1-4 cycles out of about 64 for non-large args.	2008-02-22 10:04:53 +00:00
Bruce Evans	33843eef65	Fix rintl() on signaling NaNs and unsupported formats.	2008-02-22 09:21:14 +00:00
David Schultz	5aa554c7e5	s/rcsid/__FBSDID/	2008-02-22 02:30:36 +00:00
David Schultz	fab324dfa4	Remove an unused variable.	2008-02-22 02:27:34 +00:00
David Schultz	7cd50f4d94	Eliminate some warnings.	2008-02-22 02:26:51 +00:00
Philip Paeps	a975b4b6f2	Note, as required by our agreement with IEEE/The Open Group, that the message queue manual pages excerpt the POSIX standard. Spotted by: Mindaugas Rasiukevicius <rmind -at- NetBSD.org> Reviewed by: imp MFC after: 1 day	2008-02-21 19:16:57 +00:00
Tim Kientzle	5b7a04161d	Sanity-check the block size. Thanks to: Joerg Sonnenberger MFC after: 7 days	2008-02-21 03:21:50 +00:00
Bruce Evans	f21d26becb	Merge cosmetic changes from e_rem_pio2.c 1.10 (convert to __FBSDID(); fix indentation and return type of __ieee754_rem_pio2()). Remove unused variables.	2008-02-19 15:42:46 +00:00
Bruce Evans	9e9d3bc9f1	Optimize for 3pi/4 <= |x| <= 9pi/4 in much the same way as for pi/4 <= |x| <= 3pi/4. Use the same branch ladder as for float precision. Remove the optimization for |x| near pi/2 and don't do it near the multiples of pi/2 in the newly optimized range, since it requires fairly large code to handle only relatively few cases. Ifdef out optimization for |x| <= pi/4 since this case can't occur because it is done in callers. On amd64 (A64), for cos() and sin() with uniformly distributed args, no cache misses, some parallelism in the caller, and good but not great CC and CFLAGS, etc., this saves about 40 cycles or 38% in the newly optimized range, or about 27% on average across the range |x| <= 2pi (~65 cycles for most args, while the A64 hardware fcos and fsin take ~75 cycles for half the args and 125 cycles for the other half). The speedup for tan() is much smaller, especially relatively. The speedup on i386 (A64) is slightly smaller, especially relatively. i386 is still much slower than amd64 here (unlike in the float case where it is slightly faster).	2008-02-19 15:30:58 +00:00
Bruce Evans	9ce8756044	Rearrange the polynomial evaluation for better parallelism. This saves an average of about 8 cycles or 5% on A64 (amd64 and i386 -- more in cycles but about the same percentage on i386, and more with old versions of gcc) with good CFLAGS and some parallelism in the caller. As usual, it takes a couple more multiplications so it will be slower on old machines. Convert to __FBSDID().	2008-02-19 12:54:14 +00:00
Tim Kientzle	b3fa7a9568	Include O_BINARY in open() calls on platforms that support it.	2008-02-19 06:10:48 +00:00
Tim Kientzle	dc4a55fdfc	Another tiny, tiny step towards Windows support. No, I don't plan to ever commit the Windows support files to FreeBSD CVS. That would just be wrong.	2008-02-19 06:06:13 +00:00
Tim Kientzle	54c845efb9	Someday I might forgive the standards bodies for omitting timegm(). Maybe. In the meantime, my workarounds for trying to coax UTC without timegm() are getting uglier and uglier. Apparently, some systems don't support setenv()/unsetenv(), so you can't set the TZ env var and hope thereby to coax mktime() into generating UTC. Without that, I don't see a really good alternative to just giving up and converting to localtime with mktime(). (I suppose I should research the Perl library approach for computing an inverse function to gmtime(); that might actually be simpler than this growing list of hacks.)	2008-02-19 06:02:01 +00:00
Tim Kientzle	334a6ee707	Simplify file type setting.	2008-02-19 05:54:24 +00:00
Tim Kientzle	4d9cfd1eb7	The test_assert() function that backs my custom assert() macro now returns a value, which supports such convenient constructs as: if (assert(NULL != foo())) { } Also be careful to setlocale("C") for each new test to avoid locale pollution. Also a couple of minor portability enhancements.	2008-02-19 05:52:30 +00:00
Tim Kientzle	5c5430972a	Portability: Since the values are fixed and the symbolic names are only present on some platforms, just use the values directly.	2008-02-19 05:49:02 +00:00
Tim Kientzle	98ef1f2ddb	Portability: Include O_BINARY if the local platform defines it.	2008-02-19 05:46:58 +00:00
Tim Kientzle	f167d4f9c3	Correct a compile error when libbz2/zlib are unavailable.	2008-02-19 05:44:59 +00:00
Tim Kientzle	ee10f0feb0	Mark a few additional functions that are/are not available on FreeBSD.	2008-02-19 05:40:28 +00:00
Tim Kientzle	75018fc592	Portability improvements: * If the platform can't restore char nodes, block nodes, or fifos, don't try and just return error. * Include O_BINARY in most open() calls (define O_BINARY to 0 if the platform doesn't provide a definition already) * Refactor the ownership restore to more cleanly support platforms that don't have any form of {l,f,}chown() call. * Comment a lingering issue with older Unix-like systems that allow root to hose the filesystem. I don't (yet) have a good solution for this, but I expect it will require adding more redundant stat() calls. <sigh> MFC after: 14 days	2008-02-19 05:39:35 +00:00
David Schultz	345241c5e0	Document return values better.	2008-02-18 19:02:49 +00:00
David Schultz	71c11dd528	Add tgammaf() as a simple wrapper around tgamma().	2008-02-18 17:27:11 +00:00
Bruce Evans	be396b71c1	2 long double constants were missing L suffixes. This helped break tanl() on !(amd64 || i386). It gave slightly worse than double precision in some cases. tanl() now passes tests of 2^24 values on ia64.	2008-02-18 15:39:52 +00:00
Bruce Evans	19a9e1bb1c	Fix a typo which broke k_tanl.c on !(amd64 || i386).	2008-02-18 14:09:41 +00:00
Bruce Evans	38662c9698	Inline __ieee754__rem_pio2(). With gcc4-2, this gives an average optimization of about 10% for cos(x), sin(x) and tan(x) on |x| < 2**19pi/2. We didn't do this before because __ieee754__rem_pio2() is too large and complicated for gcc-3.3 to inline very well. We don't do this for float precision because it interferes with optimization of the usual (?) case (|x| < 9pi/4) which is manually inlined for float precision only. This has some rough edges: - some static data is duplicated unnecessarily. There isn't much after the recent move of large tables to k_rem_pio2.c, and some static data is duplicated to good effect (all the data static const, so that the compiler can evaluate expressions like 2*pio2 at compile time and generate even more static data for the constant for this). - extern inline is used (for the same reason as in previous inlining of k_cosf.c etc.), but C99 apparently doesn't allow extern inline functions with static data, and gcc will eventually warn about this. Convert to __FBSDID(). Indent __ieee754_rem_pio2()'s declaration consistently (its style was made inconsistent with fdlibm a while ago, so complete this). Fix __ieee754_rem_pio2()'s return type to match its prototype. Someone changed too many ints to int32_t's when fixing the assumption that all ints are int32_t's.	2008-02-18 14:02:12 +00:00
Kevin Lo	8f9872ccb3	getopt(3) returns -1, not EOF.	2008-02-18 03:19:25 +00:00
David Schultz	842d1d5c98	Use volatile hacks to make sure exp() generates an underflow exception when it's supposed to. Previously, gcc -O2 was optimizing away the statement that generated it.	2008-02-17 21:53:19 +00:00
Jason Evans	1945c7bd47	Fix a race condition in arena_ralloc() for shrinking in-place large reallocation, when junk filling is enabled. Junk filling must occur prior to shrinking, since any deallocated trailing pages are immediately available for use by other threads. Reported by: Mats Palmgren <mats.palmgren@bredband.net>	2008-02-17 18:34:17 +00:00
Jason Evans	196d0d4b59	Remove support for lazy deallocation. Benchmarks across a wide range of allocation patterns, number of CPUs, and MALLOC_OPTIONS settings indicate that lazy deallocation has the potential to worsen throughput dramatically. Performance degradation occurs when multiple threads try to clear the lazy free cache simultaneously. Various experiments to avoid this bottleneck failed to completely solve this problem, while adding yet more complexity.	2008-02-17 17:09:24 +00:00
David Schultz	234b60cd97	Hook up sinl(), cosl(), and tanl() to the build.	2008-02-17 07:33:51 +00:00
David Schultz	8e77cc6431	Add implementations of sinl(), cosl(), and tanl(). Submitted by: Steve Kargl <sgk@apl.washington.edu>	2008-02-17 07:33:12 +00:00
David Schultz	f869a8c5f3	Documentation for sinl(), cosl(), and tanl().	2008-02-17 07:32:44 +00:00
David Schultz	61f955827d	Add kernel functions for 128-bit long doubles. These could be improved a bit, but access to a freebsd/sparc64 machine is needed. Submitted by: bde and Steve Kargl <sgk@apl.washington.edu> (earlier version)	2008-02-17 07:32:31 +00:00
David Schultz	de336b0c5e	Add kernel functions for 80-bit long doubles. Many thanks to Steve and Bruce for putting lots of effort into these; getting them right isn't easy, and they went through many iterations. Submitted by: Steve Kargl <sgk@apl.washington.edu> with revisions from bde	2008-02-17 07:32:14 +00:00
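
Several of the e_rem_pio2.c entries above (60a50c2585, 43590b1517) describe rounding a double to an integer with FP-to-FP arithmetic and the magic number 0x1.8p52, relying on the default round-to-nearest mode rather than on a slow double-to-int conversion. The following is a minimal sketch of that idea only, not the libm code: the names `round_magic()` and `round_magic_selfcheck()` are hypothetical, the `volatile` temporary is an assumption added here to keep the compiler from folding the arithmetic or carrying extra precision, and the trick is only valid for |x| < 2**51.

```c
#include <assert.h>

/*
 * Hypothetical illustration: round x to the nearest integer, returned as a
 * double, by adding and subtracting 0x1.8p52 (1.5 * 2**52).
 *
 * For |x| < 2**51 the sum lies in [2**52, 2**53), where adjacent doubles
 * are exactly 1 apart, so the addition itself rounds to an integer in the
 * current rounding mode (assumed to be the default round-to-nearest, with
 * ties going to even). The subtraction is then exact. Because the magic
 * number sits in the middle of that binade, this works for negative x too.
 */
static double
round_magic(double x)
{
	/* volatile: assumption, forces the sum to be stored as a double. */
	volatile double t = x + 0x1.8p52;

	return (t - 0x1.8p52);
}

/* Small self-check for the sketch above; aborts if an expectation fails. */
static void
round_magic_selfcheck(void)
{
	assert(round_magic(2.4) == 2.0);
	assert(round_magic(-2.6) == -3.0);
	assert(round_magic(2.5) == 2.0);	/* tie rounds to even */
	assert(round_magic(3.5) == 4.0);	/* tie rounds to even */
}
```

Calling `round_magic_selfcheck()` from a test program exercises the sketch. In a non-default rounding mode the results differ, which matches the caveat in 43590b1517 that other rounding modes are not supported.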

11521 Commits