performance.
- Always free to the alloc bucket if there is space. This gives LIFO
allocation order to improve hot-cache performance. This also allows
for zones with a single bucket per-cpu rather than a pair if the entire
working set fits in one bucket.
- Enable per-cpu caches of buckets. To prevent recursive bucket
allocation one bucket zone still has per-cpu caches disabled.
- Pick the initial bucket size based on a table driven maximum size
per-bucket rather than the number of items per-page. This gives
more sane initial sizes.
- Only grow the bucket size when we face contention on the zone lock, this
causes bucket sizes to grow more slowly.
- Adjust the number of items per-bucket to account for the header space.
This packs the buckets more efficiently per-page while making them
not quite powers of two.
- Eliminate the per-zone free bucket list. Always return buckets back
to the bucket zone. This ensures that as zones grow into larger
bucket sizes they eventually discard the smaller sizes. It persists
fewer buckets in the system. The locking is slightly trickier.
- Only switch buckets in zalloc, not zfree, this eliminates pathological
cases where we ping-pong between two buckets.
- Ensure that the thread that fills a new bucket gets to allocate from
it to give a better upper bound on allocation time.
Sponsored by: EMC / Isilon Storage Division
real shared object and libssp_nonshared.a.
This was the last showstopper that prevented from enabling SSP for ports
by default. portmgr@ performed a buildworld which showed no significant
breakage with this patch.
Details:
On i386 for PIC objects, gcc uses the __stack_chk_fail_local hidden
symbol instead of calling __stack_chk_fail directly [1]. This happen
not only with our gcc-4.2.1 but also with the latest gcc-4.8. If you
want the very nasty details, see [2].
OTOH the problem doesn't exist on other architectures. It also doesn't
exist with Clang as the latter will somehow manage to create the
function in the object file at compile time (contrary to only
referencing it through a symbol that will be brought in at link time).
In a perfect world, when an object file is compiled with
-fstack-protector, it will be linked into a binary or a DSO with this
same flag as well, so GCC will add libssp_nonshared.a to the linker
command-line. Unfortunately, we don't control softwares in ports and we
may have such broken DSO. This is the whole point of this patch.
You can reproduce the problem on i386 by compiling a source file into an
object file with "-fstack-protector-all -fPIE" and linking it
into a binary without "-fstack-protector".
This ld script automatically proposes libssp_nonshared.a along with the
real libc DSO to the linker. It is important to understand that the
object file contained in this library will be pulled in the resulting
binary _only if_ the linker notices one of its symbols is needed (i.e.
one of the SSP symbol is missing).
A theorical performance impact could be when compiling, but my testing
showed less than 0.1% of difference.
[1] For 32-bit code gcc saves the PIC register setup by using
__stack_chk_fail_local hidden function instead of calling
__stack_chk_fail directly. See comment line 19460 in:
src/contrib/gcc/config/i386/i386.c
[2] When compiling a source file to an object file, if you use something
which is external to the compilation unit, GCC doesn't know yet if
this symbol will be inside or outside the DSO. So it expects the
worst case and routes the symbol through the GOT, which means
additional space and extra relocation for rtld(1).
Declaring a symbol has hidden tells GCC to use the optimal route (no
GOT), but on the other hand this means the symbol has to be provided
in the same DSO (namely libssp_nonshared.a).
On i386, GCC actually uses an hidden symbol for SSP in PIC objects
to save PIC register setup, as said in [1].
PR: ports/138228
PR: ports/168010
Reviewed by: kib, kan
Based on r134760:
Reset the seek pointer to 0 when a file is successfully opened,
since otherwise the initial seek offset will contain the directory
offset of the filesystem block that contained its directory entry.
This bug was mostly harmless because typically the directory is
less than one filesystem block in size so the offset would be zero.
It did however generally break loading a kernel from the (large)
kernel compile directory.
Also reset the seek pointer when a new inode is opened in read_inode(),
though this is not actually necessary now because all callers set
it afterwards.
PR: 177328
Submitted by: Eric van Gyzen
Reviewed by: iedowse
MFC after: 5 days
Store/restore the VFP registers in setjmp/longjmp on ARM EABI if VFP is
enabled in the kernel. It checks the hw.floatingpoint sysctl to see if
floating-point is available and uses this to determine if it should store
them. If it does it uses a different magic value so longjmp is able to know
if it should load them.
libusbx deprecated libusb_get_port_path and replaced it with
libusb_get_port_numbers. The latter omits an extra parameter which was
unused in the FreeBSD implementation anyway.
with merge the functions but leave out the code to save/load the VFP
registers as that requires other changes to ensure the VFP is enabled
first.
This removes storing the old fpa registers. These were never fully
supported, and the only user of this code I can find have moved to newer
CPUs which use a VFP.
as a fairly faithful implementation of the algorithm found in
PTP Tang, "Table-driven implementation of the Expm1 function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
211-222 (1992).
Over the last 18-24 months, the code has under gone significant
optimization and testing.
Reviewed by: bde
Obtained from: bde (most of the optimizations)
* Use integral numerical constants, and let the compiler do the
conversion to long double.
ld128/s_expl.c:
* Use integral numerical constants, and let the compiler do the
conversion to long double.
* Use the ENTERI/RETURNI macros, which are no-ops on ld128. This
however makes the ld80 and ld128 identical.
Reviewed by: bde (as part of larger diff)
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.man.
* Fix an off-by-one for small arguments |x| < 0x1p-65.
ld128/s_expl.c:
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.manh and u.xbits.manl.
* Fix an off-by-one for small arguments |x| < 0x1p-114.
Obtained from: bde
* Update the evaluation of the polynomial. This allows the removal
of the now unused variables t23 and t45.
ld128/s_expl.c:
* Update the evaluation of the polynomial and the intermediate
result t. This update allows several numerical constants to be
written as double rather than long double constants. Update
the constants as appropriate.
Obtained from: bde
and use macros to access the e component of the unions. This allows
the portions of the code in ld80 to be identical to the ld128 code.
Obtained from: bde
The names now coincide with the name used in PTP Tang's paper.
* Rename the variable from s to tbl to better reflect that
this is a table, and to be consistent with the naming scheme
in s_exp2l.c
Reviewed by: bde (as part of larger diff)
* Update Copyright years to include 2013.
ld128/s_expl.c:
* Correct and update Copyright years. This code originated from
the ld80 version, so it should reflect the same time period.
Reviewed by: bde (as part of larger diff)
I initially thought wchar_t was locale independent, but this seems to be
only the case on Linux. This means that we cannot depend on the *wc*()
routines to implement *c16*() and *c32*(). Instead, use the Citrus
libiconv that is part of libc.
I'll see if there is anything I can do to make the existing functions
somewhat useful in case the system is built without libiconv in the
nearby future. If not, I'll simply remove the broken implementations.
Reviewed by: jilles, gabor
identified, unify the code of check_deferred_signal() for all
architectures, making the variant under #ifdef x86 common.
Tested by: marius (sparc64)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks