the same order that FreeBSD 6 and before did. Doug
White and the other bloodhounds at ISC discovered that
while FreeBSD 7's ordering of options was more efficient,
it caused some cable modem routers to ignore the
SYN-ACKs ordered in this fashion.
The placement of sackOK after the timestamp option seems
to be the critical difference:
FreeBSD 6:
<mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>
FreeBSD 7.0:
<mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>
FreeBSD 7.0 + this change:
<mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol>
MFC after: 1 week
the provided trailers. This has been broken since revision 1.240.
Submitted by: Dan Nelson
PR: kern/120948
"sounds ok to me" from: phk
MFC after: 3 days
itself, not on the type of the file. As such, do a readlink to get
the symbolic link's contents and fail to match if the path isn't a
symbolic link.
Pointed out by: des@
can run on processors that don't have a FPU. This is typically the
case for Book E processors. While a tuned system will probably want
to use soft-float (or use a processor that has a FPU if the usage is
FP intensive enough), allowing hard-float on FPU-less systems gives
great portability and flexibility.
Obtained from: NetBSD
o Disable interrupts while not running U-Boot code. We clobber
registers that the U-Boot interrupt handlers assume to be
fixed as per the U-Boot register usage. At this time this only
applies to r14. U-Boot uses r2 now for what they used r29 for.
After we restore r14 in preparation of doing the syscall, we
re-enable interrupts. When we return from the syscall, we
disable interrupts and restore the callee-saved r14.
(link) address and the physical (load) address. Ideally, the mapping
between link and load addresses should be abstracted by the copyin(),
copyout() and readin() functions, so that we don't have to add kluges
in __elfN(loadimage)(). Then, we could also have paged virtual memory
for the kernel. This can be important under EFI, where you need to
allocate physical memory form the firmware if you want to work in all
scenarios.
o Move the API prototypes to a separate header (glue.h)
o Allow the platform to hint libuboot about where to look
for the API signature. The uboot_address variable is
expected to be defined by the platform.
in our find.
The following are nops because they aren't relevant to our find:
-ignore_readdir_race
-noignore_readdir_race
-noleaf
The following aliaes were created:
-gid -> -group [2]
-uid -> -user [2]
-wholename -> -path
-iwholename -> ipath
-mount -> -xdev
-d -> -depth [1]
The following new primaries were created:
-lname like -name, but matches symbolic links only)
-ilname like -lname but case insensitive
-quit exit(0)
-samefile returns true for hard links to the specified file
-true Always true
I changed one primary to match GNU find since I think our use of it violates
POLA
-false Always false (was an alias for -not!)
Also, document the '+' modifier for -execdir, as well as all of the above.
This was previously implemented.
Document the remaining 7 primaries that are in GNU find, but aren't yet
implemented in find(1)
[1] This was done in GNU find for compatibility with FreeBSD, yet they
mixed up command line args and primary args.
[2] -uid/-gid in GNU find ONLY takes a numeric arg, but that arg does the
normal range thing that. GNU find -user and -uid also take a numberic arg,
but don't do the range processing. find(1) does both for -user and -group,
so making -uid and -gid aliases is compatible for all non-error cases used
in GNU find. While not perfect emulation, this seems a reasonable thing
for us.
fabs(), a conditional branch, and sign adjustments of 3 variables for
x < 0 when the branch is taken. In double precision, even when the
branch is perfectly predicted, this saves about 10 cycles or 10% on
amd64 (A64) and i386 (A64) for the negative half of the range, but
makes little difference for the positive half of the range. In float
precision, it also saves about 4 cycles for the positive half of the
range on i386, and many more cycles in both halves on amd64 (28 in the
negative half and 11 in the positive half for tanf), but the amd64
times for float precision are anomalously slow so the larger
improvement is only a side effect.
Previous commits arranged for the x < 0 case to be handled simply:
- one part of the rounding method uses the magic number 0x1.8p52
instead of the usual 0x1.0p52. The latter is required for large |x|,
but it doesn't work for negative x and we don't need it for large |x|.
- another part of the rounding method no longer needs to add `half'.
It would have needed to add -half for negative x.
- removing the "quick check no cancellation" in the double precision
case removed the need to take the absolute value of the quadrant
number.
Add my noncopyright in e_rem_pio2.c
from a group without the need to perform the same operation by replacing
the existing list via the '-M' option. The '-M' option requires someone
to fetch the existing members with pw, deleting the undesired members from
the list and sending the altered list back to pw.
Approved by: wes (mentor)
MFC after: 5 days
- add support for T3C
- add DDP support (zero-copy receive)
- fix TOE transmit of large requests
- fix shutdown so that sockets don't remain in CLOSING state indefinitely
- register listeners when an interface is brought up after tom is loaded
- fix setting of multicast filter
- enable link at device attach
- exit tick handler if shutdown is in progress
- add helper for logging TCB
- add sysctls for dumping transmit queues
- note that TOE wxill not be MFC'd until after 7.0 has been finalized
MFC after: 3 days
consists of the null-terminated name and the contents of any structure
you wish to record. A new ktrstruct() function constructs and emits a
KTR_STRUCT record. It is accompanied by convenience macros for struct
stat and struct sockaddr.
In kdump(1), KTR_STRUCT records are handled by a dispatcher function
that runs stringent sanity checks on its contents before handing it
over to individual decoding funtions for each type of structure.
Currently supported structures are struct stat and struct sockaddr for
the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK
and AF_IPX is present but disabled, as I am unable to test it properly.
Since 's' was already taken, the letter 't' is used by ktrace(1) to
enable KTR_STRUCT trace points, and in kdump(1) to enable their
decoding.
Derived from patches by Andrew Li <andrew2.li@citi.com>.
PR: kern/117836
MFC after: 3 weeks
FP-to-FP method to round to an integer on all arches, and convert this
to an int using FP-to-integer conversion iff irint() is not available.
This is cleaner and works well on at least ia64, where it saves 20-30
cycles or about 10% on average for 9Pi/4 < |x| <= 32pi/2 (should be
similar up to 2**19pi/2, but I only tested the smaller range).
After the previous commit to e_rem_pio2.c removed the "quick check no
cancellation" non-optimization, the result of the FP-to-integer
conversion is not needed so early, so using irint() became a much
smaller optimization than when it was committed.
An earlier commit message said that cos, cosf, sin and sinf were equally
fast on amd64 and i386 except for cos and sin on i386. Actually, cos
and sin on amd64 are equally fast to cosf and sinf on i386 (~88 cycles),
while cosf and sinf on amd64 are not quite equally slow to cos and sin
on i386 (average 115 cycles with more variance).
9pi/2 < |x| < 32pi/2 since it is only a small or negative optimation
and it gets in the way of further optimizations. It did one more
branch to avoid some integer operations and to use a different
dependency on previous results. The branches are fairly predictable
so they are usually not a problem, so whether this is a good
optimization depends mainly on the timing for the previous results,
which is very machine-dependent. On amd64 (A64), this "optimization"
is a pessimization of about 1 cycle or 1%; on ia64, it is an
optimization of about 2 cycles or 1%; on i386 (A64), it is an
optimization of about 5 cycles or 4%; on i386 (Celeron P2) it is an
optimization of about 4 cycles or 3% for cos but a pessimization of
about 5 cycles for sin and 1 cycle for tan. I think the new i386
(A64) slowness is due to an pipeline stall due to an avoidable
load-store mismatch (so the old timing was better), and the i386
(Celeron) variance is due to its branch predictor not being too good.
the the double to int conversion operation which is very slow on these
arches. Assume that the current rounding mode is the default of
round-to-nearest and use rounding operations in this mode instead of
faking this mode using the round-towards-zero mode for conversion to
int. Round the double to an integer as a double first and as an int
second since the double result is needed much earler.
Double rounding isn't a problem since we only need a rough approximation.
We didn't support other current rounding modes and produce much larger
errors than before if called in a non-default mode.
This saves an average about 10 cycles on amd64 (A64) and about 25 on
i386 (A64) for x in the above range. In some cases the saving is over
25%. Most cases with |x| < 1000pi now take about 88 cycles for cos
and sin (with certain CFLAGS, etc.), except on i386 where cos and sin
(but not cosf and sinf) are much slower at 111 and 121 cycles respectivly
due to the compiler only optimizing well for float precision. A64
hardware cos and sin are slower at 105 cycles on i386 and 110 cycles
on amd64.
the same as lrint() except it returns int instead of long. Though the
extern lrint() is fairly fast on these arches, it still takes about
12 cycles longer than the inline version, and 12 cycles is a lot in
applications where [li]rint() is used to avoid slow conversions that
are only a couple of times slower.
This is only for internal use. The libm versions of *rint*() should
also be inline, but that would take would take more header engineering.
Implementing irint() instead of lrint() also avoids a conflict with
the extern declaration of the latter.
on i386 (A64), 5 cycles on amd64 (A64), and 3 cycles on ia64). gcc
tends to generate very bad code for accessing floating point values
as bits except when the integer accesses have the same width as the
floating point values, and direct accesses to bit-fields (as is common
only for long double precision) always gives such accesses. Use the
expsign access method, which is good for 80-bit long doubles and
hopefully no worse for 128-bit long doubles. Now the generated code
is less bad. There is still unnecessary copying of the arg on amd64
and i386 and mysterious extra slowness on amd64.
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.
Submitted by: rdivacky
MFC after: 1 week