transfer timeouts that typically cause a transfer to be completed
twice, resulting in panics and page faults:
o A transfer completion interrupt could arrive while an abort_task
event was set up, so the transfer would be aborted after it had
completed. This is very easy to reproduce. Fix this by setting
the transfer status to USBD_TIMEOUT before scheduling the
abort_task so that the transfer completion code will ignore it.
o The transfer completion code could execute concurrently with the
timeout callout, leaving the callout blocked (e.g. waiting for
Giant) while the transfer completion code runs. In this case,
callout_stop() does not prevent the callout from running, so
again the timeout code would run after the transfer was complete.
Handle this case by checking the return value from callout_stop(),
and ignoring the transfer if the callout could not be removed.
o Finally, protect against a timeout callout occurring while a
transfer is being aborted by another process. Here we arrange
for the timeout processing to ignore the transfer, and use
callout_drain() to ensure that the callout has really gone before
completing the transfer.
This was tested by repeatedly performing USB transfers with a timeout
set to approximately the same as the normal transfer completion
time. In the PR below, apparently this occurred by accident with a
particular printer and the default timeout.
PR: kern/71491
changes associated with adding System V IPC support. This will prevent
old modules from being used with the new kernel, and new modules from
being used with the old kernel.
appropriate for different tag requirements. With the former global pool,
bounce pages might get allocated that are appropriate for one tag, but not
appropriate for another, but the system had no way to distinguish between them.
Now zones with distinct attributes are created to hold pages, and each tag
that requires bouncing is associated with a zone. New zones are created as
needed if no existing zones can meet the requirements of the tag. Stats for
each zone are tracked via the hw.busdma sysctl node.
This should help drivers that are failing with mysterious data corruption.
MFC After: 1 week
includes the latter, but also declares variables which are defined
in kern/subr_param.c).
Change som VM parameters from quad_t to unsigned long. They refer to
quantities (size limits for text, heap and stack segments) which must
necessarily be smaller than the size of the address space, so long is
adequate on all platforms.
MFC after: 1 week
sensitive, but less excercised location (the watchdog). While here use the
*_start_locked function directly to avoid drop, grab, drop lock.
I have to be very careful with future ALTQ patches!
Found & reviewed by: rwatson
MFC after: 3 days
The tunable vfs.devfs.fops controls this feature and defaults to off.
When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a filedescriptor gets a special fops vector which instead
of the detour through the vnode layer goes directly to DEVFS.
Amongst other things this allows us to run Giant free read/write to
device drivers which have been weaned off D_NEEDGIANT.
Currently this means /dev/null, /dev/zero, disks, (and maybe the
random stuff ?)
On a 700MHz K7 machine this doubles the speed of
dd if=/dev/zero of=/dev/null bs=1 count=1000000
This roughly translates to shaving 2usec of each read/write syscall.
The poll/kqfilter paths need more work before they are giant free,
this work is ongoing in p4::phk_bufwork
Please test this and report any problems, LORs etc.
Change the spelling of the "catch" option to be consistent with the new
options. Implement the "no wait" option. An implementation of the "CPU
private" for i386 will be committed at a later date.
This generates a KTR event for each critical section entered and exited.
It would be desirable to also log the filename and line number of the
source entering or exiting the critical section, but this requires
hacking up the critical section API, so I've not done that yet.
retain the pcbinfo lock until we're done using a pcb in the in-bound
path, as the pcbinfo lock acts as a pseuo-reference to prevent the pcb
from potentially being recycled. Clean up assertions and make sure to
assert that the pcbinfo is locked at the head of code subsections where
it is needed. Free the mbuf at the end of tcp_input after releasing
any held locks to reduce the time the locks are held.
MFC after: 3 weeks
number of entries into bucket_zone_lookup(), which helps make more
clear the logic of consumers of bucket zones.
Annotate the behavior of bucket_init() with a comment indicating
how the various data structures, including the bucket lookup tables,
are initialized.
swapoff: failed to locate %d swap blocks
The race occurred because putpages() can block between the time it
allocates swap space and the time it updates the swap metadata to
associate that space with a vm_object, so swapoff() would complain
about the temporary inconsistency. I hoped to fix this by making
swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
operation, but that proved to be inconvenient. With this change,
swapoff() simply doesn't attempt to be so clever about detecting when
all the pageout activity to the target device should have drained.
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode. To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.
Reviewed by: tegge@
Expose some of the amd64-specific sysarch functions to allow alternative
implementations of the %fs/%gs code for TLS, threads, etc. USER_LDT does
not exist on the amd64 kernel, so we have to implement things other ways.
priority. The sleep queues don't get updated when the priority of
threads changes, so sleepq_signal() might not always wakeup the
highest priority thread. Updating the queues when thread priorities
change cannot be easily done due to lock orders, so instead we do an
O(n) walk of the queue for a sleepq_signal() operation instead of O(1).
On the other hand, adding a thread to a sleep queue now goes from O(n)
to O(1) so it ends up as an even tradeoff. The correctness here with
regards to priorities is actually fairly important. msleep() gives
interactive threads their priority "boost" after they are placed on the
queue, but before this fix that "boost" wasn't used to determine the
highest priority thread that sleepq_signal() awoke.
- Fix up some comments.
Inspired by: ups, bde
associated with each processor. This ID is inferred from the index
of the pcs structure in the hwprb.
- Give Alpha CPUs FreeBSD CPU IDs more like other architectures where the
boot processor is always CPU 0 and the other processors are numbered
1 ... N. List active CPUs in the system in cpu_mp_announce() as well.
Silence on: alpha@
thread is created rather than adjusting the priority in the main
function. (kthread_create() should probably take the initial priority
as an argument.)
- Only yield the CPU in the !PREEMPTION case if there are any other
runnable threads. Yielding when there isn't anything else better to do
just wastes time in pointless context switches (albeit while the system
is idle.)
- Tweak the updating of the ithread name in ithread_update() so that the
'+' and '*' characters for device names that were too short only get
added at the end after as many device names as possible were fit into
the allocated space. Prior to this, some long devices would result
in '+' chars showing up between two different devices rather than at the
end.
seconds of idling, DRITY flags are removed.
- If mirror is in idle state or is not open for writing, sleep without
timeout when waiting for I/O requests.
- Don't use atomic operations, for now sysctls are protected by Giant.
- Update debugs.
- Fix for good (I hope) force-stopping mirrors and some filure cases
(e.g. the last good component dies when synchronization is in progress).
Don't use ->nstart/->nend consumer's fields, as this could be racy,
because those fields are used in g_down/g_up, use ->index consumer's
field instead for tracking number of not finished requests.
Reported by: marcel
- After 5 seconds of idle time (this should be configurable) mark all
dirty providers as clean, so when mirror is not used in 5 seconds
and there will be power failure, no synchronization on boot is needed.
Idea from: sorry, I can't find who suggested this
- When there are no ACTIVE components and no NEW components destroy whole
mirror, not only provider.
- Fix one debug to show information about I/O request, before we change
its command.
- Eliminate an initialized but unused variable.
- Eliminate an unnecessary call to clear the page's PG_BUSY flag. (The
call to vm_page_rename() already clears the page's PG_BUSY flag through
its call to vm_page_remove().)
In order to avoid livelock, swapoff() skips over objects with a
nonzero pip count and makes another pass if necessary. Since it is
impossible to know which objects we care about, it would choose an
arbitrary object with a nonzero pip count and wait for it before
making another pass, the theory being that this object would finish
paging about as quickly as the ones we care about. Unfortunately,
we may have slept since we acquired a reference to this object.
Hack around this problem by tsleep()ing on the pointer anyway, but
timeout after a fixed interval. More elegant solutions are possible,
but the ones I considered unnecessarily complicate this rare case.
Also, kill some nits that seem to have crept into the swapoff() code
in the last 75 revisions or so:
- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since
the latter can be derived from the former.
- Replace swp_pager_find_dev() with something simpler. There's no
need to iterate over the entire list of swap devices just to determine
if a given block is assigned to the one we're interested in.
- Expand the scope of the swhash_mtx in a couple of places so that it
isn't released and reacquired once for every hash bucket.
- Don't drop the swhash_mtx while holding a reference to an object.
We need to lock the object first. Unfortunately, doing so would
violate the established lock order, so use VM_OBJECT_TRYLOCK() and
try again on a subsequent pass if the object is already locked.
- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
syslog(3) if we are a priveleged program (sshd, su, etc.).
- Make syslogd open an additional socket /var/run/logpriv, with 0600
permissions.
- In libc, try to use this socket.
- Do not loop forever if we are using this socket (partial backout of 1.31)
Reviewed by: dwmalone, Andrea Campi <andrea webcom it>
Approved by: julian (mentor)
MFC after: 1 month
out c->c_func, we can't take it after callout_stop(). To take it before
we need to acquire callout_lock, to avoid race. This commit narrows
down area where lock is held, but hack is still present.
This should be redesigned.
Approved by: julian (mentor)
specifiers.
This helps the port team to decide whether to use local patch for
applications that makes use of these GNU extensions (and hopefully we
can get rid of these patches finally)
Requested by: marcus
destined for a blackhole route.
This also means that blackhole routes do not need to be bound to lo(4)
or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case.
Submitted by: james at towardex dot com
Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/>
MFC after: 3 weeks
udp_in6, and udp_ip6 to pass socket address state between udp_input(),
udp_append(), and soappendaddr_locked(). While file in the default
configuration, when running with multiple netisrs or direct ithread
dispatch, this can result in races wherein user processes using
recvmsg() get back the wrong source IP/port. To correct this and
related races:
- Eliminate udp_ip6, which is believed to be generated but then never
used. Eliminate ip_2_ip6_hdr() as it is now unneeded.
- Eliminate setting, testing, and existence of 'init' status fields
for the IPv6 structures. While with multiple UDP delivery this
could lead to amortization of IPv4 -> IPv6 conversion when
delivering an IPv4 UDP packet to an IPv6 socket, it added
substantial complexity and side effects.
- Move global structures into the stack, declaring udp_in in
udp_input(), and udp_in6 in udp_append() to be used if a conversion
is required. Pass &udp_in into udp_append().
- Re-annotate comments to reflect updates.
With this change, UDP appears to operate correctly in the presence of
substantial inbound processing parallelism. This solution avoids
introducing additional synchronization, but does increase the
potential stack depth.
Discovered by: kris (Bug Magnet)
MFC after: 3 weeks
motherboard, in practice the changes resulted in many false positives for
heavy network loads, etc. resulting in poor performance. Also, the
motherboard referenced in the 1.109 log has other problems and simply does
not seem to work with the APIC enabled even with the changes in 1.109. The
correct fix for that board seems to be to not use the APIC at all. One
thing kept from 1.109 is that throttled interrupts are now effectively
polled on every clock tick rather than just 10 times per second.
MFC after: 1 month
Tested by: Shunsuke SHINOMIYA shino at fornext dot org
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.
This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.