- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
regstering a callout.
Reviewed by: -smp@, jlemon
Removed most of the hacks that were trying to deal with low-memory
situations prior to now.
The new code is based on the concept that I/O must be able to function in
a low memory situation. All major modules related to I/O (except
networking) have been adjusted to allow allocation out of the system
reserve memory pool. These modules now detect a low memory situation but
rather then block they instead continue to operate, then return resources
to the memory pool instead of cache them or leave them wired.
Code has been added to stall in a low-memory situation prior to a vnode
being locked.
Thus situations where a process blocks in a low-memory condition while
holding a locked vnode have been reduced to near nothing. Not only will
I/O continue to operate, but many prior deadlock conditions simply no
longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop
was not properly incrementing loop variables prior to a continue
statement. We do not believe this code can be hit anyway but we
aren't taking any chances. We'll turn the whole section into a
panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly
clamped to the iosize, causing the wrong foff to be calculated
for pages in the case of an I/O error or biodone() called without
initiating I/O. The problem always caused a panic before. Now it
doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only
because the calculations use signed arithmatic. Better to properly
extend PAGE_MASK first before inverting it for the 64 bit masking
op.
In brelse(), the bogus_page fixup code was improperly throwing
away the original contents of 'm' when it did the j-loop to
fix the bogus pages. The result was that it would potentially
invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is
being duplicated, causing potential corruption. We have identified
a potentially serious bug related to this but the fix is still TBD.
So instead this patch contains a KASSERT to detect the problem
and panic the machine rather then continue to corrupt the filesystem.
The problem does not occur very often.. it is very hard to
reproduce, and it may or may not be the cause of the corruption
people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
Pre-rfork code assumed inherent locking of a process's file descriptor
array. However, with the advent of rfork() the file descriptor table
could be shared between processes. This patch closes over a dozen
serious race conditions related to one thread manipulating the table
(e.g. closing or dup()ing a descriptor) while another is blocked in
an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
are in softclock() for a long time. The old code already did an
splx()/slphigh() pair here, I just missed adding in the equivalent mutex
operations on sched_lock earlier.
This makes crash recovery work for stripe sizes that are not multiples of
DEFAULT_REVIVE_BLOCKSIZE (currently 64 kB).
While we're here, fix a few cosmetic nits.
Reviewed by: grog
Sponsored by: Enitel ASA (http://www.enitel.no/)
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.
Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb
instead of DIAGNOSTIC.
- Remove the p_wchan check as it no longer applies since a process may be
switched out during CURSIG() within msleep() or mawait().
- Remove an extra sanity check only needed during the early SMPng work.
with Julian and Archie.
Implement a new ``sizedstring'' parse type for dealing with field pairs
consisting of a uint16_t followed by a data field of that size, and use
this to deal with the data_len and data fields.
Written by: Archie with some input by me
Agreed in principle by: julian
untimeout() not being called with Giant in those functions. For now,
use the sched_lock to protect the callout wheel in softclock() and in
the various timeout and callout functions.
Noticed by: tegge
16-bit mode. Technically, pcn_probe() is destructive because once the
chip goes into 32-bit mode, the only way to get it out again is a
hardware reset. And once the device is in 32-bit mode, the lnc driver
won't be able to talk to it. So if pcn_probe() is called before the
lnc probe routine, and pcn_probe() rejects the chip as one it doesn't
support, the lnc driver will be SOL.
I don't like this. I think it's a design flaw that you can't switch
the chip out of 32-bit mode once it's selected. The only 'right'
solution is for the pcn driver to support all of the PCI devices
in 32-bit mode, however I don't have samples of all the PCnet series
cards for testing.
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.
Submitted by: witness
linux_rt_sendsig() and restore the same signal mask linux does
in rt_sigreturn(). This gets us saving/restoring all 64-bits of the
linux sigset_t in rt signals.
Reviewed by: marcel
idea to be holding the sched_lock while we are calling it. As such,
release sched_lock before calling CURSIG() in msleep() and mawait() and
reacquire it after CURSIG() returns.
Submitted by: witness
to our native connect(). This is required to deal with the differences
in the way linux handles connects on non-blocking sockets.
This gets the private beta of the Compaq Linux/alpha JDK working
on FreeBSD/alpha
Approved by: marcel
tsleep(). Namely, mawait() takes an extra argument which is a mutex
to drop when going to sleep. Just as with msleep(), if the priority
argument includes the PDROP flag, then the mutex will be dropped and will
not be reacquired when the process wakes up.
- Add in a backwards compatible macro await() that passes in NULL as the
mutex argument to mawait().
except that it uses the MTX_NOSWITCH flag while it releases Giant via
mtx_exit().
- Add a mtx_recursed() primitive. This primitive should only be used on
a mutex owned by the current process. It will return non-zero if the
mutex is recursively owned, or zero otherwise.
- Add two new flags MA_RECURSED and MA_NOTRECURSED that can be used in
conjuction with MA_OWNED to control the assertion checked by mtx_assert().
- Fix some of the KTR tracepoint strings to use %p when displaying the lock
field of a mutex, which is a uintptr_t.
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in netgraph PPP code to TAILQ's.
Reviewed by: Archie Cobbs <archie@dellroad.org>