Introduce domain_init_status to keep track of the init status of the domains
list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 =
initialized/done. Higher values can be used to support late addition of
domains which right now "works", but is potentially dangerous. I chose to
only give a warning when doing so.
Use domain_init_status with if_attachdomain[1]() to ensure that we have a
complete domains list when we init the if_afdata array. Store the current
value of domain_init_status in if_afdata_initialized. This way we can update
if_afdata after a new protocol has been added (once that is allowed).
Submitted by: se (with changes)
Reviewed by: julian, glebius, se
PR: kern/73321 (partly)
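In sketch form, the attach path might look like the following; the EAGAIN
return and the omitted IF_AFDATA locking are assumptions, only
domain_init_status, if_afdata and if_afdata_initialized come from the
change itself:
    static int
    if_attachdomain1(struct ifnet *ifp)
    {
        struct domain *dp;

        if (domain_init_status < 2)
            return (EAGAIN);        /* domains list still incomplete */

        /* remember which generation of the domains list we used */
        ifp->if_afdata_initialized = domain_init_status;
        for (dp = domains; dp != NULL; dp = dp->dom_next)
            if (dp->dom_ifattach != NULL)
                ifp->if_afdata[dp->dom_family] =
                    (*dp->dom_ifattach)(ifp);
        return (0);
    }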
lock collision.
2. Fix two race conditions. One is between _umtx_unlock and signal
delivery: after a thread has been marked TDF_UMTXWAKEUP by
_umtx_unlock, a signal delivered to that thread can cause msleep
to return EINTR, and the thread then breaks out of the loop, so
umtx ownership is never transferred to it. The other is in
_umtx_unlock itself: once the function sets the umtx to the
UMTX_UNOWNED state, a new thread can come in and lock the umtx,
so the function's subsequent attempt to set the contested bit
fails. Although the function will still wake a blocked thread,
if that thread breaks out of the loop because of a signal, no
contested bit is ever set.
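One way the first window could be closed, sketched under assumptions (the
sleep channel, the umtx_lock name and the surrounding loop are
illustrative, not the committed code):
    for (;;) {
        error = msleep(td, &umtx_lock, td->td_priority | PCATCH,
            "umtx", 0);
        if (td->td_flags & TDF_UMTXWAKEUP) {
            /*
             * _umtx_unlock() handed ownership to us; consume the
             * wakeup even if a signal made msleep() return EINTR.
             */
            break;
        }
        if (error != 0)
            break;          /* interrupted, no ownership taken */
    }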
to do a window update to the peer (thru an ACK) from soreceive()
itself. TCP will do that upon return from the socket callback.
Sending a window update from soreceive() results in a lock reversal.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
soreceive(), then pass in M_DONTWAIT to m_copym(). Also fix up error
handling for the case where m_copym() returns failure.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
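A sketch of the intended calling pattern; the variable names and the
ENOBUFS recovery path are assumptions:
    /* don't sleep for mbufs while soreceive() holds its locks */
    n = m_copym(m, 0, len, M_DONTWAIT);
    if (n == NULL) {
        /* m_copym() failed; back out instead of proceeding */
        error = ENOBUFS;
        goto release;
    }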
that the exclusive lock is already held, then we call panic. Don't
clobber internal lock state before panic'ing. This change improves
debugging if this case were to happen.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
instead acquire it conditionally in closef() if it is required for
advisory locking. This removes Giant from the close() path of sockets
and pipes (and any other objects that don't acquire Giant in their
fo_close path, such as kqueues). Giant will still be acquired twice for
vnodes -- once for advisory lock teardown, and a second time in the
fo_close method. Both Poul-Henning and I believe that the advisory lock
teardown code can be moved into the vn_closefile path shortly.
This trims a percent or two off the cost of most non-vnode close
operations on SMP, but has a fairly minimal impact on UP where the cost
of a single mutex operation is pretty low.
release the pipe mutex before calling fsetown(), as fsetown()
may block. The sigio code protects the pipe sigio data using
its own mutex, and the pipe reference count held by the caller
will prevent the pipe from being prematurely garbage-collected.
Discovered by: imp
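Roughly the shape of the change, sketched for an assumed FIOSETOWN ioctl
case; the point is only the ordering (unlock first, then fsetown()):
    case FIOSETOWN:
        /* fsetown() may sleep, so drop the pipe mutex first */
        PIPE_UNLOCK(mpipe);
        error = fsetown(*(int *)data, &mpipe->pipe_sigio);
        /* the caller's pipe reference keeps mpipe alive meanwhile */
        break;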
setting the B_REMFREE flag in the buf. This is done to prevent lock order
reversals with code that must call bremfree() with a local lock held.
This also reduces overhead by removing two lock operations per buf for
fsync() and similar.
- Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
has been acquired so that we may remove ourselves from the free-list.
- Provide a bremfreef() function to immediately remove a buf from a
free-list for use only by NFS. This is done because the nfsclient code
overloads the b_freelist queue for its own async I/O queue.
- Simplify the numfreebuffers accounting by removing a switch statement
that executed the same code in every possible case.
- getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
Remove a panic associated with this condition and delay asserts that
inspect the buf until after it is locked.
Reviewed by: phk
Sponsored by: Isilon Systems, Inc.
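A condensed sketch of the deferred-removal idea; the bremfreel() helper
name is an assumption beyond what is described above:
    void
    bremfree(struct buf *bp)
    {
        /*
         * Don't touch the free list here; just mark the buf so a
         * later brelse()/bqrelse() removes it under the bqlock.
         */
        bp->b_flags |= B_REMFREE;
    }

    /* in brelse()/bqrelse(), once the bqlock is held: */
    mtx_lock(&bqlock);
    if (bp->b_flags & B_REMFREE)
        bremfreel(bp);          /* perform the real free-list removal */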
away, instead only exit storming mode when an interrupt stops firing long
enough for the ithread to exit the loop and go back to sleep.
Tested by: macrus (cruder version)
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
mtx_lock(&Giant) in the opencrypto code. (This may actually not be
needed, but better safe than sorry).
Devfs grabs Giant if the driver is marked as needing Giant.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
Devfs grabs Giant if the driver is marked as needing Giant.
- Have TS_ZOMBIE ttys return POLLHUP instead of POLLERR
- Remove unneeded POLLWRNORM (old bug)
- TS_ZOMBIE ttys will set POLLIN and POLLRDNORM
- Do not call selrecord in TS_ZOMBIE ttys
PR: kern/73821
Reviewed by: bde
MFC after: 4 weeks
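The resulting revoked-tty handling in ttypoll() would look roughly like
this sketch, which only restates the flag handling listed above:
    if (tp->t_state & TS_ZOMBIE) {
        /* hung-up tty: readable-at-EOF plus hangup, no selrecord() */
        revents |= events & (POLLIN | POLLRDNORM);
        revents |= POLLHUP;
        return (revents);
    }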
a storm is detected, enter "storming" mode which throttles the interrupt
source such that the handlers are run once every clock tick. Previously
we allowed a full set of storm_threshold iterations through the handler
before going back to sleep. Also, this currently will intentionally exit
storming mode once a second to see if the storm has passed.
Tested by: marcus
Discussed with: bde
instead of a vnode for it.
The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.
(vfs_cluster doesn't use the backing store argument.)
Use this in all the places where sleeping with the lock held is not
an issue.
The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
return EINVAL rather than setting error, and don't free sops
unconditionally. The first change was merged accidentally as part of
the larger set of changes to introduce MAC labels and access control,
and potentially led to continued processing of a request even after
it was determined to be invalid. The second change was due to changes
in the semaphore code since the original work was performed.
Pointed out by: truckman
to be modified and extended without breaking the user space ABI:
Use _kernel variants on _ds structures for System V semaphores, message
queues, and shared memory. When interfacing with userspace, export
only the _ds subsets of the _kernel data structures. A lot of search
and replace.
Define the message structure in the _KERNEL portion of msg.h so that it
can be used by other kernel consumers, but not exposed to user space.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research
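The pattern looks roughly like the following sketch; field names beyond
the embedded _ds member and the MAC label, and the copyout() destination,
are assumptions:
    /* kernel-private superset; only the _ds part crosses the ABI */
    struct semid_kernel {
        struct semid_ds u;      /* the userspace-visible subset */
        struct label    *label; /* MAC label, kernel-only */
    };

    /* exporting to userspace copies just the embedded _ds structure */
    error = copyout(&sema[semid].u, arg.buf, sizeof(struct semid_ds));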
changes associated with adding System V IPC support. This will prevent
old modules from being used with the new kernel, and new modules from
being used with the old kernel.
includes the latter, but also declares variables which are defined
in kern/subr_param.c).
Change some VM parameters from quad_t to unsigned long. They refer to
quantities (size limits for text, heap and stack segments) which must
necessarily be smaller than the size of the address space, so long is
adequate on all platforms.
MFC after: 1 week
Change the spelling of the "catch" option to be consistent with the new
options. Implement the "no wait" option. An implementation of the "CPU
private" option for i386 will be committed at a later date.
This generates a KTR event for each critical section entered and exited.
It would be desirable to also log the filename and line number of the
source entering or exiting the critical section, but this requires
hacking up the critical section API, so I've not done that yet.
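A minimal sketch of the enter side; the KTR class name and the exact
event text are assumptions:
    void
    critical_enter(void)
    {
        struct thread *td = curthread;

        td->td_critnest++;
        CTR2(KTR_CRITICAL, "critical_enter by td %p to %d",
            td, td->td_critnest);
    }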
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode. To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.
Reviewed by: tegge@
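The intended handshake, in sketch form (the wmesg and the exact msleep()
call are assumptions): a thread that finds the dead object advertises
that it wants a wakeup, and vnode_pager_dealloc() wakes sleepers only
when that flag is set:
    /* sleeper (vm object lock held): ask for a wakeup, then sleep */
    vm_object_set_flag(object, OBJ_DISCONNECTWNT);
    msleep(object, VM_OBJECT_MTX(object), PDROP | PVM, "vadead", 0);

    /* vnode_pager_dealloc(): wake up only if somebody asked for it */
    if (object->flags & OBJ_DISCONNECTWNT) {
        vm_object_clear_flag(object, OBJ_DISCONNECTWNT);
        wakeup(object);
    }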
priority. The sleep queues don't get updated when the priority of
threads changes, so sleepq_signal() might not always wakeup the
highest priority thread. Updating the queues when thread priorities
change cannot be easily done due to lock orders, so instead we do an
O(n) walk of the queue for a sleepq_signal() operation instead of O(1).
On the other hand, adding a thread to a sleep queue now goes from O(n)
to O(1) so it ends up as an even tradeoff. The correctness here with
regards to priorities is actually fairly important. msleep() gives
interactive threads their priority "boost" after they are placed on the
queue, but before this fix that "boost" wasn't used to determine the
highest priority thread that sleepq_signal() awoke.
- Fix up some comments.
Inspired by: ups, bde
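In other words, sleepq_signal() now scans the whole queue for the best
priority, roughly along the lines of this sketch (surrounding
declarations and locking elided):
    struct thread *td, *besttd = NULL;

    /* O(n): pick the highest-priority (lowest td_priority) sleeper */
    TAILQ_FOREACH(td, &sq->sq_blocked, td_slpq) {
        if (besttd == NULL || td->td_priority < besttd->td_priority)
            besttd = td;
    }
    /* besttd is then woken exactly as before */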
- Tweak the updating of the ithread name in ithread_update() so that the
'+' and '*' characters for device names that were too short only get
added at the end after as many device names as possible were fit into
the allocated space. Prior to this, some long device names would result
in '+' chars showing up between two different devices rather than at the
end.
- Eliminate an initialized but unused variable.
- Eliminate an unnecessary call to clear the page's PG_BUSY flag. (The
call to vm_page_rename() already clears the page's PG_BUSY flag through
its call to vm_page_remove().)
motherboard, in practice the changes resulted in many false positives for
heavy network loads, etc., resulting in poor performance. Also, the
motherboard referenced in the 1.109 log has other problems and simply does
not seem to work with the APIC enabled even with the changes in 1.109. The
correct fix for that board seems to be to not use the APIC at all. One
thing kept from 1.109 is that throttled interrupts are now effectively
polled on every clock tick rather than just 10 times per second.
MFC after: 1 month
Tested by: Shunsuke SHINOMIYA shino at fornext dot org
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.
This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.
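The calling pattern that changes, in sketch form (locking and the
surrounding context elided):
    /* before: page explicitly busied around the removal */
    vm_page_busy(m);
    vm_page_remove(m);

    /* after: the vm object lock held across the call is enough */
    vm_page_remove(m);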
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.
MFC after: 1 month
Inspired by: kris
This is magic and no other operating system does so (e.g. Solaris, Tru64,
Linux, AIX, HP-UX, Irix, MacOS X, NetBSD).
Discussed on: current@
Reported by: Sławek Żak <zaks@prioris.mini.pw.edu.pl>
for modules linked into the kernel or loaded very early, panics will
result otherwise, as the CV code it calls will panic due to its use
of a mutex before it is initialized.
outside of the nice threshold due to a recently awoken thread with a
lower nice value. This further reduces the amount of time a positively
niced thread gets while running in conjunction with a workload that has
many short sleeps (ie buildworld).
check for TD_ON_RUNQ() no longer means the thread is really on a run-
queue. I suspect this state should be re-evaluated as it must mean
something else now. This fixes ULE+KSE+PREEMPTION on UP x86.
At some point later the syncer will unlearn about vnodes and the filesystem's
method called by the syncer will know enough about what's in bo_private to
do the right thing.
[1] Ok, I know, but I couldn't resist the pun.
buf->b_dev.
Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.
Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dyson's two-clause
abbreviated BSD license and the canonical text.
an inordinate amount of synchronous console output that is fairly
undesirable on slower serial console. It's easily hit by accident
when frobbing other sysctls late at night.
Give ffs its own bufobj->bo_ops vector and create a private strategy
routine, (currently misnamed for forwards compatibility), which is
just a copy of the generic bufstrategy routine except we call
softdep_disk_prewrite() directly instead of through the buf_prewrite()
indirection.
Teach UFS about the need for softdep_disk_prewrite() and call the
function directly in FFS.
Remove buf_prewrite() from the default bufstrategy() and from the
global bio_ops method vector.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open(). The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot
when filesystems sit on GEOM, so don't bother for now.
and release of the global page queues lock required to make the call.
Remove GIANT_REQUIRED from vm_hold_free_pages(). All of its VM operations
are properly synchronized.