QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truely empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).
Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.
The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.
The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.
A small race condition was fixed in getpbuf() in vm/vm_pager.c.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
the caller can easily find the child proc struct. fork(), rfork() etc
syscalls set p->p_retval[] themselves. Simplify the SYSINIT_KT() code
and other kernel thread creators to not need to use pfind() to find the
child based on the pid. While here, partly tidy up some of the fork1()
code for RF_SIGSHARE etc.
(really this time) fix pageout to swap and a couple of clustering cases.
This simplifies BUF_KERNPROC() so that it unconditionally reassigns the
lock owner rather than testing B_ASYNC and having the caller decide when
to do the reassign. At present this is required because some places use
B_CALL/b_iodone to free the buffers without B_ASYNC being set. Also,
vfs_cluster.c explicitly calls BUF_KERNPROC() when attaching the buffers
rather than the parent walking the cluster_head tailq.
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
if LK_RECURSIVE is not set, as we will simply return that the
lock is busy and not actually deadlock. This allows processes
to use polling locks against buffers that they may already
hold exclusively locked.
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
- Split syscons source code into manageable chunks and reorganize
some of complicated functions.
- Many static variables are moved to the softc structure.
- Added a new key function, PREV. When this key is pressed, the vty
immediately before the current vty will become foreground. Analogue
to PREV, which is usually assigned to the PrntScrn key.
PR: kern/10113
Submitted by: Christian Weisgerber <naddy@mips.rhein-neckar.de>
- Modified the kernel console input function sccngetc() so that it
handles function keys properly.
- Reorganized the screen update routine.
- VT switching code is reorganized. It now should be slightly more
robust than before.
- Added the DEVICE_RESUME function so that syscons no longer hooks the
APM resume event directly.
- New kernel configuration options: SC_NO_CUTPASTE, SC_NO_FONT_LOADING,
SC_NO_HISTORY and SC_NO_SYSMOUSE.
Various parts of syscons can be omitted so that the kernel size is
reduced.
SC_PIXEL_MODE
Made the VESA 800x600 mode an option, rather than a standard part of
syscons.
SC_DISABLE_DDBKEY
Disables the `debug' key combination.
SC_ALT_MOUSE_IMAGE
Inverse the character cell at the mouse cursor position in the text
console, rather than drawing an arrow on the screen.
Submitted by: Nick Hibma (n_hibma@FreeBSD.ORG)
SC_DFLT_FONT
makeoptions "SC_DFLT_FONT=_font_name_"
Include the named font as the default font of syscons. 16-line,
14-line and 8-line font data will be compiled in. This option replaces
the existing STD8X16FONT option, which loads 16-line font data only.
- The VGA driver is split into /sys/dev/fb/vga.c and /sys/isa/vga_isa.c.
- The video driver provides a set of ioctl commands to manipulate the
frame buffer.
- New kernel configuration option: VGA_WIDTH90
Enables 90 column modes: 90x25, 90x30, 90x43, 90x50, 90x60. These
modes are mot always supported by the video card.
PR: i386/7510
Submitted by: kbyanc@freedomnet.com and alexv@sui.gda.itesm.mx.
- The header file machine/console.h is reorganized; its contents is now
split into sys/fbio.h, sys/kbio.h (a new file) and sys/consio.h
(another new file). machine/console.h is still maintained for
compatibility reasons.
- Kernel console selection/installation routines are fixed and
slightly rebumped so that it should now be possible to switch between
the interanl kernel console (sc or vt) and a remote kernel console
(sio) again, as it was in 2.x, 3.0 and 3.1.
- Screen savers and splash screen decoders
Because of the header file reorganization described above, screen
savers and splash screen decoders are slightly modified. After this
update, /sys/modules/syscons/saver.h is no longer necessary and is
removed.
at which we may sleep. So, after completing our buffer allocation
we must ensure that another process has not come along and allocated
a different buffer with the same identity. We do this by keeping a
global counter of the number of buffers that getnewbuf has allocated.
We save this count when we enter getnewbuf and check it when we are
about to return. If it has changed, then other buffers were allocated
while we were in getnewbuf, so we must return NULL to let our parent
know that it must recheck to see if it still needs the new buffer.
Hopefully this fix will eliminate the creation of duplicate buffers
with the same identity and the obscure corruptions that they cause.
done is less-than cute, but this whole file is suffering from some amount
of bitrot. Execution of zipped files should probably be implemented in a
manner similar to that of #!/interpreted files.
PR: kern/10780
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.
Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
would occur when clustering them - caused by running out of buffers
and taking a degenerate code path as a result. It appears that waiting
instead for buffers to become available is okay.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Discovered by: Craig A Soules <soules+@andrew.cmu.edu>
- this causes POSIX locking to use the thread group leader
(p->p_leader) as the locking thread for all advisory locks.
In non-kernel-threaded code p->p_leader == p, so this will have
no effect.
This results in (more) correct POSIX threaded flock-ing semantics.
It also prevents the leader from exiting before any of the children.
(so that p->p_leader will never be stale) in exit1().
We have been running this patch for over a month now in our lab
under load and at customer sites.
Submitted by: John Plevyak <jplevyak@inktomi.com>
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt. This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion. This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.
Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system and that it
was the actual cause of the leak and looks like it fixes it. The machine
in question was loosing (from memory) about 150 mbufs per hour under
load and a change similar to this stopped it. (Don't blame Jayanth
for this patch though)
An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened. However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code. There are other things that call pru_send()
directly though.
Problem originally noted by: John Plevyak <jplevyak@inktomi.com>