pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
the high kernel calls into a protocol stack to perform requests on the
user's behalf. We replace the pr_usrreq() entry in struct protosw with a
pointer to a structure containing pointers to functions which implement
the various reuqests; each function is declared with the correct type and
number of arguments. (This is unlike the current scheme in which a quarter
of the requests take arguments of type other than (struct mbuf *) and the
difference is papered over with casts.) There are a few benefits to this
new scheme:
1) Arguments are passed with their correct types, and null-pointer dummies
are no longer necessary.
2) There should be slightly better caching effects from eliminating
the prximity to extraneous code and th switch in pr_usrreq().
3) It becomes much easier to change the types of the arguments to something
other than `struct mbuf *' (e.g.,pushing the work of sosend() into
the protocol as advocated by Van Jacobson).
There is one principal drawback: existing protocol stacks need to
be modified. This is alleviated by compatibility code in
uipc_socket2.c and uipc_domain.c which emulates the new interface
in terms of the old and vice versa.
This idea is not original to me. I read about what Jacobson did
in one of his papers and have tried to implement the first steps
towards something like that here. Much work remains to be done.
The i386 pmap module uses a special area of kernel virtual memory for mapping
of page tables pages when it needs to modify another process's virtual
address space. It's called the 'alternate page table map'. There is only one
of them and it's expected that only one process will be using it at once and
that the operation is atomic.
When the merged VM/buffer cache was implemented over a year ago, it became
necessary to rundown VM pages at I/O completion. The unfortunate and
unforeseen side effect of this is that pmap functions are now called at bio
interrupt time. If there happend to be a process using the alternate page
table map when this I/O completion occurred, it was possible for a different
process's address space to be switched into the alternate page table map -
leaving the current pmap process with the wrong address space mapped when
the interrupt completed. This resulted in BAD things happening like pages
being mapped or removed from the wrong address space, etc.. Since a very
common case of a process modifying another process's address space is during
fork when the kernel stack is inserted, one of the most common manifestations
of this bug was the kernel stack not being mapped properly, resulting in a
silent hang or reboot. This made it VERY difficult to troubleshoot this bug
(I've been trying to figure out the cause of this for >6 months). Fortunately,
the set of conditions that must be true before this problem occurs is
sufficiently rare enough that most people never saw the bug occur. As I/O
rates increase, however, so does the frequency of the crashes. This problem
used to kill wcarchive about every 10 days, but in more recent times when
the traffic exceeded >100GB/day, the machine could barely manage 6 hours of
uptime.
The fix is to make certain that no process has the pages mapped that are
involved in the I/O, before the I/O is started. The pages are made busy, so
no process will be able to map them, either, until the I/O has finished.
This side-steps the issue by still allowing the pmap functions to be called
at interrupt time, but also assuring that the alternate page table map won't
be switched.
Unfortunately, this appears to not be the only cause of this problem. :-(
Reviewed by: dyson
was due to non-aligned 64K transfers taking 17 pages. We currently
do not support >16 page transfers. The transfer is unfortunately truncated,
but since buffers are usually malloced, this is a problem only once in
a while. Savecore is a culprit, but tar/cpio usually aren't. This
is NOT the final fix (which is likely a bouncing scheme), but will at
least keep the system from crashing.
Fixed profiling of system times. It was pre-4.4Lite and didn't support
statclocks. System times were too small by a factor of 8.
Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead
of addupc(). Call addupc_task() directly instead of using the ADDUPC()
macro.
Removed vestigial support for PROFTIMER.
switch.s:
Removed addupc().
resourcevar.h:
Removed ADDUPC() and declarations of addupc().
cpu.h:
Updated a comment. i386's never were tahoe's, and the deferred profiling
tick became (possibly) multiple ticks in 4.4Lite.
Obtained from: mostly from NetBSD
compiler warnings which occur if you don't have 'options DEVFS' in
your kernel config file:
../../kern/kern_descrip.c: In function `fildesc_drvinit':
../../kern/kern_descrip.c:1103: warning: unused variable `fd'
../../kern/kern_descrip.c: At top level:
../../kern/kern_descrip.c:1095: warning: `devfs_token_stdin' defined but not use
d
../../kern/kern_descrip.c:1096: warning: `devfs_token_stdout' defined but not us
ed
../../kern/kern_descrip.c:1097: warning: `devfs_token_stderr' defined but not us
ed
../../kern/kern_descrip.c:1098: warning: `devfs_token_fildesc' defined but not u
sed
to match (pc98/random_machdep.c probably requires a similar change). This
is a problem area for the PC98 merge - all PC98 ifdefs in <machine/*.h> are
kludges to work around incorrect layering.
disklabel(8) to the kernel (dsopen()). Drivers should initialize the
hardware values (rpm, interleave, skews). Drivers currently don't do
this, but it usually doesn't matter since rotational position stuff is
normally disabled.
All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.
Ok'd by: core
Submitted by: FreeBSD(98) development team
It is called from copyin and copyout.
The new routine is conditioned on I586_CPU and I586_FAST_BCOPY, so you
need
options "I586_FAST_BCOPY"
(quotes essenstial) in your kernel config file.
Also, if you have other kernel types configured in your kernel, an
additional check to make sure it is running on a Pentium is inserted.
(It is not clear why it doesn't help on P6s, it may be just that the
Orion chipset doesn't prefetch as efficiently as Tritons and friends.)
Bruce can now hack this away. :)
it assumes all of the data exists in the kernel. Also, fix
sysctl_new-kernel (unused until now) which had reversed operands to
bcopy().
Reviewed by: phk
Poul writes:
... actually the lock/sleep/wakeup cruft shouldn't be needed in the
kernel version I think, but just leave it there for now.
any statclock ticks. Pretend that all the time up to the first
statclock tick is system time. . This makes a difference mainly for
benchmarks that test short-lived processes - the user and system
times for processes that each lived for about 1ms only added up to
about 10% of the real time even when there was very little interrupt
activity.
Break the printing of a quad_t variable correctly.
gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.
code without the B_READ flag being set. This is a problem when the
data is not cached, and the result will be a bogus attempted write.
Submitted by: Kato Takenori <kato@eclogite.eps.nagoya-u.ac.jp>
to be allocated at boot time. This is an expensive option, as they
consume physical ram and are not pageable etc. In certain situations,
this kind of option is quite useful, especially for news servers that
access a large number of directories at random and torture the name cache.
Defining 5000 or 10000 extra vnodes should cut down the amount of vnode
recycling somewhat, which should allow better name and directory caching
etc.
This is a "your mileage may vary" option, with no real indication of
what works best for your machine except trial and error. Too many will
cost you ram that you could otherwise use for disk buffers etc.
This is based on something John Dyson mentioned to me a while ago.
(returns EPERM always, the errno is specified by POSIX).
If you really have a desperate need to link or unlink a directory, you
can use fsdb. :-)
This should stop any chance of ftpd, rdist, "rm -rf", etc from
bugging out and damaging the filesystem structure or loosing races
with malicious users.
Reviewed by: davidg, bde
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).
Submitted by: Louis Mamakos <loiue@TransSys.com>
the past, since it returns to the old system of allocating mbufs out of
a private area rather than using the kernel malloc(). While this may seem
like a backwards step to some, the new allocator is some 20% faster than
the old one and has much better caching properties.
Written by: John Wroclawski <jtw@lcs.mit.edu>