Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)
Code to allow Linux Threads to run under FreeBSD.
By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.
This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.
Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.
PR: kern/7899
Reviewed by: bde, elvind
1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
Note that an unload facility should be used to call rm_at_exit() (if
procfs is being loaded as an LKM and is subsequently removed), but it
was non-obvious how to do this in the VFS framework.
Reviewed by: Julian Elischer
surprise, procfs actually is optional, and some people truly do generate
kernels without it. Wow. I built a kernel without 'options PROCFS' and
it compiled and linked.
Disallow wait options that are not a combination of the standard POSIX
options WUNTRACED and WNOHANG, as is required by POSIX. BSD doesn't
have any extensions here, but the code was `#ifdef notyet' for some
reason.
it in struct proc instead.
This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
Add support for "interrupt driven configuration hooks".
A component of the kernel can register a hook, most likely
during auto-configuration, and receive a callback once
interrupt services are available. This callback will occur before
the root and dump devices are configured, so the configuration
task can affect the selection of those two devices or complete
any tasks that need to be performed prior to launching init.
System boot is posponed so long as a hook is registered. The
hook owner is responsible for removing the hook once their task
is complete or the system boot can continue.
kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c:
Change the interface and implementation for the kernel callout
service. The new implemntaion is based on the work of
Adam M. Costello and George Varghese, published in a technical
report entitled "Redesigning the BSD Callout and Timer Facilities".
The interface used in FreeBSD is a little different than the one
outlined in the paper. The new function prototypes are:
struct callout_handle timeout(void (*func)(void *),
void *arg, int ticks);
void untimeout(void (*func)(void *), void *arg,
struct callout_handle handle);
If a client wishes to remove a timeout, it must store the
callout_handle returned by timeout and pass it to untimeout.
The new implementation gives 0(1) insert and removal of callouts
making this interface scale well even for applications that
keep 100s of callouts outstanding.
See the updated timeout.9 man page for more details.
The implementation is done (unlike what i've originally been
contemplating) by reparenting kids of processes that have the
appropriate bit set to PID 1, and let PID 1 handle the zombie. This
is far less problematical than what would seem to be ``doing it
right'', for a number of reasons.
Of our currently shipping PID-1-intended programs, 50 % fail the above
assumption. ;-) (Read this: sysinstall doesn't do it right. This is
no problem as long as no program called by sysinstall actually uses
SA_NOCLDWAIT.)
ToDo: . clarify the correct SA_* flag inheritance, compared
to other systems,
. decide whether the compat cruft (osigvec(9)) should
deal with new system additions or not,
. merge OpenBSD's SA_SIGINFO implementation. ;)
Reviewed by: bde
POSIX.4. Additionally, there is some initial code that supports LIO.
This code supports AIO/LIO for all types of file descriptors, with
few if any restrictions. There will be a followup very soon that
will support significantly more efficient operation for VCHR type
files (raw.) This code is also dependent on some kernel features
that don't work under SMP yet. After I commit the changes to the
kernel to support proper address space sharing on SMP, this code
will also work under SMP.
space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
previous snap. Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.
1/ session leader
2/ Have a console device vnode (/dev/console)
3/ have NULL pointer for a consoel tty struct.
fix the only case where the tty struct is referenced without a prior
check for existance.
note that at_shutdown has a new parameter to indicate When
during a shutdown the callout should be made. also
add a RB_POWEROFF flag to reboot "howto" parameter..
tells the reboot code in our at_shutdown module to turn off the UPS
and kill the power. bound to be useful eventually on laptops
I've been meaning to do this for AGES as I keep having to patch those routines
whenever I write a proprietary package or similar..
any module that assigns resources to processes needs to know when
these events occur. there are existsing modules that should be modified
to take advantage of these.. e.g. SYSV IPC primatives
presently have #ifdef entries in exit()
this also helps with making LKMs out of such things..
(see the man pages at_exit(9) and at_fork(9))
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
performance issues.
1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.
sysv_ipc.c: add stub functions that either simply return (for the hooks
in kern_fork/kern_exit) or log() a messgae and call enosys() (for the
syscalls). sysv_ipc.c will become "standard" in conf/files and has
#ifs for all the permutations.