reasonable defaults.
This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.
This should make *_vfsops.c more readable and reduce bloat.
Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
a quick think and discussion among various people some form of some of
these changes will probably be recommitted.
The reversion requested was requested by dg while discussions proceed.
PHK has indicated that he can live with this, and it has been agreed
that some form of some of these changes may return shortly after further
discussion.
- eliminate the fast/slow timeout lists for TCP and instead use a
callout entry for each timer.
- increase the TCP timer granularity to HZ
- implement "bad retransmit" recovery, as presented in
"On Estimating End-to-End Network Path Properties", by Allman and Paxson.
Submitted by: jlemon, wollmann
the highly non-recommended option ALLOW_BDEV_ACCESS is used.
(bdev access is evil because you don't get write errors reported.)
Kill si_bsize_best before it kills Matt :-)
Use the specfs routines rather having cloned copies in devfs.
discussed on current.
The following variables are defined (for now):
osname (defaults to "Linux")
Allow users to change the name of the OS as returned by uname(2),
specially added for all those Linux Netscape users and statistics
maniacs :-) We now have what we all wanted!
osrelease (defaults to "2.2.5")
Allow users to change the version of the OS as returned by uname(2).
Since -current supports glibc2.1 now, change the default to 2.2.5
(was 2.0.36).
oss_version (defaults to 198144 [0x030600])
This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
can commit now that we have the MIB. The default version number is the
lowest version possible with the current 'encoding'.
A note about imprisoned processes (see jail(2)):
These variables are copy-on-write (as suggested by phk). This means that
imprisoned processes will use the system wide value unless it is written/set
by the process. From that moment on, a copy local to the prison will be
used.
A note about the implementation:
I choose to add a single pointer to struct prison, because I didn't like the
idea of changing struct prison every time I come up with a new variable. As
a side effect, the extra storage is only needed when a variable is set from
within the prison. This also minimizes kernel bloat when the Linuxulator is
not used; both compiled in or as a module.
Reviewed by: bde (first version only) and phk
Make the alias list a SLIST.
Drop the "fast recycling" optimization of vnodes (including
the returning of a prexisting but stale vnode from checkalias).
It doesn't buy us anything now that we don't hardlimit
vnodes anymore.
Rename checkalias2() and checkalias() to addalias() and
addaliasu() - which takes dev_t and udev_t arg respectively.
Make the revoke syscalls use vcount() instead of VALIASED.
Remove VALIASED flag, we don't need it now and it is faster
to traverse the much shorter lists than to maintain the
flag.
vfs_mountedon() can check the dev_t directly, all the vnodes
point to the same one.
Print the devicename in specfs/vprint().
Remove a couple of stale LFS vnode flags.
Remove unimplemented/unused LK_DRAINED;
In lookup() however it's the other way around as we need to supply the
dev_t for the vnode, so devfs still has a copy of it stashed away.
Sourcing it from the vnode in the vnops however is useful as it makes
a lot of the code almost the same as that in specfs.
are still converted to u_long by assignment of the uintptr_t, and
address calculations are still done using u_long. This is OK for
currently supported machines, but addresses should be represented
by vm_offset_t or uintptr_t in case pointers are longer than longs.
"Fixed" size of linker_path[]. MAXPATHLEN + 1 was 1 too large for
search paths with only one file path in them, but much too small
for search paths with several long file paths in them.
Diskslice/label code not yet handled.
Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)
Add the correct hook for devfs to kern_conf.c
The net result of this excercise is that a lot less files depends on DEVFS,
and devtoname() gets more sensible output in many cases.
A few drivers had minor additional cleanups performed relating to cdevsw
registration.
A few drivers don't register a cdevsw{} anymore, but only use make_dev().
NULL) for now. Bruce says I jumped the gun with my change in
revision 1.131, or maybe it should use nanotime(), or maybe it
shouldn't be decided in the VFS layer at all. I'm leaving it with
the old behavior until the Trans-Pacific Internet Vulcan Mind Meld
yields fuller understanding.
by utimes(path, NULL). This gives them the same precision as the
timestamps produced by write operations. Do likewise for lutimes()
and futimes().
Suggested by bde.
have been maintained, and that is still the default. A new sysctl
variable "vfs.timestamp_precision" can be used to enable higher
levels of precision:
0 = seconds only; nanoseconds zeroed (default).
1 = seconds and nanoseconds, accurate within 1/HZ.
2 = seconds and nanoseconds, truncated to microseconds.
>=3 = seconds and nanoseconds, maximum precision.
Level 1 uses getnanotime(), which is fast but can be wrong by up
to 1/HZ. Level 2 uses microtime(). It might be desirable for
consistency with utimes() and friends, which take timeval structures
rather than timespecs. Level 3 uses nanotime() for the higest
precision.
I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with
"cpio -pdu". There was almost negligible difference in the system
times -- much less than 1%, and less than the variation among
multiple runs at the same level. Bruce Evans dreamed up a torture
test involving 1-byte reads with intervening fstat() calls, but
the cpio test seems more realistic to me.
This feature is currently implemented only for the UFS (FFS and
MFS) filesystems. But I think it should be easy to support it in
the others as well.
An earlier version of this was reviewed by Bruce. He's not to
blame for any breakage I've introduced since then.
Reviewed by: bde (an earlier version of the code)
events, in order to pave the way for removing a number of the ad-hoc
implementations currently in use.
Retire the at_shutdown family of functions and replace them with
new event handler lists.
Rework kern_shutdown.c to take greater advantage of the use of event
handlers.
Reviewed by: green
When you use pty(N) it creates pty(N+1) ready for your use in the DEVFS,
so DEVFS is not cluttered up with hundreds of ptys you are never going to
use.
about a dev_t.
printf("%x", dev) now becomes printf("%s", devtoname(dev)) because
printing actual information about the device is much more useful then
printing a pointer to an address that would never help the developer debug.
Submitted by: phk, bde
the attach/detach to
1) MOD_LOAD before attach
2) MOD_UNLOAD after detach
The driver specific event handler can now be used to function as
driver specific init/deinit function (compare to device specific
init/deinit functions: attach & detach).
Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.
Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.
Remove bogus and unused strategy1 routines.
Remove open/close arguments from dssize(). Pick them up from dev_t.
Remove unused and unfinished setgeom support from diskslice/label/bad144 code.
stuff: unregister_methods() is horribly broken. The idea, if I'm not mistaken,
is that the refcount on a method is decremented, and only when it reaches
zero is the method freed. However desc->method is set to NULL unconditionally
regardless of the refcount, which means the method pointer is trashed the
first time the method is deallocated. The obvious detrimental effect is
that memory is leaked. The not so obvious effect is that when you call
unregister_method() the second time on the same method, you get a NULL
pointer dereference and a panic.
Now I can successfully unload network device drivers and the miibus module
without crashing the system.
*sigh*
"the device doesn't support a dump routine"
Only print "dump succeeded" when 0 is returned, instead of when an unexpected
error number is returned, print that error number.
Reviewed by: Eivind
we create the pty on the fly when it is first opened.
If you run out of ptys now, just MAKEDEV some more.
This also demonstrate the use of dev_t->si_tty_tty and dev_t->si_drv1
in a device driver.
warnings of the following nature on reloading a kld:
WARNING: "vinum" is usurping "console"'s bmaj
This only applies to cases where "console" is mentioned.
Broken-by: grog
now used in f_ops in place of NULL, and modifications to the files
are more carefully ordered. f_ops should also be set to &badfileops
upon "close" of a file.
This does not fix other problems mentioned in this PR than the first
one.
PR: 11629
Reviewed by: peter
o use suser_xxx rather than suser to support JAIL code.
o KNF comment convention
o use vp->type rather than vaddr.type and eliminate call to
VOP_GETATTR. Bruce says that vp->type is valid at this
point.
Submitted by: bde.
Not fixed:
o return (value)
o Comment needs to be longer and more explicit. It will be after
the advisory.
- device_print_child() either lets the BUS_PRINT_CHILD
method produce the entire device announcement message or
it prints "foo0: not found\n"
Alter sys/kern/subr_bus.c:bus_generic_print_child() to take on
the previous behavior of device_print_child() (printing the
"foo0: <FooDevice 1.1>" bit of the announce message.)
Provide bus_print_child_header() and bus_print_child_footer()
to actually print the output for bus_generic_print_child().
These functions should be used whenever possible (unless you can
just use bus_generic_print_child())
The BUS_PRINT_CHILD method now returns int instead of void.
Modify everything else that defines or uses a BUS_PRINT_CHILD
method to comply with the above changes.
- Devices are 'on' a bus, not 'at' it.
- If a custom BUS_PRINT_CHILD method does the same thing
as bus_generic_print_child(), use bus_generic_print_child()
- Use device_get_nameunit() instead of both
device_get_name() and device_get_unit()
- All BUS_PRINT_CHILD methods return the number of
characters output.
Reviewed by: dfr, peter
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).
An rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.
This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.
When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.
The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.
PR: kern/12378
Submitted by: tegge
Reviewed by: dfr
also gets the device by st_rdev, which is alright except for the fact that
the sysctl kern.dumpdev passed out a char device. This is a workaround.
Sorry for not committing the fix earlier, before people started having
problems.
vnodes referencing this device.
Details:
cdevsw->d_parms has been removed, the specinfo is available
now (== dev_t) and the driver should modify it directly
when applicable, and the only driver doing so, does so:
vn.c. I am not sure the logic in checking for "<" was right
before, and it looks even less so now.
An intial pool of 50 struct specinfo are depleted during
early boot, after that malloc had better work. It is
likely that fewer than 50 would do.
Hashing is done from udev_t to dev_t with a prime number
remainder hash, experiments show no better hash available
for decent cost (MD5 is only marginally better) The prime
number used should not be close to a power of two, we use
83 for now.
Add new checkalias2() to get around the loss of info from
dev2udev() in bdevvp();
The aliased vnodes are hung on a list straight of the dev_t,
and speclisth[SPECSZ] is unused. The sharing of struct
specinfo means that the v_specnext moves into the vnode
which grows by 4 bytes.
Don't use a VBLK dev_t which doesn't make sense in MFS, now
we hang a dummy cdevsw on B/Cmaj 253 so that things look sane.
Storage overhead from all of this is O(50k).
Bump __FreeBSD_version to 400009
The next step will add the stuff needed so device-drivers can start to
hang things from struct specinfo
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.
The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.
Reviewed by: peter (partially)
Only know casualy of this is swapinfo/pstat which should be fixes
the right way: Store the actual pathname in the kernel like mount
does. [Volounteers sought for this task]
The road map from here is roughly: expand struct specinfo into struct
based dev_t. Add dev_t registration facilities for device drivers and
start to use them.
used for timecounting. The possible values are the names of the
physically present harware timecounters ("i8254" and "TSC" on i386's).
Fixed some nearby bitrot in comments in <sys/time.h>.
Reviewed by: phk
1. Printing large quads in small bases overflowed the buffer if
sizeof(u_quad_t) > sizeof(u_long).
2. The sharpflag checks had operator precedence bugs due to excessive
parentheses in all the wrong places.
3. The explicit 0L was bogus in the quad_t comparison and useless in
the long comparision.
4. There was some more bitrot in the comment about ksprintn(). Our
ksprintn() handles bases up to 36 as well as down to 2.
Bruce has other complaints about using %q in kernel and would rather
we went towards using the C9X style %ll and/or %j. (I agree for that
matter, as long as gcc/egcs know how to deal with that.)
Submitted by: bde
I don't know if it was intentional or not, but it would have printed out:
/compat/linux/foo/bar.so: interpreter not found
If it was, then I've broken it. De-constifying the 'interp' variable
or carrying the constness through to elf_load_file() are alternatives.
Alpha believes that %q is for long long, whereas our quad_t and int64_t
is only just a plain long. long long on the alpha is the same size (64
bit) as a long. It was requested, but I have not implemented yet, support
for C9X style %lld - it should be pretty easy though.
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.
* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.
* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propogate to
deeper inlines and should produce better code.
* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.
* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.
* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c
* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.
* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.
* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.
* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.
* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).
* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).
* removal of extra arguments passed to getnewbuf() that are not
necessary.
* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c
* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.
* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.
Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
than a review, this was a nice puzzle.
This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.
Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.
Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.
Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c
Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):
Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)
New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */
The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.
FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.
Reviewed by: BDE
dynamicly linked binaries to run in a chroot'd environment with "emul_path"
as the new root. The new behavior of loading interpreters is identical to the
principle of overlaying.
PR: 10145
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.
Make NMBUFs tunable as well.
Move the nmbclusters sysctl here as well.
Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.
Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).
returns 0 after ptrace() attach and/or detach doesn't quite quite
deliver a signal. Perhaps the process shouldn't be woken in this
case, but avoiding the problem is easy.
PR: 12247
Fixed a couple of places where mechanical fixing of compiler warnings
caused misspelling of NOLOCKF as NULL.
allow changes to the filesystem's write_behind behavior. By the
default the filesystem aggressively issues write_behind's. Three values
may be specified for vfs.write_behind. 0 disables write_behind, 1 results
in historical operation (agressive write_behind), and 2 is an experimental
backed-off write_behind. The values of 0 and 1 are recommended. The value
of 0 is recommended in conjuction with an increase in the number of
NBUF's and the number of dirty buffers allowed (vfs.{lo,hi}dirtybuffers).
Note that a value of 0 will radically increase the dirty buffer load on
the system. Future work on write_behind behavior will use values 2 and
greater for testing purposes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truely empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).
Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.
The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.
The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.
A small race condition was fixed in getpbuf() in vm/vm_pager.c.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
the caller can easily find the child proc struct. fork(), rfork() etc
syscalls set p->p_retval[] themselves. Simplify the SYSINIT_KT() code
and other kernel thread creators to not need to use pfind() to find the
child based on the pid. While here, partly tidy up some of the fork1()
code for RF_SIGSHARE etc.
(really this time) fix pageout to swap and a couple of clustering cases.
This simplifies BUF_KERNPROC() so that it unconditionally reassigns the
lock owner rather than testing B_ASYNC and having the caller decide when
to do the reassign. At present this is required because some places use
B_CALL/b_iodone to free the buffers without B_ASYNC being set. Also,
vfs_cluster.c explicitly calls BUF_KERNPROC() when attaching the buffers
rather than the parent walking the cluster_head tailq.
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
if LK_RECURSIVE is not set, as we will simply return that the
lock is busy and not actually deadlock. This allows processes
to use polling locks against buffers that they may already
hold exclusively locked.
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
- Split syscons source code into manageable chunks and reorganize
some of complicated functions.
- Many static variables are moved to the softc structure.
- Added a new key function, PREV. When this key is pressed, the vty
immediately before the current vty will become foreground. Analogue
to PREV, which is usually assigned to the PrntScrn key.
PR: kern/10113
Submitted by: Christian Weisgerber <naddy@mips.rhein-neckar.de>
- Modified the kernel console input function sccngetc() so that it
handles function keys properly.
- Reorganized the screen update routine.
- VT switching code is reorganized. It now should be slightly more
robust than before.
- Added the DEVICE_RESUME function so that syscons no longer hooks the
APM resume event directly.
- New kernel configuration options: SC_NO_CUTPASTE, SC_NO_FONT_LOADING,
SC_NO_HISTORY and SC_NO_SYSMOUSE.
Various parts of syscons can be omitted so that the kernel size is
reduced.
SC_PIXEL_MODE
Made the VESA 800x600 mode an option, rather than a standard part of
syscons.
SC_DISABLE_DDBKEY
Disables the `debug' key combination.
SC_ALT_MOUSE_IMAGE
Inverse the character cell at the mouse cursor position in the text
console, rather than drawing an arrow on the screen.
Submitted by: Nick Hibma (n_hibma@FreeBSD.ORG)
SC_DFLT_FONT
makeoptions "SC_DFLT_FONT=_font_name_"
Include the named font as the default font of syscons. 16-line,
14-line and 8-line font data will be compiled in. This option replaces
the existing STD8X16FONT option, which loads 16-line font data only.
- The VGA driver is split into /sys/dev/fb/vga.c and /sys/isa/vga_isa.c.
- The video driver provides a set of ioctl commands to manipulate the
frame buffer.
- New kernel configuration option: VGA_WIDTH90
Enables 90 column modes: 90x25, 90x30, 90x43, 90x50, 90x60. These
modes are mot always supported by the video card.
PR: i386/7510
Submitted by: kbyanc@freedomnet.com and alexv@sui.gda.itesm.mx.
- The header file machine/console.h is reorganized; its contents is now
split into sys/fbio.h, sys/kbio.h (a new file) and sys/consio.h
(another new file). machine/console.h is still maintained for
compatibility reasons.
- Kernel console selection/installation routines are fixed and
slightly rebumped so that it should now be possible to switch between
the interanl kernel console (sc or vt) and a remote kernel console
(sio) again, as it was in 2.x, 3.0 and 3.1.
- Screen savers and splash screen decoders
Because of the header file reorganization described above, screen
savers and splash screen decoders are slightly modified. After this
update, /sys/modules/syscons/saver.h is no longer necessary and is
removed.
at which we may sleep. So, after completing our buffer allocation
we must ensure that another process has not come along and allocated
a different buffer with the same identity. We do this by keeping a
global counter of the number of buffers that getnewbuf has allocated.
We save this count when we enter getnewbuf and check it when we are
about to return. If it has changed, then other buffers were allocated
while we were in getnewbuf, so we must return NULL to let our parent
know that it must recheck to see if it still needs the new buffer.
Hopefully this fix will eliminate the creation of duplicate buffers
with the same identity and the obscure corruptions that they cause.
done is less-than cute, but this whole file is suffering from some amount
of bitrot. Execution of zipped files should probably be implemented in a
manner similar to that of #!/interpreted files.
PR: kern/10780
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.
Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
would occur when clustering them - caused by running out of buffers
and taking a degenerate code path as a result. It appears that waiting
instead for buffers to become available is okay.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Discovered by: Craig A Soules <soules+@andrew.cmu.edu>
- this causes POSIX locking to use the thread group leader
(p->p_leader) as the locking thread for all advisory locks.
In non-kernel-threaded code p->p_leader == p, so this will have
no effect.
This results in (more) correct POSIX threaded flock-ing semantics.
It also prevents the leader from exiting before any of the children.
(so that p->p_leader will never be stale) in exit1().
We have been running this patch for over a month now in our lab
under load and at customer sites.
Submitted by: John Plevyak <jplevyak@inktomi.com>
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt. This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion. This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.
Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system and that it
was the actual cause of the leak and looks like it fixes it. The machine
in question was loosing (from memory) about 150 mbufs per hour under
load and a change similar to this stopped it. (Don't blame Jayanth
for this patch though)
An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened. However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code. There are other things that call pru_send()
directly though.
Problem originally noted by: John Plevyak <jplevyak@inktomi.com>
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.
cdevsw_add() will print an message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
easier to use and more flexible.
* Change BUS_ADD_CHILD to take an order argument instead of a place.
* Define a partial ordering for isa devices so that sensitive devices are
probed before non-sensitive ones.
if there is no character device associated with the block device. In this
case that doesn't matter because bdevvp() doesn't use the character
device structure.
I can use the pointy bit of the axe too.
It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices.
Reviewed by: David Greenman <dg@root.com>
instances to a parent bus.
* Define a new method BUS_ADD_CHILD which can be called from DEVICE_IDENTIFY
to add new instances.
* Add a generic implementation of DEVICE_PROBE which calls DEVICE_IDENTIFY
for each driver attached to the parent's devclass.
* Move the hint-based isa probe from the isa driver to a new isahint driver
which can be shared between i386 and alpha.
inodes were synced every 15 seconds. This is now reversed as during
directory create, we cannot commit the directory entry until its
inode has been written. With this switch, the inodes will be more
likely to be written by the time that the directory is written thus
reducing the number of directory rollbacks that are needed.
Changed to `const void *'. utrace() is undocumented, so nothing should
notice.
Fixed missing consts for utrace() and ktrace() in syscalls.master.
sys/ktrace.h is missing some Lite2 changes of shorts to ints.
with malloc type at the tail of the list changed the list from
linear to circular. This seemed to cause surprisingly few problems,
but it now causes weird output from `vmstat -m', probably because
a more important malloc type is now at the tail of the list.
Fix it by abusing ks_limit instead of ks_next as a flag for being
on the list. Don't forget to clear the flag when a malloc type is
uninit'ed. Uninit'ing is still fundamentally broken -- it loses
history.
udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()
For now they're functions, they will become in-line functions
after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.
In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.
Without DEVT_FASCIST I belive this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
unp_internalize() takes a reference to the descriptor. If the send
fails after unp_internalize(), the control mbuf would be freed ophaning
the reference.
Tested in -CURRENT by: Pierre Beyssac <beyssac@enst.fr>
through' to the C compiler.
* Allow the interface to specify a default implementation for methods.
* Allow 'static' methods which are not device specific.
* Add a simple scheme for probe routines to return a priority value. To
make life simple, priority values are negative numbers (positive numbers
are standard errno codes) with zero being the highest priority. The
driver which returns the highest priority will be chosen for the device.
upset about it (and generate things like __main() calls that are reserved
for main()). Renaming was phk's suggestion, but I'd already thought about
it too. (phk liked my suggested name tada() but I decided against it :-)
Reviewed by: phk
- first program lock a region in a file,
- second program wait on the lock,
- first program extend the region,
- second program interrupted by a signal.
particularly annoying hack, namely having the linker bash the moduledata
to set the container pointer, preventing it being const. In the process,
a stack of warnings were fixed and will probably allow a revisit of the
const C_SYSINIT() changes. This explicitly registers modules in files or
preload areas with the module system first, and let them initialize via
SYSINIT/DECLARE_MODULE later in their SI_ORDER_xxx order. The kludge of
finding the containing file is no longer needed since the registration
of modules onto the modules list is done in the context of initializing
the linker file.