an operation, as a kernel client may not have previously checked the CPU
type (it may not be able to).
Also correct the function declaration style for the mem_range functions to
match the rest of this file (oops).
Submitted by: gibbs
the highly non-recommended option ALLOW_BDEV_ACCESS is used.
(bdev access is evil because you don't get write errors reported.)
Kill si_bsize_best before it kills Matt :-)
Use the specfs routines rather having cloned copies in devfs.
UMAPFS_DIAGNOSTIC and UNION_DIAGNOSTIC. Uncommented NULLFS_DIAGNOSTIC.
It is as bogus as the above three but since it is already a new-style
option it is easier to use it than to fix it.
discussed on current.
The following variables are defined (for now):
osname (defaults to "Linux")
Allow users to change the name of the OS as returned by uname(2),
specially added for all those Linux Netscape users and statistics
maniacs :-) We now have what we all wanted!
osrelease (defaults to "2.2.5")
Allow users to change the version of the OS as returned by uname(2).
Since -current supports glibc2.1 now, change the default to 2.2.5
(was 2.0.36).
oss_version (defaults to 198144 [0x030600])
This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
can commit now that we have the MIB. The default version number is the
lowest version possible with the current 'encoding'.
A note about imprisoned processes (see jail(2)):
These variables are copy-on-write (as suggested by phk). This means that
imprisoned processes will use the system wide value unless it is written/set
by the process. From that moment on, a copy local to the prison will be
used.
A note about the implementation:
I choose to add a single pointer to struct prison, because I didn't like the
idea of changing struct prison every time I come up with a new variable. As
a side effect, the extra storage is only needed when a variable is set from
within the prison. This also minimizes kernel bloat when the Linuxulator is
not used; both compiled in or as a module.
Reviewed by: bde (first version only) and phk
aligned. If I recall correctly, this is to ensure apic_imen can be
accessed in a single bus cycle. Also, use TEXT_ALIGN rather than a
.align 2 (which means 2 byte align on ELF and 4 byte align on a.out)
directory. Also, update arguments of NDINIT for both newstat and newlstat.
While I'm at it, fix style bugs in all {s|ls|fs}tat syscalls.
Reported by: bde
egid will be twice in the set and that setting cr_groups[0] will change egid.
This is simply solved by ignoring cr_groups[0]. That is; linux_getgroups does
not return cr_groups[0] and linux_setgroups does not touch it.
Noticed by: bde
Brought to my attention by: sheldonh
know if and when an unimplemented or obsoleted syscall is being used. Make the
message more end-user friendly.
And as long as we're here, rename some unimplemeted syscalls (linux_phys ->
linux_umount2, linux_vm86 -> linux_vm86old, linux_new_vm86 -> linux_vm86).
Change prototype for linux_newuname from `struct linux_newuname_t *' into
`struct linux_new_utsname *'. This change is reflected in linux.h and
linux_misc.c.
know if and when an unimplemented or obsoleted syscall is being used. Make the
message more end-user friendly.
And as long as we're here, rename some unimplemeted syscalls (linux_phys ->
linux_umount2, linux_vm86 -> linux_vm86old, linux_new_vm86 -> linux_vm86).
Change prototype for linux_newuname from `struct linux_newuname_t *' into
`struct linux_new_utsname *'. This change is reflected in linux.h and
linux_misc.c.
Lastly, make line-continuation and indentation more uniform.
stuff from mem.c. If PERFMON is there, it will "steal" a minor from
mem.c, but mem.c doesn't need to know about this.
Fixed type of cmd argument in perfmon_ioctl().
Diskslice/label code not yet handled.
Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)
Add the correct hook for devfs to kern_conf.c
The net result of this excercise is that a lot less files depends on DEVFS,
and devtoname() gets more sensible output in many cases.
A few drivers had minor additional cleanups performed relating to cdevsw
registration.
A few drivers don't register a cdevsw{} anymore, but only use make_dev().
here any more as they are self identifying. Only PNP remains but that
will be replaced any day now.
Also reword a comment that had been XXX'ed to death to make it clear[er]
why we don't enable interrupts before probing.
PCIBIOS interrupt routing controls may make this possible to fix one day.
than having explicit hooks here.
Treat the eisa/isa attach a little differently so that we defer the
decision about to attach eisa/isa to the motherboard directly only if
the PCI probe (if it exists) fails to turn up a PCI->EISA/ISA bridge.
This restores the original device geometry where ISA and/or EISA attach
to their bridge rather than bypassing and going to the root.
`sleep 1; zzz' trick now.
- APM BIOS Call for suspend/standby now should be issued with delay.
- Delay for suspend/standby can be adjusted by using sysctl(8) interface
(eg. sysctl -w machdep.apm_suspend_delay=3).
PCI fast ethernet controller. Currently, the only card I know that uses
this chip is the D-Link DFE-550TX. (Don't ask me where to buy these: the
only cards I have are samples sent to me by D-Link.)
This driver is the first to make use of the miibus code once I'm sure
it all works together nicely, I'll start converting the other drivers.
The Sundance chip is a clone of the 3Com 3c90x Etherlink XL design
only with its own register layout. Support is provided for ifmedia,
hardware multicast filtering, bridging and promiscuous mode.
MII-compliant PHY drivers. Many 10/100 ethernet NICs available today
either use an MII transceiver or have built-in transceivers that can
be programmed using an MII interface. It makes sense then to separate
this support out into common code instead of duplicating it in all
of the NIC drivers. The mii code also handles all of the media
detection, selection and reporting via the ifmedia interface.
This is basically the same code from NetBSD's /sys/dev/mii, except
it's been adapted to FreeBSD's bus architecture. The advantage to this
is that it automatically allows everything to be turned into a
loadable module. There are some common functions for use in drivers
once an miibus has been attached (mii_mediachg(), mii_pollstat(),
mii_tick()) as well as individual PHY drivers. There is also a
generic driver for all PHYs that aren't handled by a specific driver.
It's possible to do this because all 10/100 PHYs implement the same
general register set in addition to their vendor-specific register
sets, so for the most part you can use one driver for pretty much
any PHY. There are a couple of oddball exceptions though, hence
the need to have specific drivers.
There are two layers: the generic "miibus" layer and the PHY driver
layer. The drivers are child devices of "miibus" and the "miibus" is
a child of a given NIC driver. The "miibus" code and the PHY drivers
can actually be compiled and kldoaded as completely separate modules
or compiled together into one module. For the moment I'm using the
latter approach since the code is relatively small.
Currently there are only three PHY drivers here: the generic driver,
the built-in 3Com XL driver and the NS DP83840 driver. I'll be adding
others later as I convert various NIC drivers to use this code.
I realize that I'm cvs adding this stuff instead of importing it
onto a separate vendor branch, but in my opinion the import approach
doesn't really offer any significant advantage: I'm going to be
maintaining this stuff and writing my own PHY drivers one way or
the other.
events, in order to pave the way for removing a number of the ad-hoc
implementations currently in use.
Retire the at_shutdown family of functions and replace them with
new event handler lists.
Rework kern_shutdown.c to take greater advantage of the use of event
handlers.
Reviewed by: green
trying to size it intelligently just make it 64k and leave it up to the caller
to ensure that the arguments all fit within that range.
This should resolve the issue that some people were seeing with the PnP BIOS
scan crashing on a large PnP node.
Notice that 'unit' wasn't defined once I changed the parameters of the func.
These things make me feel like wading in with a flamethrowr or something.
Too much cruft!
</rant>
if_init_f_t is passed void * containing the address of ifp->if_softc
not the unit number.
Someone tell me if these things don't work as I don't have the hardware
needed to test them. (thats a first.)
I'll get if_ze and if_zp later.
Pointed out by: Kazutaka YOKOTA <yokota@zodiac.mech.utsunomiya-u.ac.jp>
test does not change undefined flag like Cyrix CPUs. Another is that
5/2 test changes undefined flag like Intel CPUs. Latter one could not
be detected and was recognized 486DX CPU. To solve this,
finishidentcpu() calls identblue() when cpu_vendor is null string
(that is, CPUID instruction is not supported) and cpu == CPU_486.
Tests have been done on IBM BlueLightning CPUs, i486SX and i486DX.
- increase the default timeout from 10 seconds to 60 seconds
- add a new kernel option, SCSI_PT_DEFAULT_TIMEOUT, that lets users specify
the default timeout for the pt driver to use
- add two new ioctls, one to get the timeout for a given pt device, the
other to set the timeout for a given pt device. The idea is that
userland applications using the device can set the timeout to suit their
purposes. The ioctls are defined in a new header file, sys/ptio.h
PR: 10266
Reviewed by: gibbs, joerg
into two parts - one to do the bsfl and the other to convert the result
(base 0) to ffs()-like (base 1) in inline C. This enables the optimizer
to be a lot smarter in certain cases, like where it knows that the argument
is non-zero and we want ffs(known non zero arg) - 1. This appears to
produce identical code to the old inline when the argument is unknown.
that goes to opt_dontuse.h is so an opt_*.h file doesn't get created even
though an option may be used for bringing stuff in via files[.*].
Pointed out by: bde
that are linked into the kernel. The KLD compilation options are
changed to call these functions, rather than in-lining the
atomic operations.
This approach makes atomic operations from KLDs significantly
faster on UP systems (though somewhat slower on SMP systems).
PR: i386/13111
Submitted by: peter.jeremy@alcatel.com.au
about a dev_t.
printf("%x", dev) now becomes printf("%s", devtoname(dev)) because
printing actual information about the device is much more useful then
printing a pointer to an address that would never help the developer debug.
Submitted by: phk, bde
didn't match the argument (p->p_pid).
While I'm at it, also fix the dupo in the format string and fix the annoying
inconsistency in all the debug-printfs wrt p_pid arguments. Change all of them
to use the %ld format specifier and cast the p_pid arguments to long.
Submitted by: billf
0x40 and then access data stored in real-mode segment 0x40, even when
called in protected mode. Microsoft unfortunately coddle these individuals,
and so must we if we want to run their code.
This change works around GPFs in some APM and PnP BIOS implementations.
Obtained from: Linux
The first reason for this rewrite is KNF conformance.
The second reason is to avoid redundancy. Each function printed the same
string, with only the syscall name being different. The actual printing is now
performed by a single function, which gets the syscall name as an argument.
The third reason is that of convenience. It's now very easy to add a new
dummy implementation. Just add ``DUMMY(foo);'' to the file. It's also a lot
easier now to see if a syscall has a dummy implementation or not.
The dummies are ordered on syscall number. Please maintain this when adding
new dummies (there're 32 candidates at the time of writing :-)
Reviewed by: bde
prototypes of o{s|g}etrlimit (from sys/sysproto.h). Update linux_{s|g}etrlimit
so that the arguments to o{s|g}etrlimit are corresponding the prototypes.
Pointed out by: bde
apm_bioscall() to check requested BIOS is supported or not.
- Add workaround in apm_driver_version() for the buggy BIOSes which
don't return the connection version in %ax.
PR: i386/13028
Reviewed by: sanpei@sanpei.org and Warner Losh.
functions use the new sigset_t and sigaction_t which allows support for more
than 32 signals. Only the lower 32 signals are supported for now.
linux_rt_sigaction, linux_sigaction and linux_signal use linux_do_sigaction
to do the actual work. That way unnecessary redundancy is avoided. The same
has been done for linux_rt_sigprocmask and linux_sigprocmask. They call
linux_do_sigprocmask to do the actual work.
of kernel space. Remove the ioctl supporting functions, and move the actual
code to the switch-statement. Now everybody can clearly see that the
implementation is really poor.
Also fix a typo in LINUX_TIOCGETD. The underlying function was given command
TIOCSETD instead op TIOCGETD...
Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.
Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.
Remove bogus and unused strategy1 routines.
Remove open/close arguments from dssize(). Pick them up from dev_t.
Remove unused and unfinished setgeom support from diskslice/label/bad144 code.
changes. This is part 1 of the complete termios ioctl fixes.
o change type of c_{i|o|c|l}flag in struct termios from unsigned long to
unsigned int. The type now matches the Linux definitions.
o replaced constants by the corresponding defines in sptab[] for clarity.
Since there's no define for 135 baud, its mapping has been dropped.
function bsd_to_linux_termios:
o Fix typo IXON -> IXANY.
o Remove bogus assignment to c_cc[LINUX_VSWTC].
function linux_to_bsd_termios:
o Fix dupo LINUX_IXON -> LINUX_IXANY.
o Add LINUX_CREAD mapping.
o Fix typo IEXTEN -> LINUX_IEXTEN.
function linux_to_bsd_termio:
o Small optimization: Don't preset the complete c_cc array when we next
assign to the first LINUX_NCC entries.
in deterministic behaviour. In this case known garbage out.
The fix is different than suggested in the PR.
PR: 12749
Originator: Boris Nikolaus <boris@cs.tu-berlin.de>
The linux syscalls translate the arguments first before invoking the
FreeBSD native syscalls.
PR: kern/9591
Originator: John Plevyak <jplevyak@inktomi.com>
as PCI->HOST bridges on my (440BX) box.
My change is to remove the test at the beginning entirely, letting the
switch on the device ID happen first. If the device ID is unknown, then
(in the default case) check for the generic PCIS_BRIDGE_HOST tag. This
should allow wierd cases (eg: wpaul's IMS VL bridge) to work by using the
id override. This strategy is more in line with the other PCI match
methods we use elsewhere,
I only have a limited testbed, but having my USB etc devices detected as
PCI->HOST bridges doesn't look good.
correctly. It has the following code:
if (class != PCIC_BRIDGE || subclass != PCIS_BRIDGE_HOST)
return NULL;
My 486 has an Integrated Micro Solutions PCI bridge which identifies
itself as subclass PCIS_BRIDGE_OTHER, not PCIS_BRIDGE_HOST. Consequently,
it gets ignored. In my opinion, the correct test should be:
if ((class != PCIC_BRIDGE) && (subclass != PCIS_BRIDGE_HOST))
return NULL;
That way the test still succeeds because the chip's class is PCIC_BRIDGE.
Clearly it's not reasonable to expect all host to PCI bridges to always
have a subclass of PCIS_BRIDGE_HOST since I've got one that doesn't.
This way the sanity test should remain relatively sane while still allowing
some oddball yet correct hardware to work. If somebody has a better way
to do it, go ahead and tweak the test, but be aware that
class == PCIC_BRIDGE and subclass == PCIS_BRIDGE_OTHER is a valid case.
While I was here, I also added an explicit ID string for the IMS chipset.
I also dealt with a minor style nit: it's bad karma not to have a default
case for your switch statements, but the one in this routine doesn't have
one. The default string of "Host to PCI bridge" is now assigned in a
default case of the switch statement instead of initializing "s" with the
string before the switch and then not having any default case.
Config(8) contains no documentation about this.
Fix the help for the PnP irq and drq commands. This one caused
me a bit of head scratching the other night while trying to get
a problematic PnP device configured properly.
we create the pty on the fly when it is first opened.
If you run out of ptys now, just MAKEDEV some more.
This also demonstrate the use of dev_t->si_tty_tty and dev_t->si_drv1
in a device driver.
in the pathname translation procedure. This proves fatal, and can be
easily fixed. This or a similar change needs to be committed to svr4_util.h
and ibcs2_util.h. I will update ibcs2_util.h, if noone else thinks of a
better way to do this, in the same manner. I will leave svr4 to the
respective maintainer.
This closes the problem of the only crash I've been able to produce as
a user recently, except for (currently not-in-the-source tree) fd
table sharing fixes. Thanks goes to pho for his stress-testers.
of 2 weeks ago that this be done, and anyone who wishes to make bpf more
selective according to securelevel or compile-time options is more
than free to do so.
eisa_add_intr() which now takes an additional arguement (one of
EISA_TRIGGER_LEVEL or EISA_TRIGGER_EDGE).
The flag RR_SHAREABLE has no effect when passed to
bus_alloc_resource(dev, SYS_RES_IRQ, ...) in an EISA device context as
the eisa_alloc_resource() call (bus_alloc_resource method) now deals
with this flag directly, depending on the device ivars.
This change does nothing more than move all the 'shared = inb(foo + iobsse)'
nonesense to the device probe methods rather than the device attach.
Also, print out 'edge' or 'level' in the IRQ announcement message.
Reviewed by: dfr
- Add support for calling 32-bit code in other segments
- Add support for calling 16-bit protected mode code
Update APM to use this facility.
Submitted by: jlemon
- device_print_child() either lets the BUS_PRINT_CHILD
method produce the entire device announcement message or
it prints "foo0: not found\n"
Alter sys/kern/subr_bus.c:bus_generic_print_child() to take on
the previous behavior of device_print_child() (printing the
"foo0: <FooDevice 1.1>" bit of the announce message.)
Provide bus_print_child_header() and bus_print_child_footer()
to actually print the output for bus_generic_print_child().
These functions should be used whenever possible (unless you can
just use bus_generic_print_child())
The BUS_PRINT_CHILD method now returns int instead of void.
Modify everything else that defines or uses a BUS_PRINT_CHILD
method to comply with the above changes.
- Devices are 'on' a bus, not 'at' it.
- If a custom BUS_PRINT_CHILD method does the same thing
as bus_generic_print_child(), use bus_generic_print_child()
- Use device_get_nameunit() instead of both
device_get_name() and device_get_unit()
- All BUS_PRINT_CHILD methods return the number of
characters output.
Reviewed by: dfr, peter
active or not. The only sane thing we can do here is assume that if
APM is supported it might be active at some point, and bail.
In reality, even this isn't good enough; regardless of whether we support
APM or not, the system may well futz with the CPU's clock speed and throw
the TSC off. We need to stop using it for timekeeping except under
controlled circumstances. Curse the lack of a dependable high-resolution
timer.
equivalent to SYS_RES_MEMORY for x86 but for alpha, the rman_get_virtual()
address of the resource is initialised to point into either dense-mapped
or bwx-mapped space respectively, allowing direct memory pointers to be
used to device memory.
Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).
An rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.
This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.
ethernet controllers based on the AIC-6915 "Starfire" controller chip.
There are single port, dual port and quad port cards, plus one 100baseFX
card. All are 64-bit PCI devices, except one single port model.
The Starfire would be a very nice chip were it not for the fact that
receive buffers have to be longword aligned. This requires buffer
copying in order to achieve proper payload alignment on the alpha.
Payload alignment is enforced on both the alpha and x86 platforms.
The Starfire has several different DMA descriptor formats and transfer
mechanisms. This driver uses frame descriptors for transmission which
can address up to 14 packet fragments, and a single fragment descriptor
for receive. It also uses the producer/consumer model and completion
queues for both transmit and receive. The transmit ring has 128
descriptors and the receive ring has 256.
This driver supports both FreeBSD/i386 and FreeBSD/alpha, and uses newbus
so that it can be compiled as a loadable kernel module. Support for BPF
and hardware multicast filtering is included.
Change "void *" to "volatile TYPE *", improving type safety
and eliminating some warnings (e.g., mp_machdep.c rev 1.106).
cpufunc.h:
Eliminate setbits. As defined, it's not precisely correct;
and it's redundant. (Use atomic_set_int instead.)
ipl_funcs.c:
Use atomic_set_int instead of setbits.
systm.h:
Include atomic.h.
Reviewed by: bde
When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.
The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.
PR: kern/12378
Submitted by: tegge
Reviewed by: dfr
The structure is the right length, but some of the members (notably
wi_q_info) were off a bit. This causes the received signal strength
values to appear bogus.
- Support for setting memory range attributes on SMP systems using the
new SMP rendezvous function
- Don't print the confusing default memory type message.
- Allow legal overlapping range types.
- Turn interrupts back on after setting MTRRs in UP mode (whoops)
- Don't waste time calling invltlb() after wbinvd(); it's not
SMP-compatible (interrupts are off) and unncessary because
wbinvd already flushes the TLB.
This code is now essentially feature-complete.
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.
The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.
Reviewed by: peter (partially)
but broken, since tsc_timecounter is not initialised in that case,
and updating an uninitialised timecounter is fatal.
Fixed style bugs in the machdep.i8254_freq and machdep.tsc_freq
sysctls.
Reviewed by: phk
with respect to interrupts on UP systems. (The upgrade from gcc 2.7.x
to egcs 1.1.2 produced at least one non-atomic code sequence in
swap_pager_getpages.)
In addition, the primitives are now SMP-safe, but only on SMPs. (For
portability between SMPs and UPs, modules are compiled with the SMP-safe
versions.)
Submitted by: dillon and myself
Reviewed by: bde
not masked during handling of shared PCI interrupts. This resulted in
ASTs sometimes being discarded and softclock interrupts sometimes being
handled prematurely (sometimes = quite often on systems with shared PCI
interrupts, never on other systems).
Debugged by: gibbs and other people at plutotech.com
PR: 6944, maybe 12381
gigabit ethernet adapters. This includes two single port cards
(single mode and multimode fiber) and two dual port cards (also single
mode and multimode fiber). SysKonnect is currently the only
vendor with a dual port gigabit ethernet NIC.
The ports on dual port adapters are treated as separate network
interfaces. Thus, if you have an SK-9844 dual port SX card, you
should have both sk0 and sk1 interfaces attached. Dual port cards
are implemented using two XMAC II chips connected to a single
SysKonnect GEnesis controller. Hence, dual port cards are really
one PCI device, as opposed to two separate PCI devices connected
through a PCI to PCI bridge. Note that SysKonnect's drivers use
the two ports for failover purposes rather that as two separate
interfaces, plus they don't support jumbo frames. This applies to
their Linux driver too. :)
Support is provided for hardware multicast filtering, BPF and
jumbo frames. The SysKonnect cards support TCP checksum offload
however this feature is not currently enabled (hopefully it will
be once we get checksum offload support).
There are still a few things that need to be implemeted, like
the ability to communicate with the on-board LM80 voltage/temperature
monitor, but I wanted to get the driver under CVS control and into
-current so people could bang on it.
A big thanks for SysKonnect for making all their programming info
for these cards (and for their FDDI and token ring cards) available
without NDA (see www.syskonnect.com).
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.
* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.
* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propogate to
deeper inlines and should produce better code.
* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.
* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.
* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c
* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.
* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.
* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.
* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.
* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).
* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).
* removal of extra arguments passed to getnewbuf() that are not
necessary.
* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c
* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.
* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.
Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
than a review, this was a nice puzzle.
This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.
Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.
Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.
Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c
Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):
Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)
New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */
The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.
FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.
Reviewed by: BDE
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.
Make NMBUFs tunable as well.
Move the nmbclusters sysctl here as well.
Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.
Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).
print_AMD_info(), L2 internal cache is shown, as are AMD's special CPUID
infos:
CPU: AMD-K6(tm) 3D processor (350.81-MHz 586-class CPU)
Origin = "AuthenticAMD" Id = 0x58c Stepping=12
Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
AMD Features=0x808029bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SYSCALL,PGE,MMX,3DNow!>
PR: kern/12512
Submitted by: Louis A. Mamakos <louie@TransSys.COM>
frames (or just insane received packet lengths generated due to errors
reading from the NIC's internal buffers). Anything too large to fit
safely into an mbuf cluster buffer is discarded and an error logged.
I have not observed this problem with my own cards, but on user has
reported it and adding the sanity test seems reasonable in any case.
Problem noted and patch provided by: Per Andersson <per@cdg.chalmers.se>
we will never use more memory than this value (if specified), but will always
check memory for validity up to this amount.
Get rid of the speculative_mprobe option; the memory amount can now be
specified by hw.physmem.
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truely empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).
Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.
The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.
The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.
A small race condition was fixed in getpbuf() in vm/vm_pager.c.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
behavior slightly.
If machine/bus.h is included, but neither bus_memio.h nor bus_pio.h
are included, then behave as if both were included.
This won't change existing drivers, all of which include one or more
of bus_{p,mem}io.h, but will allow drivers from other systems to come
over with fewer changes. I freely admit that this might not be
optimal for some drivers, but those drivers can be optimized for
FreeBSD after the initial bringup happens.
Without the change, there is a bug that preclude drivers from
compiling with strange warning/errors.
I've been running this here for a while now w/o ill effects.
Reviewed by: gibbs
Not objected to by: bde, arch@ list.
On the VAX, it used to be used for special compilation to avoid the
optimizer which would mess with memory mapped devices etc. These days
we use 'volatile'.
- The kernel environment variable 'hw.physmem' can be used to set the
amount of physical memory space, based at 0, that FreeBSD will use.
Any memory detected over this limit is ignored. Documentation for
this is available under 'help set tunables' in the loader.
- In the case where system memory size can't be accurately determined,
hw.physmem is used as a best-guess memory size, but speculative
probing will be used to determine actual memory size if any of the
guesses or hints are 16M or more.
- If RB_VERBOSE, we list the memory regions as we test them.
- The compile-time option MAXMEM supplies a default value for
'hw.physmem'.
specified in the kernel config file - but setting options MAXMEM works
exactly the same. Userconfig overrides of this have not worked for
ages.
Also, change the getenv for the loader override to hw.physmem based on a
prior suggestion from Mike Smith. I think he still wants to change this
some, but this shouldn't get in his way. This is a forced setting of
the memory size, not a "cap". We probably should have a plain 'maxmem'
variable as well which does do a cap, without loosing the bios memory
configuration data.
memory size. If somebody wants to change the name, fine - I used this
since it's consistant with the config variable it replaces.
This is intended to replace the npx0 msize hack (which no longer works).
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
surrounding critical sections that consist of (1) a single read or
(2) a single locked RMW operation.
(Thanks to thomma@slip.net (Tamiji Homma) for helping to test
these changes.)