and unusable by the pccard system since pccard doesn't attach to the
nexus any more. This was stopping my 3c589D from working as pccard unit
0 is used directly for resource allocation and this fails when unit 0
isn't actually attached to anything.
In order to make this work, I created a pseudo-PHY driver to deal with
Macronix chips that use the built-in NWAY support and symbol mode port.
This is actually all of them, with the exception of the original MX98713
which presents its NWAY support via the MII serial interface.
The mxphy driver actually manipulates the controller registers directly
rather than using the miibus_readreg()/miibus_writereg() bus interface
since there are no MII registers to read. The mx driver itself pretends
that the NWAY interface is a PHY locayed at MII address 31 for the sole
purpose of allowing the mxphy_probe() routine to know when it needs to
attach to a host controller.
* Change the hack used on the alpha for mapping devices into DENSE or
BWX memory spaces to a simpler one. Its still a hack and should be
a seperate api to explicitly map the resource.
* Add $FreeBSD$ as necessary.
* Move pnp_eisaformat() to pnp.c, declared in <isa/pnpvar.h>.
* Turn the pnpbios code into an enumerator for the isa bus. This allows
all devices known to the bios to be probed automatically.
Currently the pnpbios code is dependant on the PNPBIOS option. As the code
is tested more and when more drivers are converted this will be made the
default. I have PnP changes in the wings for fdc, atkbd, psm, pcaudio, and
joy. Sio already works with pnpbios.
same interface as Intel's P6 family has. Incidentally, I had disabled
it in the first place since I knew the K7s were coming out soon but
did not want to assume they'd have the same MTRR interface as Intel's
chips.
Submitted by: Ville-Pertti Keinonen <will@iki.fi>
resource_list_release. This removes the dependancy on the
layout of ivars.
* Move set_resource, get_resource and delete_resource from
isa_if.m to bus_if.m.
* Simplify driver code by providing wrappers to those methods:
bus_set_resource(dev, type, rid, start, count);
bus_get_resource(dev, type, rid, startp, countp);
bus_get_resource_start(dev, type, rid);
bus_get_resource_count(dev, type, rid);
bus_delete_resource(dev, type, rid);
* Delete isa_get_rsrc and use bus_get_resource_start instead.
* Fix a stupid typo in isa_alloc_resource reported by Takahashi
Yoshihiro <nyan@FreeBSD.org>.
* Print a diagnostic message if we can't assign resources to a PnP
device.
* Change device_print_prettyname() so that it doesn't print
"(no driver assigned)-1" for anonymous devices.
can provide the correct context to each signal handler.
Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).
Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.
Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.
Reviewed by: marcel, jdp, bde
o Remove unused defines from genassym.c that were needed
by the trampoline.
o Add load_gs_param function to support.s that catches
a fault when %gs is loaded with an invalid descriptor.
The function returns EFAULT in that case.
o Remove struct trapframe from mcontext_t and replace it
with the list of registers.
o Modify sendsig and sigreturn accordingly.
This commit contains a patch by bde.
Reviewed by: luoqi, jdp
struct sigcontext and ucontext_t/mcontext_t are defined in such
a way that both (ie struct sigcontext and ucontext_t) can be
passed on to sigreturn. The signal handler is still given a
ucontext_t for maximum flexibility.
For backward compatibility sigreturn restores the state for the
alternate signal stack from sigcontext.sc_onstack and not from
ucontext_t.uc_stack. A good way to determine which value the
application has set and thus which value to use, is still open
for discussion.
NOTE: This change should only affect those binaries that use
sigcontext and/or ucontext_t. In the source tree itself
this is only doscmd. Recompilation is required for those
applications.
This commit also fixes a lot of style bugs without hopefully
adding new ones.
NOTE: struct sigaltstack.ss_size now has type size_t again. For
some reason I changed that into unsigned int.
Parts submitted by: bde
sigaltstack bug found by: bde
-----------------------------
By introducing a new sigframe so that the signal handler operates
on the new siginfo_t and on ucontext_t instead of sigcontext, we
now need two version of sendsig and sigreturn.
A flag in struct proc determines whether the process expects an
old sigframe or a new sigframe. The signal trampoline handles
which sigreturn to call. It does this by testing for a magic
cookie in the frame.
The alpha uses osigreturn to implement longjmp. This means that
osigreturn is not only used for compatibility with existing
binaries. To handle the new sigset_t, setjmp saves it in
sc_reserved (see NOTE).
the struct sigframe has been moved from frame.h to sigframe.h
to handle the complex header dependencies that was caused by
the new sigframe.
NOTE: For the i386, the size of jmp_buf has been increased to hold
the new sigset_t. On the alpha this has been prevented by
using sc_reserved in sigcontext.
have been there in the first place. A GENERIC kernel shrinks almost 1k.
Add a slightly different safetybelt under nostop for tty drivers.
Add some missing FreeBSD tags
for the AN985 "Centaur" chip, which is apparently the next genetation
of the "Comet." The AN985 is also a tulip clone and is similar to the
AL981 except that it uses a 99C66 EEPROM and a serial MII interface
(instead of direct access to the PHY registers).
Also updated various documentation to mention the AN985 and created
a loadable module.
I don't think there are any cards that use this chip on the market yet:
the datasheet I got from ADMtek has boxes with big X's in them where the
diagrams should be, and the sample boards I got have chips without any
artwork on them.
spaces which cross a segment boundry in the page table. pmap_kextract()
is not designed for access to the user space portion of the page
table and cannot handle the null-page-directory-entry case.
The fix is to have vm_fault_quick() return a success or failure which
is then used to avoid calling pmap_kextract().
random-seekable devices. This lets dd(1) know it can seek on them. It
also affects spec_vnopen() (IIRC), but only makes the path of execution smaller,
and does not change its behavior. This is when securelevel >= 2.
the OS does FXSAVE/FXRESTOR instructions (fast FPU save/restore) during
context switching and also enables SIMD since this enables saving the
extra CPU context that isn't saved with normal FPU regs. The other
enables the SIMD instructions to use exception 16 (FPU) error reporting.
Note, this doesn't turn on SIMD, just defines the bits.
return (in signal trampoline code). I plan to do the same on -stable,
so that we have a consistent interface to userland applications.
Reviewed by: bde
the Davicom DM9100 and DM9102 chipsets, including the Jaton Corporation
XPressNet. Datasheet is available from www.davicom8.com.
The DM910x chips are still more tulip clones. The API is reproduced
pretty faithfully, unfortunately the performance is pretty bad. The
transmitter seems to have a lot of problems DMAing multi-fragment
packets. The only way to make it work reliably is to coalesce transmitted
packets into a single contiguous buffer. The Linux driver (written by
Davicom) actually does something similar to this. I can't recomment this
NIC as anything more than a "connectivity solution."
This driver uses newbus and miibus and is supported on both i386
and alpha platforms.
SiS 900 and SiS 7016 PCI fast ethernet chipsets. Full manuals for the
SiS chips can be found at www.sis.com.tw.
This is a fairly simple chipset. The receiver uses a 128-bit multicast
hash table and single perfect entry for the station address. Transmit and
receive DMA and FIFO thresholds are easily tuneable. Documentation is
pretty decent and performance is not bad, even on my crufty 486. This
driver uses newbus and miibus and is supported on both the i386 and
alpha architectures.
1. Move definitions of struct i386_*_args to the header file sysarch.h,
since they are part of the sysarch API. struct i386_get_ldt_args and
i386_set_ldt_args were identical, therefore make them into one
struct i386_ldt_args. Libc should use these definitions as well.
2. Return a more sensible EOPNOTSUPP for unknown operations.
Reviewed by: marcel
new system is integrated with the ISA bus code more cleanly and allows
the future addition of more enumerators such as PnPBIOS and ACPI.
This commit also enables the new pcm driver since it is somewhat tied to
the new PnP code.
an operation, as a kernel client may not have previously checked the CPU
type (it may not be able to).
Also correct the function declaration style for the mem_range functions to
match the rest of this file (oops).
Submitted by: gibbs
stuff from mem.c. If PERFMON is there, it will "steal" a minor from
mem.c, but mem.c doesn't need to know about this.
Fixed type of cmd argument in perfmon_ioctl().
Diskslice/label code not yet handled.
Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)
Add the correct hook for devfs to kern_conf.c
The net result of this excercise is that a lot less files depends on DEVFS,
and devtoname() gets more sensible output in many cases.
A few drivers had minor additional cleanups performed relating to cdevsw
registration.
A few drivers don't register a cdevsw{} anymore, but only use make_dev().
here any more as they are self identifying. Only PNP remains but that
will be replaced any day now.
Also reword a comment that had been XXX'ed to death to make it clear[er]
why we don't enable interrupts before probing.
PCIBIOS interrupt routing controls may make this possible to fix one day.
than having explicit hooks here.
Treat the eisa/isa attach a little differently so that we defer the
decision about to attach eisa/isa to the motherboard directly only if
the PCI probe (if it exists) fails to turn up a PCI->EISA/ISA bridge.
This restores the original device geometry where ISA and/or EISA attach
to their bridge rather than bypassing and going to the root.
PCI fast ethernet controller. Currently, the only card I know that uses
this chip is the D-Link DFE-550TX. (Don't ask me where to buy these: the
only cards I have are samples sent to me by D-Link.)
This driver is the first to make use of the miibus code once I'm sure
it all works together nicely, I'll start converting the other drivers.
The Sundance chip is a clone of the 3Com 3c90x Etherlink XL design
only with its own register layout. Support is provided for ifmedia,
hardware multicast filtering, bridging and promiscuous mode.
trying to size it intelligently just make it 64k and leave it up to the caller
to ensure that the arguments all fit within that range.
This should resolve the issue that some people were seeing with the PnP BIOS
scan crashing on a large PnP node.
test does not change undefined flag like Cyrix CPUs. Another is that
5/2 test changes undefined flag like Intel CPUs. Latter one could not
be detected and was recognized 486DX CPU. To solve this,
finishidentcpu() calls identblue() when cpu_vendor is null string
(that is, CPUID instruction is not supported) and cpu == CPU_486.
Tests have been done on IBM BlueLightning CPUs, i486SX and i486DX.
into two parts - one to do the bsfl and the other to convert the result
(base 0) to ffs()-like (base 1) in inline C. This enables the optimizer
to be a lot smarter in certain cases, like where it knows that the argument
is non-zero and we want ffs(known non zero arg) - 1. This appears to
produce identical code to the old inline when the argument is unknown.
that are linked into the kernel. The KLD compilation options are
changed to call these functions, rather than in-lining the
atomic operations.
This approach makes atomic operations from KLDs significantly
faster on UP systems (though somewhat slower on SMP systems).
PR: i386/13111
Submitted by: peter.jeremy@alcatel.com.au
0x40 and then access data stored in real-mode segment 0x40, even when
called in protected mode. Microsoft unfortunately coddle these individuals,
and so must we if we want to run their code.
This change works around GPFs in some APM and PnP BIOS implementations.
Obtained from: Linux
as PCI->HOST bridges on my (440BX) box.
My change is to remove the test at the beginning entirely, letting the
switch on the device ID happen first. If the device ID is unknown, then
(in the default case) check for the generic PCIS_BRIDGE_HOST tag. This
should allow wierd cases (eg: wpaul's IMS VL bridge) to work by using the
id override. This strategy is more in line with the other PCI match
methods we use elsewhere,
I only have a limited testbed, but having my USB etc devices detected as
PCI->HOST bridges doesn't look good.
correctly. It has the following code:
if (class != PCIC_BRIDGE || subclass != PCIS_BRIDGE_HOST)
return NULL;
My 486 has an Integrated Micro Solutions PCI bridge which identifies
itself as subclass PCIS_BRIDGE_OTHER, not PCIS_BRIDGE_HOST. Consequently,
it gets ignored. In my opinion, the correct test should be:
if ((class != PCIC_BRIDGE) && (subclass != PCIS_BRIDGE_HOST))
return NULL;
That way the test still succeeds because the chip's class is PCIC_BRIDGE.
Clearly it's not reasonable to expect all host to PCI bridges to always
have a subclass of PCIS_BRIDGE_HOST since I've got one that doesn't.
This way the sanity test should remain relatively sane while still allowing
some oddball yet correct hardware to work. If somebody has a better way
to do it, go ahead and tweak the test, but be aware that
class == PCIC_BRIDGE and subclass == PCIS_BRIDGE_OTHER is a valid case.
While I was here, I also added an explicit ID string for the IMS chipset.
I also dealt with a minor style nit: it's bad karma not to have a default
case for your switch statements, but the one in this routine doesn't have
one. The default string of "Host to PCI bridge" is now assigned in a
default case of the switch statement instead of initializing "s" with the
string before the switch and then not having any default case.
we create the pty on the fly when it is first opened.
If you run out of ptys now, just MAKEDEV some more.
This also demonstrate the use of dev_t->si_tty_tty and dev_t->si_drv1
in a device driver.
of 2 weeks ago that this be done, and anyone who wishes to make bpf more
selective according to securelevel or compile-time options is more
than free to do so.
- Add support for calling 32-bit code in other segments
- Add support for calling 16-bit protected mode code
Update APM to use this facility.
Submitted by: jlemon
- device_print_child() either lets the BUS_PRINT_CHILD
method produce the entire device announcement message or
it prints "foo0: not found\n"
Alter sys/kern/subr_bus.c:bus_generic_print_child() to take on
the previous behavior of device_print_child() (printing the
"foo0: <FooDevice 1.1>" bit of the announce message.)
Provide bus_print_child_header() and bus_print_child_footer()
to actually print the output for bus_generic_print_child().
These functions should be used whenever possible (unless you can
just use bus_generic_print_child())
The BUS_PRINT_CHILD method now returns int instead of void.
Modify everything else that defines or uses a BUS_PRINT_CHILD
method to comply with the above changes.
- Devices are 'on' a bus, not 'at' it.
- If a custom BUS_PRINT_CHILD method does the same thing
as bus_generic_print_child(), use bus_generic_print_child()
- Use device_get_nameunit() instead of both
device_get_name() and device_get_unit()
- All BUS_PRINT_CHILD methods return the number of
characters output.
Reviewed by: dfr, peter
active or not. The only sane thing we can do here is assume that if
APM is supported it might be active at some point, and bail.
In reality, even this isn't good enough; regardless of whether we support
APM or not, the system may well futz with the CPU's clock speed and throw
the TSC off. We need to stop using it for timekeeping except under
controlled circumstances. Curse the lack of a dependable high-resolution
timer.
equivalent to SYS_RES_MEMORY for x86 but for alpha, the rman_get_virtual()
address of the resource is initialised to point into either dense-mapped
or bwx-mapped space respectively, allowing direct memory pointers to be
used to device memory.
Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).
An rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.
This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.
ethernet controllers based on the AIC-6915 "Starfire" controller chip.
There are single port, dual port and quad port cards, plus one 100baseFX
card. All are 64-bit PCI devices, except one single port model.
The Starfire would be a very nice chip were it not for the fact that
receive buffers have to be longword aligned. This requires buffer
copying in order to achieve proper payload alignment on the alpha.
Payload alignment is enforced on both the alpha and x86 platforms.
The Starfire has several different DMA descriptor formats and transfer
mechanisms. This driver uses frame descriptors for transmission which
can address up to 14 packet fragments, and a single fragment descriptor
for receive. It also uses the producer/consumer model and completion
queues for both transmit and receive. The transmit ring has 128
descriptors and the receive ring has 256.
This driver supports both FreeBSD/i386 and FreeBSD/alpha, and uses newbus
so that it can be compiled as a loadable kernel module. Support for BPF
and hardware multicast filtering is included.
Change "void *" to "volatile TYPE *", improving type safety
and eliminating some warnings (e.g., mp_machdep.c rev 1.106).
cpufunc.h:
Eliminate setbits. As defined, it's not precisely correct;
and it's redundant. (Use atomic_set_int instead.)
ipl_funcs.c:
Use atomic_set_int instead of setbits.
systm.h:
Include atomic.h.
Reviewed by: bde
When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.
The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.
PR: kern/12378
Submitted by: tegge
Reviewed by: dfr
- Support for setting memory range attributes on SMP systems using the
new SMP rendezvous function
- Don't print the confusing default memory type message.
- Allow legal overlapping range types.
- Turn interrupts back on after setting MTRRs in UP mode (whoops)
- Don't waste time calling invltlb() after wbinvd(); it's not
SMP-compatible (interrupts are off) and unncessary because
wbinvd already flushes the TLB.
This code is now essentially feature-complete.
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.
The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.
Reviewed by: peter (partially)
but broken, since tsc_timecounter is not initialised in that case,
and updating an uninitialised timecounter is fatal.
Fixed style bugs in the machdep.i8254_freq and machdep.tsc_freq
sysctls.
Reviewed by: phk
with respect to interrupts on UP systems. (The upgrade from gcc 2.7.x
to egcs 1.1.2 produced at least one non-atomic code sequence in
swap_pager_getpages.)
In addition, the primitives are now SMP-safe, but only on SMPs. (For
portability between SMPs and UPs, modules are compiled with the SMP-safe
versions.)
Submitted by: dillon and myself
Reviewed by: bde
not masked during handling of shared PCI interrupts. This resulted in
ASTs sometimes being discarded and softclock interrupts sometimes being
handled prematurely (sometimes = quite often on systems with shared PCI
interrupts, never on other systems).
Debugged by: gibbs and other people at plutotech.com
PR: 6944, maybe 12381
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.
* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.
* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propogate to
deeper inlines and should produce better code.
* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.
* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.
* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c
* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.
* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.
* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.
* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.
* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).
* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).
* removal of extra arguments passed to getnewbuf() that are not
necessary.
* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c
* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.
* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.
Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
than a review, this was a nice puzzle.
This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.
Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.
Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.
Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c
Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):
Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)
New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */
The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.
FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.
Reviewed by: BDE
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.
Make NMBUFs tunable as well.
Move the nmbclusters sysctl here as well.
Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.
Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).
print_AMD_info(), L2 internal cache is shown, as are AMD's special CPUID
infos:
CPU: AMD-K6(tm) 3D processor (350.81-MHz 586-class CPU)
Origin = "AuthenticAMD" Id = 0x58c Stepping=12
Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
AMD Features=0x808029bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SYSCALL,PGE,MMX,3DNow!>
PR: kern/12512
Submitted by: Louis A. Mamakos <louie@TransSys.COM>
we will never use more memory than this value (if specified), but will always
check memory for validity up to this amount.
Get rid of the speculative_mprobe option; the memory amount can now be
specified by hw.physmem.
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truely empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).
Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.
The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.
The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.
A small race condition was fixed in getpbuf() in vm/vm_pager.c.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
behavior slightly.
If machine/bus.h is included, but neither bus_memio.h nor bus_pio.h
are included, then behave as if both were included.
This won't change existing drivers, all of which include one or more
of bus_{p,mem}io.h, but will allow drivers from other systems to come
over with fewer changes. I freely admit that this might not be
optimal for some drivers, but those drivers can be optimized for
FreeBSD after the initial bringup happens.
Without the change, there is a bug that preclude drivers from
compiling with strange warning/errors.
I've been running this here for a while now w/o ill effects.
Reviewed by: gibbs
Not objected to by: bde, arch@ list.
- The kernel environment variable 'hw.physmem' can be used to set the
amount of physical memory space, based at 0, that FreeBSD will use.
Any memory detected over this limit is ignored. Documentation for
this is available under 'help set tunables' in the loader.
- In the case where system memory size can't be accurately determined,
hw.physmem is used as a best-guess memory size, but speculative
probing will be used to determine actual memory size if any of the
guesses or hints are 16M or more.
- If RB_VERBOSE, we list the memory regions as we test them.
- The compile-time option MAXMEM supplies a default value for
'hw.physmem'.
specified in the kernel config file - but setting options MAXMEM works
exactly the same. Userconfig overrides of this have not worked for
ages.
Also, change the getenv for the loader override to hw.physmem based on a
prior suggestion from Mike Smith. I think he still wants to change this
some, but this shouldn't get in his way. This is a forced setting of
the memory size, not a "cap". We probably should have a plain 'maxmem'
variable as well which does do a cap, without loosing the bios memory
configuration data.
memory size. If somebody wants to change the name, fine - I used this
since it's consistant with the config variable it replaces.
This is intended to replace the npx0 msize hack (which no longer works).
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
1. Rise is recognized in identdcpu.c.
2. The TSC is not written to. A workaround for the CPU bug is being
applied to clock.c (the bug being that the mP6 has TSC enabled
in its CPUID-capabilities, but it only supports reading it. If we
try to write to it (MSR 16), a GPF occurs.) The new behavior is that
FreeBSD will _not_ zero the TSC. Instead, we do a bit of 64-bit
arithmetic.
Reviewed by: msmith
Obtained from: unfurl & msmith
sure that i686_mem was only used when
1. CPUID had MTRR set (this was there before)
2. the CPU was GenuineIntel (not there)
3. the CPU is a 686 (also not there)
This should prevent any problems with CPUs that set MTRR but aren't
compatibile with Intel's interface (none that I know of yet.)
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.
Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.
cdevsw_add() will print an message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
The old version only worked right when the time was read strictly
more often than every 1/HZ seconds, but we only guarantee reading
it every (1/HZ + epsilon) seconds. Part of rev.1.126-1.127 attempted
to fix this but didn't succeed. Detect counter rollover using the
heuristic from the old version of microtime() with additional
complications for supporting calls from fast interrupt handlers.
This works provided i8254 interrupts are not delayed by more than
1/(2*HZ) seconds.
This needs more comments, and cleanups for the SMP case, and more
testing of the SMP case before it is merged into RELENG_3.
Tested by: jhay
on systems with no FFS.
- Remove all references to mfs from cpu_rootconf(). mfs_init is
called prior to cpu_rootconf(), so it can set mountrootfsname to mfs
and (more imporantly) set rootdev using the (bogus in Bruce's opinion)
special major number of 255.
* Re-work the resource allocation code to use helper functions in subr_bus.c.
* Add simple isa interface for manipulating the resource ranges which can be
allocated and remove the code from isa_write_ivar() which was previously
used for this purpose.
ADMtek AL981 "Comet" chipset. The AL981 is yet another DEC tulip clone,
except with simpler receive filter options. The AL981 has a built-in
transceiver, power management support, wake on LAN and flow control.
This chip performs extremely well; it's on par with the ASIX chipset
in terms of speed, which is pretty good (it can do 11.5MB/sec with TCP
easily).
I would have committed this driver sooner, except I ran into one problem
with the AL981 that required a workaround. When the chip is transmitting
at full speed, it will sometimes wedge if you queue a series of packets
that wrap from the end of the transmit descriptor list back to the
beginning. I can't explain why this happens, and none of the other tulip
clones behave this way. The workaround this is to just watch for the end
of the transmit ring and make sure that al_start() breaks out of its
packet queuing loop and waiting until the current batch of transmissions
completes before wrapping back to the start of the ring. Fortunately, this
does not significantly impact transmit performance.
This is one of those things that takes weeks of analysis just to come
up with two or three lines of code changes.