by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundry case fix.
Make the buffer cache use a system map for buffer cache KVM rather then a
normal map.
Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.
Fix a minor boundry case in the buffer cache size limit is reached that
could result in non-optimal code.
Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).
PR: 20609
Submitted by: (Tor Egge) tegge
- If possible, context switch to the thread directly in sched_ithd(),
rather than triggering a delayed ast reschedule.
- Disable interrupts while restoring fpu state in the trap handler,
in order to ensure that we are not preempted in the middle, which
could cause migration to another cpu.
Reviewed by: peter
Tested by: peter (alpha)
to 1GB. A box of mine is running with MAXDSIZ and DFLDSIZ increased
up to 1.5GB.
Wishlist: It would be nice to warn if MAXTSIZ + MAXDSIZ + MAXSSIZ
exceeds VM_MAXUSER_ADDRESS - VM_MINUSER_ADDRESS.
incompletely converting simplelocks to mutexes (COM_LOCK() is supposed
to hide the SMP locking internals, but it now depends on mutex interfaces
being visible).
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.
Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.
Solved by: jake (the allmighty) :-)
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().
that name as a variable. Use mtx_owned(&Giant) where appropriate
instead.
- Proc locking.
- P_FOO -> PS_FOO.
- Update comments about enable interrupts during trap and why this may be
bad if we trap while holding a spin mutex.
- Don't bother resetting p to curproc in syscall() in case we are the child
returning from fork. The child hasn't returned from fork through syscall
in a while.
- Remove fork_return() as it has been superseded by the MI version.
the alpha mp_machdep.c.
- Proc locking.
- Catch up to the P_FOO -> PS_FOO proc flags changes.
- Stick ap_init()'s prototype with the other prototypes.
- Remove the Xforwardirq IPI.
- Remove unused simplelocks.
- Don't try to psignal() from forward_statclock(), but set the appropriate
signal pending flag in p_sflag instead.
- Add in KTR_SMP tracepoints for various SMP functions. (Brought over
from the alpha port)
- Setup proc0.p_heldmtx, proc0.contested, and curproc earlier so that we
can use mutexes.
- Initialize sched_lock and Giant earlier and enter Giant during init386.
- Use suser(9) instead of checking cr_uid directly.
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.
Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).
Submitted by: peter
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.
Reviewed By: peter
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.
This change is necessary in order to avoid some circular bootstrapping
dependencies.
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.
pre-emptable kernel. For variables of size 4 bytes or less they compile
to a single instruction, which does not allow a process to migrate cpus
in the middle, and get the value for the "wrong" cpu.
appropriate function, rather than doing a horse-and-buggy
acquire. They now take the mutex type as an arg and can be
used with sleep as well as spin mutexes.
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.
Reviewed by: jhb
non-386 atomic_load_acq(). %eax is an input since its value is used in
the cmpxchg instruction, but we don't care what value it is, so setting
it to a specific value is just wasteful. Thus, it is being used without
being initialized as the warning stated, but it is ok for it to be used
because its value isn't important. Thus, we are only sort of lying when
we say it is an output only operand.
- Add "cc" to the clobber list for atomic_load_acq() since the cmpxchgl
changes ZF.
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.
Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)
compiling errors where gcc would run out of registers.
- Add "cc" to the list of clobbers for micro-ops where we perform
instructions that alter %eflags.
- Use xchgl instead of cmpxchgl to release a spin lock. This could allow
for more efficient register allocation as we no longer mandate that %eax
be used.
- Reenable the optimized mutex micro-ops in the non-i386 case.
that modules can call.
- Remove the old gcc <= 2.8 versions of the atomic ops.
- Resort the order of some things in the file so that there is only
one #ifdef for KLD_MODULE, and so that all WANT_FUNCTIONS stuff is
moved to the bottom of the file.
- Remove ATOMIC_ACQ_REL() and just use explicit macros instead.
time I tinkered around here. Since INTREN is called from the interrupt
critical path now, it should not be too expensive. In this case, we
look at the bits being changed to decide which 8 bit IO port to write to
rather than unconditionally writing to both. I could probably have gone
further and only done the write if the bits actually changed, but that
seemed overkill for the usual case in interrupt threads.
[an outb is rather expensive when it has to cross the ISA bus]
exactly the same functionality via a sysctl, making this feature
a run-time option.
The default is 1(ON), which means that /dev/random device will
NOT block at startup.
setting kern.random.sys.seeded to 0(OFF) will cause /dev/random
to block until the next reseed, at which stage the sysctl
will be changed back to 1(ON).
While I'm here, clean up the sysctls, and make them dynamic.
Reviewed by: des
Tested on Alpha by: obrien
implement memory fences for the 486+. The 386 still uses versions w/o
memory fences as all operations on the 386 are not program ordered.
The 386 versions are not MP safe.
declarations of a variable of the same name. The one in the outer block
was unused and probably just slipped in at one point or another. This
silences a compiler warning.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.
symbols in globals.s.
PCPU_GET(name) returns the value of the per-cpu variable
PCPU_PTR(name) returns a pointer to the per-cpu variable
PCPU_SET(name, val) sets the value of the per-cpu variable
In general these are not yet used, compatibility macros remain.
Unifdef SMP struct globaldata, this makes variables such as cpuid
available for UP as well.
Rebuilding modules is probably a good idea, but I believe old
modules will still work, as most of the old infrastructure
remains.
as multi-processor kernels. The old way made it difficult for kernel
modules to be portable between uni-processor and multi-processor
kernels. It is no longer necessary to jump through hoops.
- always load %fs with the private segment on entry to the kernel
- change the type of the self referntial pointer from struct privatespace
to struct globaldata
- make the globaldata symbol have value 0 in all cases, so the symbols
in globals.s are always offsets, not aliases for fields in globaldata
- define the globaldata space used for uniprocessor kernels in C, rather
than assembler
- change the assmebly language accessors to use %fs, add a macro
PCPU_ADDR(member, reg), which loads the register reg with the address
of the per-cpu variable member
This version is functional and is aproaching solid..
notice I said APROACHING. There are many node types I cannot test
I have tested: echo hole ppp socket vjc iface tee bpf async tty
The rest compile and "Look" right. More changes to follow.
DEBUGGING is enabled in this code to help if people have problems.
To use it, some dll is needed. And currently, the dll is only for NetBSD.
So one more kernel module is needed.
For more infomation,
http://chiharu.haun.org/peace/ .
Reviewed by: bp
format version number. (userland programs should not need to be
recompiled when the netgraph kernel internal ABI is changed.
Also fix modules that don;t handle the fact that a caller may not supply
a return message pointer. (benign at the moment because the calling code
checks, but that will change)
Add detach routine and turn driver into a module so it can be loaded
and unloaded. Also take a stab at implementing multicast packet
reception so that this NIC will work with IPv6. Promiscuous mode
doesn't seem to work, but I'm not sure why. It works well enough that
I can run dhclient on it and put it on the office network though.
Also ripped out spl stuff and replaced it with mutexes.
This is a driver for the LanMedia/SBE LMC150x E1/T1 family of cards.
The driver currently support unframed E1 (2048kbit/s) and framed
E1 (nx64).
These cards will provision E1/T1 lines for about 1/4 the cost of
a cisco router...
vm86_trap() to return to the calling program directly. vm86_trap()
doesn't return, thus it was never returning to trap() to release
Giant. Thus, release Giant before calling vm86_trap().
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.
- Reject swapon(2) upon failure of swap_zone allocation.
This is just a temporary fix. Better solutions include:
(suggested by: dillon)
o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.
Reviewed by: alfred, dillon
variables from i386 assembly language. The syntax is PCPU(member)
where member is the capitalized name of the per-cpu variable, without
the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization
is due to using the offsets generated by genassym rather than the symbols
provided by linking with globals.o. asmacros.h is the wrong place for
this but it seemed as good a place as any for now. The old implementation
in asnames.h has not been removed because it is still used to de-mangle
the symbols used by the C variables for the UP case.
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
This clears out my outstanding netgraph changes.
There is a netgraph change of design in the offing and this is to some
extent a superset of soem of the new functionality and some of the old
functionality that may be removed.
This code works as before, but allows some new features that I want to
work with and evaluate. It is the basis for a version of netgraph
with integral locking for SMP use.
This is running on my test machine with no new problems :-)
the witness code is compiled in. Without this, the witness code doesn't
notice that sched_lock is released by fork_trampoline() and thus gets all
confused about spin lock order later on.
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.
waiting for procfs to get fixed:
- Use fill_eproc() to obtain correct VM stats. Attempt to compute VmLib.
- Fill some more fields in proc/<pid>/stat, and add four (unimplemented)
fields after studying a recent Linux kernel.
- Compute CPU frequency only once instead of twice.
- Fix some comments that were OBE.
- Fix indentation except where it makes the code less readable.
- Move PCI core code to dev/pci.
- Split bridge code out into separate modules.
- Remove the descriptive strings from the bridge drivers. If you
want to know what a device is, use pciconf. Add support for
broadly identifying devices based on class/subclass, and for
parsing a preloaded device identification database so that if
you want to waste the memory, you can identify *anything* we know
about.
- Remove machine-dependant code from the core PCI code. APIC interrupt
mapping is performed by shadowing the intline register in machine-
dependant code.
- Bring interrupt routing support to the Alpha
(although many platforms don't yet support routing or mapping
interrupts entirely correctly). This resulted in spamming
<sys/bus.h> into more places than it really should have gone.
- Put sys/dev on the kernel/modules include path. This avoids
having to change *all* the pci*.h includes.
calling the C functions mtx_enter_hard() and mtx_exit_hard() clobbers them.
Note that %eax is also not call safe, but it is already clobbered due to
cmpxchg. However, now we are back to not compiling again, so these macros
are still left disabled for now.
that of MTX_EXIT. Don't assume that the reg parameter to MTX_ENTER
holds curproc, load it explicitly. Put semi-colons at the end of
the macros to be more consistent and so its harder to forget them
when these change.
SMP problem. Compaq, in their infinite wisdom, forgot to put the IO apic
intpin #0 connection to the 8259 PIC into the mptable. This hack is to
look and see if intpin #0 has *no* table entry and adds a fake ExtInt
entry for the remap routines to use. isa/clock.c will still test the
interrupts. This entry is only ever used on an already broken system.
mpapic.c. This gives us the benefit of C type checking. These functions
are not called in any critical paths and are not used by the interrupt
routines.
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.
Reviewed by: jhb
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.
Reviewed by: jhb, jakeb
MPLOCKED macro
(2) Use decimal 12 rather than hex 0xc in an addl
(3) Implement MTX_ENTER for the I386_CPU case
(4) Use semi-colons between instructions to allow MTX_ENTER
and MTX_ENTER_WITH_RECURSION to be assembled
(5) Use incl instead of incw to increment the recusion count
(6) 10 is not a valid label, use 7, 8 and 9 rather than 8, 9 and 10
(7) Sort numeric labels
Submitted by: bde (2, 4, and 5)
pushl that of the new process, rather than doing a movl (%esp) and
assuming that the stack has been setup right. This make the initial
stack setup slightly more sane, and will make it easier to stick
an interrupted process onto the run queue without its knowing.
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.
We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.
PR: 22286
-current and RELENG_4 with GENERIC.
NKPT is the number of initial bootstrap page table pages we create for
the kernel during startup. Once VM is up, we resize it as needed, but
with 4G ram, the size of the vm_page_t structures was pushing it over
the limit. The fact that trimmed down kernels boot on 4G ram machines
suggests that we were pretty close to the edge.
The "30" is arbitary, but smaller than the 'nkpt' variable on all
machines that I checked.
timeout. If DIAGNOSTIC is turned on, then display a message to the console
with a map of which CPUs failed to stop or restart. This gives an SMP box
at least a fighting chance of getting into DDB if one of the other CPUs has
interrupts disabled.
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.
IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.
Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
struct sigframe. We need more than only the signal context.
o Properly convert the signal mask when setting up the signal
frame in linux_sendsig and properly convert it back in
linux_sigreturn.
Do some cleanups and improve style while here.
can unload. Doing so leaves the linuxulator in a crippled
state (no ioctl support) when Linux binaries are run at
unload time.
While here, consistently spell ELF in capitals and perform
some minor style improvements.
ELF spelling submitted by: asmodai
counter register in-CPU.
This is to be used as a fast "timer", where linearity is more important
than time, and multiple lines in the linearity caused by multiple CPUs
in an SMP machine is not a problem.
This adds no code whatsoever to the FreeBSD kernel until it is actually
used, and then as a single-instruction inline routine (except for the
80386 and 80486 where it is some more inline code around nanotime(9).
Reviewed by: bde, kris, jhb
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
regstering a callout.
Reviewed by: -smp@, jlemon
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.
Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb
instead of DIAGNOSTIC.
- Remove the p_wchan check as it no longer applies since a process may be
switched out during CURSIG() within msleep() or mawait().
- Remove an extra sanity check only needed during the early SMPng work.
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.
Submitted by: witness
linux_rt_sendsig() and restore the same signal mask linux does
in rt_sigreturn(). This gets us saving/restoring all 64-bits of the
linux sigset_t in rt signals.
Reviewed by: marcel
sched_lock. This is needed for kernel threads that are created before
interrupts are enabled. kthreads created by kld's that are created at
SI_SUB_KLD such as the random kthread.
Tested by: phk
linux_sigset_t by updating the linux_sigframe struct so as to include
linux's "extramask" field. This field contains the upper 32-bits of
the sigset. extramask sits behind a linux_fpstate struct, which I've
defined primarily for padding purposes.
While we're here, define LINUX_NSIG in terms of LINUX_NBPW (32) and
LINUX_NSIG_WORDS (2).
This fixes problems where threaded apps would accumulate a large
number of zombies. This was happening because the exit signal resides
in the upper 32-bits of the sigset and was never getting unmasked by
the manager thread after the first child exited.
PR: misc/18530 (may be related, originator not yet contacted)
Reviewed by: marcel
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.
The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.
The native MINSIGSTKSZ is now defined as follows:
Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288
Reviewed by: mjacob
Suggested by: bde
This code has help us comprehence ACPI spec .
Contributors of this code is as follows(except for FreeBSD commiter):
Yasuo Yokoyama,
Munehiro Matsuda,
and ALL acpi-jp@jp.freebsd.org people.
Thanks.
R.I.P.
Previously we had to include <machine/param.h> or <sys/param.h> bogusly
due to the fact that <sys/socket.h> CMSG macros needed the ALIGN macro,
which was defined in param.h. However, including param.h was a disaster
for namespace pollution.
This solution, as contributed by shin a while ago, fixes it elegantly
by wrapping the definitions around some namespace pollution preventer
definitions.
This patch was long overdue.
This should allow any network programmer to use <sys/socket.h> as
before.
PR: 19971, 20530
Submitted by: Martin Kaeske <MartinKaeske@lausitz.net>
Mark Andrews <Mark.Andrews@nominum.com>
Patch submitted by: shin
Reviewed by: bde
systems.
From the PR:
When 'probe.slot' is PCI_SLOTMAX (== 31) and 'probe.func' is 7,
call to 'pci_cfgread()' here and machine suddenly hangs up.
I don't know why... (or 450GX chipset's bug?)
PR: i386/20379
Submitted by: Masayuki FUKUI <fukui@sonic.nm.fujitsu.co.jp>
comments on the same line like so:
device foo # FooInc Brand NetEther cards
Also, move the wireless NIC cards to their own section.
Add commented out wl driver in wireless section.
Remove obsolete or redundant comments about some of the wireless cards
that used to apply but don't since we've removed 'at foobus'.
There should be no functional changes in this change.
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.
Found by: Joost Pol aka Nohican <nohican@marcella.niets.org>
Submitted by: tegge
- Look for a hardwired interrupt in the routing table for this
bus/device/pin (we already did this).
- Look for another device with the same link byte which has a hardwired
interrupt.
- Look for a PCI device matching an entry with the same link byte
which has already been assigned an interrupt, and use that.
- Look for a routable interrupt listed in the "PCI only" interrupts
field and use that.
- Pick the first interrupt that's marked as routable and use that.
This removes support for booting current kernels with very old bootblocks.
Device driver writers: Please remove initializations for the d_bmaj
field in your cdevsw{}.