- Don't hold sched_lock around addupc_task() as this apparently breaks
profiling badly due to sched_lock being held across copyin().
Reported by: bde (2)
work because opt_preemption.h wasn't #include'd. Instead, make use of the
do_switch parameter to ithread_schedule() and do the check in the alpha
interrupt code.
scheduling an interrupt thread to run when needed. This has the side
effect of enabling support for entropy gathering from interrupts on
all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
to use ithread_schedule() for most of their processing when scheduling
an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
enabled. I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
on every interrupt thread whose pri was SWI_CLOCK, set the flag
explicity for clk_ithd's proc during start_softintr().
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.
Please note:
When committing changes to this file, it is important to note that
linux is not freebsd -- their system call numbers (and sometimes names)
are different on different platforms. When in doubt (and you always need
to be) check the arch-specific unistd.h and entry.S files in the linux
kernel sources to see what the syscall numbers really are.
always on curproc. This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).
Debogotified the definition of aston(). aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.
Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.
clear MCPCIA_INT_MASK0 helps things substantially. So, why not indeed?
Rearrange irq and cookie calculation to use shifts/masks instead
of division. Fix things to correctly remember the intpin for that
one in a million non-INTA PCI device.
made no sense in the context of wrapping them within the _SYBRIDGE macro-
or anything like it- so we concluded that this must have been a typo
in the docs. This also doesn't use the same bridge offset as anything
else.
Add some defines for the INT_CTL register.
- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb
genassym here, but what I've also noticed is that we're dorking
with a mutex directly at assembler level- I'm not sure that this
is wise at this stage in the SMP port- I think it's going to be much
safer for a while to do things in C until SMP wunderkind figure out
what works and slow down this 3 order differential...
it as I was playing with some other ways of doing kernel preemption.
You must still specify the PREEMPTION option in your config file to get a
preemptive kernel.
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
only covers about 3-4 lines.
- Don't lower the IPL while we are on the interrupt stack. Instead, save
the raised IPL and change the saved IPL in sched_lock to IPL_0 before
calling mi_switch(). When we are resumed, restore the saved IPL in
sched_lock to the saved raised IPL so that when we release sched_lock
we won't lower the IPL. Without this, we would get nested interrupts
that would overflow the kernel stack.
Tested by: mjacob
* Optimise the return path for syscalls so that they only restore a minimal
set of registers instead of performing a full exception_return.
A new flag in the trapframe indicates that the frame only holds partial
state. When it is necessary to perform a full state restore (e.g. after an
execve or signal), the flag is cleared to force a full restore.
- If possible, context switch to the thread directly in sched_ithd(),
rather than triggering a delayed ast reschedule.
- Disable interrupts while restoring fpu state in the trap handler,
in order to ensure that we are not preempted in the middle, which
could cause migration to another cpu.
Reviewed by: peter
Tested by: peter (alpha)
* Optimise the return path for syscalls so that they only restore a minimal
set of registers instead of performing a full exception_return.
A new flag in the trapframe indicates that the frame only holds partial
state. When it is necessary to perform a full state restore (e.g. after an
execve or signal), the flag is cleared to force a full restore.
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().
to extract the PC from that to send to addupc_task() so that it can be
called from MI code.
- Remove all traces of have_giant with extreme prejudice and use
mtx_owned(&Giant) instead where appropriate.
- Proc locking.
- P_FOO -> PS_FOO.
- Don't grab Giant just to look in curproc's p_addr during a trap since we
may choose to immediately exit. Instead, delay grabbing Giant a bit
until we actually need it.
- Don't reset 'p' to 'curproc' in syscall() to handle the case of a child
returning from fork1() since children don't return via syscall().
- Remove an XXX comment in ast() that questions the correctness of the
userland check. The code is correct.
- Don't send IPIs for pmap_invalidate_page() or pmap_invalidate_all()
in the UP case.
- Catch up to cpuno -> cpuid.
- Convert some sanity checks that were #ifdef DIAGNOSTIC to KASSERT()'s.
- Rename the per-CPU variable 'cpuno' to 'cpuid'. This was done so that
there is one consistent name across all architectures for a logical
CPU id.
- Remove all traces of IRQ forwarding.
- Add globaldata_register() hook called by globaldata_init() to register
globaldata structures in the cpuid_to_globaldata array.
- Catch up to P_FOO -> PS_FOO.
- Bring across some fixes for forwarded_statclock() from the i386 version
to handle ithreads and idleproc properly.
- Rename addugd_intr_forwarded() to addupc_intr_forwarded() so that it is
the same name on all architectures.
- Set flags in p_sflag instead of calling psignal() from
forward_hardclock().
- Proc locking.
- When we handle an IPI, turn off its bit in the mask of IPI's we are
currently handling so that an IPI doesn't send a CPU into an infinite
loop.
that mutex operations work.
- Enter Giant earlier so we hold it during boot.
- Proc locking.
- Move globaldata_init() into here from mp_machdep.c so that UP kernels
don't depend on mp_machdep.c. Use a callout in the SMP case to register
the boot processor's globaldata in the cpuid_to_globaldata array.
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.
Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).
Submitted by: peter
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.
Reviewed By: peter
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.
This change is necessary in order to avoid some circular bootstrapping
dependencies.
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.
Reviewed by: jhb
exactly the same functionality via a sysctl, making this feature
a run-time option.
The default is 1(ON), which means that /dev/random device will
NOT block at startup.
setting kern.random.sys.seeded to 0(OFF) will cause /dev/random
to block until the next reseed, at which stage the sysctl
will be changed back to 1(ON).
While I'm here, clean up the sysctls, and make them dynamic.
Reviewed by: des
Tested on Alpha by: obrien
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.
be 64 bits wide. The largest known current actual physical implementation
is 40 bits, so BUS_SPACE_MAXADDR should reflect this. It also seems to
me that BUS_SPACE_UNRESTRICTED should b ~0UL, not ~0.
symbols in globals.s.
PCPU_GET(name) returns the value of the per-cpu variable
PCPU_PTR(name) returns a pointer to the per-cpu variable
PCPU_SET(name, val) sets the value of the per-cpu variable
In general these are not yet used, compatibility macros remain.
Unifdef SMP struct globaldata, this makes variables such as cpuid
available for UP as well.
Rebuilding modules is probably a good idea, but I believe old
modules will still work, as most of the old infrastructure
remains.
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
CPU version (apecs:ev4::cia:ev5) and the irq hardware depends on the systype
previously, only ev4 AS1000s and ev5 AS1000a's would have worked.
tested by: wilko (in its -stable form)
noticed by: daniel
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.
- Move PCI core code to dev/pci.
- Split bridge code out into separate modules.
- Remove the descriptive strings from the bridge drivers. If you
want to know what a device is, use pciconf. Add support for
broadly identifying devices based on class/subclass, and for
parsing a preloaded device identification database so that if
you want to waste the memory, you can identify *anything* we know
about.
- Remove machine-dependant code from the core PCI code. APIC interrupt
mapping is performed by shadowing the intline register in machine-
dependant code.
- Bring interrupt routing support to the Alpha
(although many platforms don't yet support routing or mapping
interrupts entirely correctly). This resulted in spamming
<sys/bus.h> into more places than it really should have gone.
- Put sys/dev on the kernel/modules include path. This avoids
having to change *all* the pci*.h includes.
files which Compaq open-sourced (with a BSD license).
This commit adds support for proper PCI interrupt mapping and much
better support for swizzling between "standard" isa IRQs and the stdio
irqs used by the t2. This also adds enabling/disabling/eoi support
for AlphaServer 2100A machines. The 2100A (or lynx) interrupt
hardware is is very different (and much nicer) than the 2100.
Previously, only AS2100 and AS2000 machines worked.
This commits also lays the groundwork for supporting ExtIO modules.
These modules are essentially a second hose. This work is left
unfinished pending testing on real hardware. Wilko tells me that
ExtIO modules are quite rare, and may not actually exist in the wild.
Obtained from: Tru64
Tested by: wilko
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.
Reviewed by: jhb
tweak to enable/disable interrupt sources. Seems to work. It is unclear
how many of the PC164 models actually might needs this, and whether or
not there are other hidden issues.
Obtained from:Bernd Walter <ticso@cicely8.cicely.de>
not return ENOEXEC. This is because image activators should return -1 if they
don't claim an image. They should return ENOEXEC if they do claim it,
but cannot load it due to sime problem with the image. This bug was
preventing static compilation of the osf/1 module. I'm surprised it
did not cause more problems.
EOI after the ithread runs, send the EOI when we get the interrupt and
disable the source. After the ithread is run, the source is renabled.
Also, add isa_handle_fast_intr() which handles fast interrupts by sending
an EOI after the handler is run.
This fixes the chronic missing interrupt problems under heavy NFS load
on my UP1000 and should result in greater stability for alphas which
route all irqs through an isa pic.
Discussed with: jhb, bde (sending non-specific EOIs early was bde's idea)
like the args to the config space accessors these functions replaced.
This reduces the likelyhood of overflow when the args are used in
macros on the alpha. This prevents memory management faults when
probing the pci bus on sables, multias and nonames.
Approved by: dfr
Tested by: Bernd Walter <ticso@cicely8.cicely.de>
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.
We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.
PR: 22286
can unload. Doing so leaves the linuxulator in a crippled
state (no ioctl support) when Linux binaries are run at
unload time.
While here, consistently spell ELF in capitals and perform
some minor style improvements.
ELF spelling submitted by: asmodai
counter register in-CPU.
This is to be used as a fast "timer", where linearity is more important
than time, and multiple lines in the linearity caused by multiple CPUs
in an SMP machine is not a problem.
This adds no code whatsoever to the FreeBSD kernel until it is actually
used, and then as a single-instruction inline routine (except for the
80386 and 80486 where it is some more inline code around nanotime(9).
Reviewed by: bde, kris, jhb
- move the call to cia_init_sgmap() to after we've determined if we're a pyxis
- convert needed splhigh() in cia_sgmap_invalidate_pyxis() to disable_intr()
Previously, any isa DMA on a pyxis based machine would cause a panic
in cia_sgmap_invalidate_pyxis() because the pyxis workaround was never
setup.
- while i'm at it, convert needed splhigh() in cia_swiz_set_hae_mem to
disable_intr()
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
regstering a callout.
Reviewed by: -smp@, jlemon
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.
Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.
Submitted by: witness
to our native connect(). This is required to deal with the differences
in the way linux handles connects on non-blocking sockets.
This gets the private beta of the Compaq Linux/alpha JDK working
on FreeBSD/alpha
Approved by: marcel
mainly cut-n-pasted from the i386 port, except for the method of setting
the child's stack which is the only MD part of this function.
I've tested with the example apps shipped with the linux threads source
code (ex1-ex6) and with several binary builds of Mozilla.
- No signal translation is needed. Our signals match the OSF/1 signals
- an OSF/1 sigset_t is 64 bits. Make certain to use all 64-bits of it.
We'd previously only used the lower 32 bits. This was mostly harmless
as I don't know of an OSF/1 apps which use any signals > 31. However,
the alpha Linux ABI uses the osf/1 signal routines and threaded linux
apps tyically use signals 32 and 33 to comminicate with the manager
thread, so it is important we preserve the upper 32-bits.
Reviewed by: marcel (at least in principal)
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.
The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.
The native MINSIGSTKSZ is now defined as follows:
Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288
Reviewed by: mjacob
Suggested by: bde
for an interrupt to enable/disable from the vector (and GID too, if we
had multiple GIDs)- so, stupidly for now, search for the right mcpcia's
softc so we have the right base address for the bridge CSR to apply
IRQ bit-twiddle's to. Alas- this doesn't yet allow us to run, but it's
the right direction.
Previously we had to include <machine/param.h> or <sys/param.h> bogusly
due to the fact that <sys/socket.h> CMSG macros needed the ALIGN macro,
which was defined in param.h. However, including param.h was a disaster
for namespace pollution.
This solution, as contributed by shin a while ago, fixes it elegantly
by wrapping the definitions around some namespace pollution preventer
definitions.
This patch was long overdue.
This should allow any network programmer to use <sys/socket.h> as
before.
PR: 19971, 20530
Submitted by: Martin Kaeske <MartinKaeske@lausitz.net>
Mark Andrews <Mark.Andrews@nominum.com>
Patch submitted by: shin
Reviewed by: bde
argument. These flags include INTR_FAST, INTR_MPSAFE, etc.
- Properly handle INTR_EXCL when it is passed in to allow an interrupt
handler to claim exclusive ownership of an interrupt thread.
- Add support for psuedo-fast interrupts on the alpha. For fast interrupts,
we don't allocate an interrupt thread; instead, during dispatching of an
interrupt, we run the handler directly instead of scheduling the thread
to run. Note that the handler is currently run without Giant and must be
MP safe. The only fast handler currently is for the sio driver.
Requested by: dfr
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.
Define __offsetof() in <machine/ansi.h>
Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>
Remove myriad of local offsetof() definitions.
Remove includes of <stddef.h> in kernel code.
NB: Kernelcode should *never* include from /usr/include !
Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.
Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.
Paritials reviews by: various.
Significant brucifications by: bde
change_ruid() in kern_prot.c. This fixes an incorrect use
of chgproccnt().
Update both osf1_setuid() and osf1_setgid() to use setsugid() instead
of just frobbing the flag.
(mostly) submitted by: truckman
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.
Submitted by: cp
more include file including <sys/proc.h>, but there still is this wonky
and (causes warnings on i386) reference in globals.h.
CURTHD is now defined in <machine/globals.h> as well. The correct thing
to do is provide a platform function for this.