1060 Commits

Author SHA1 Message Date
Marcel Moolenaar
a50bc30203 o Fix cut-n-paste whitespace corruption in previous commit
o  For trap-based upcalls the argument (the kse_mailbox) to
   the UTS must be written onto the kernel stack, not the
   user stack. While here, deal with the fact that we may
   be at a NaT collection point.
2003-08-07 07:40:19 +00:00
Marcel Moolenaar
bee4e73025 In cpu_set_upcall_kse(), create the upcall according to the entry
path into the kernel. Normally it's due to a syscall, but one can
also be created as the result of a clock interrupt (for example).
This now even more looks like exec_setregs().

While here, add an assert that we don't expect more than 8KB of
dirty registers on the kernel stack.
2003-08-06 23:28:19 +00:00
Marcel Moolenaar
5f20d75a5f o In revision 1.45 of exception.S we changed exception_restore to
unconditionally restore ar.k7 (kernel memory stack) and ar.k6
   (kernel register stack). I don't know what I was smoking then,
   but if you unconditionally restore ar.k6, you also want to
   compute its value unconditionally. By having the computation
   predicated and dependent on whether we return to user mode, we
   would end up writing junk (= invalid value for ar.bspstore) if
   we would return to kernel mode. But the whole point of the
   unconditional restoration was that there is a grey area where
   we still need to have ar.k6 restored. If we restore with a junk
   value, we would end up wedging the machine on the next interrupt.
   So, unconditionally calculate the value we unconditionally write
   to ar.k6.

o  The previous braino was found while making the following change:
   We used to clear the lower 9 bits of the value we write to ar.k6.
   The meaning being that we know that the kernel register stack is
   at least 512 byte aligned and simply clearing the lower 9 bits
   allows us to return to a context of which we don't have dirty
   registers on the kernel stack, even though the context that
   entered the kernel does have dirty registers on the kernel stack.
   By masking-off the lower bits, we correctly obtain the base of
   the register stack without having to worry that we didn't actually
   reached the base while unwinding it.
   The change is to mask off the lower 13 bits, knowing that the
   kernel register stack is always 8KB aligned. The advantage is that
   we don't have to worry anymore if there's more than 512 bytes of
   dirty registers on the kernel stack. A situation that frequently
   occurs. In exec_setregs() in machdep.c:1.147 or older, we had to
   deal with that situation by copying the active portion of the
   register stack down in multiples of 512 bytes. Now that we mask off
   the lower 13 bits we don't have to do that at all. Contemporary
   IPF processors have a register file that can hold up to 96 stacked
   registers (=784 bytes [incl. 2 NaT collections]). With no indication
   that register files grow beyond a couple of hundred registers, we
   should not have to worry about it anymore... and yes, 640KB is
   enough for everybody :-)
   This change helps setcontext(2) and cpu_set_upcall_kse() in that
   they can return to completely different contexts without having to
   mess with the kernel stack. Of course exec_setregs() doesn't need
   to do that anymore as well.
2003-08-06 21:32:38 +00:00
Marcel Moolenaar
7f36189f8a o Put the syscall return registers in the context. Not only do we
need this for swapcontext(), KSE upcalls initiated from ast()
   also need to save them so that we properly return the syscall
   results after having had a context switch. Note that we don't
   use r11 in the kernel. However, the runtime specification has
   defined r8-r11 as return registers, so we put r11 in the context
   as well. I think deischen@ was trying to tell me that we should
   save the return registers before. I just wasn't ready for it :-)

o  The EPC syscall code has 2 return registers and 2 frame markers
   to save. The first (rp/pfs) belongs to the syscall stub itself.
   The second (iip/cfm) belongs to the caller of the syscall stub.
   We want to put the second in the context (note that iip and cfm
   relate to interrupts. They are only being misused by the syscall
   code, but are not part of a regular context).
   This way, when the context is switched to again, we return to
   the caller of setcontext(2) as one would expect.

o  Deal with dirty registers on the kernel stack. The getcontext()
   syscall will flush the RSE, so we don't expect any dirty registers
   in that case. However, in thread_userret() we also need to save
   the context in certain cases. When that happens, we are sure that
   there are dirty registers on the kernel stack.
   This implementation simply copies the registers, one at a time,
   from the kernel stack to the user stack. NAT collections are not
   dealt with. Hence we don't preserve NaT bits. A better solution
   needs to be found at some later time.
   We also don't deal with this in all cases in set_mcontext. No
   temporay solution is implemented because it's not a showstopper.
   The problem is that we need to ignore the dirty registers and we
   automaticly do that for at most 62 registers. When there are more
   than 62 dirty registers we have a memory "leak".

This commit is fundamental for KSE support.
2003-08-05 18:52:02 +00:00
Marcel Moolenaar
02cc6a6f35 Fix logic bug in the previous commit. Any region less than 5 is a
user space region. Hence, we need to test if 5 is greater than the
region; not greater equal.
This bug caused us to call ast() while interrupting kernel mode.
2003-08-04 22:00:48 +00:00
John Baldwin
3bdbd658f1 - Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
  when it was originally added, so remove it.
2003-08-04 20:32:45 +00:00
Marcel Moolenaar
46e31b2612 Cleanup the clock code. This includes:
o  Remove alpha specific timer code (mc146818A) and compiled-out
   calibration of said timer.
o  Remove i386 inherited timer code (i8253) and related acquire and
   release functions.
o  Move sysbeep() from clock.c to machdep.c and have it return
   ENODEV. Console beeps should be implemented using ACPI or if no
   such device is described, using the sound driver.
o  Move the sysctls related to adjkerntz, disable_rtc_set and
   wall_cmos_clock from machdep.c to clock.c, where the variables
   are.
o  Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
   bother faking a stathz that's 1/8 of that. Keep it simple: hz
   defaults to HZ and stathz equals hz. This is also how it's done
   for sparc64.
o  Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
   to calculate ITC skew and corrections. On average, we adjust the
   ITC match register once every ~1500 interrupts for a duration of
   2 consequtive interruprs. This is to correct the non-deterministic
   behaviour of the ITC interrupt (there's a delay between the match
   and the raising of the interrupt).
o  Add 4 debugging sysctls to monitor clock behaviour. Those are
   debug.clock_adjust_edges, debug.clock_adjust_excess,
   debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
   counts the individual adjustment cycles (when the skew first
   crosses the threshold), the second counts the number of times the
   adjustment was excessive (any non-zero value is to be considered
   a bug), the third counts lost clock interrupts and the last counts
   the number of interrupts for which we applied an adjustment
   (debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
   avarage duration of an individual adjustment -- should be ~2).

While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.
2003-08-04 05:13:18 +00:00
Marcel Moolenaar
5192a6fc07 Fix handling of external interrupts: we weren't calling ast() when
interrupting user mode. The net effect of this bug is that a clock
interrupt does not cause rescheduling and processes are not
preempted. It only takes a "while (1);" to render the machine
useless.

This bug was introduced by the context changes and EPC syscall code.
Handling of ASTs was moved to C for clarity and ease of maintenance,
but was not added for the external interrupt case.

This needs to be revisited. We now have calls to do_ast() in trap(),
break_syscall() and ivt_External_Interrupt(). A single call in
exception_restore covers these 3 places without duplication. This
is where we handled ASTs prior to the overhaul, except that the
meat has been moved to do_ast(), a C function. This was the goal
to begin with.

Pointy hat: marcel
2003-08-04 00:08:39 +00:00
David E. O'Brien
a98a5f06d3 Style sync. 2003-08-03 07:50:19 +00:00
Marcel Moolenaar
6a1909919b Don't use uint64_t. Use unsigned long instead. One is supposed to use
ucontext_t without having to include headers other than <ucontext.h>.
2003-08-02 01:12:31 +00:00
Marcel Moolenaar
b65384050e Write the preserved registers to (and read them from) struct reg and
struct fpreg.
2003-08-01 07:21:34 +00:00
Bosko Milekic
b053bc8407 Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE.  In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad.  kmem_free() ignores the return value of the
vm_map_delete and just returns.  I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM.  Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessarily.  We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.
2003-07-31 03:39:51 +00:00
Peter Wemm
ad7a226f9d Deal with 'options KSTACK_PAGES' being a global option. 2003-07-31 01:31:32 +00:00
Peter Wemm
aac6412bcd Cosmetic: fix some disorder of #include "opt_...." files 2003-07-31 01:29:09 +00:00
Peter Wemm
edc367db34 Remove leftover relic of pmap_new_thread() etc. 2003-07-31 01:28:41 +00:00
Maxime Henrion
d5afecd068 - Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc().  This is possible now that
  contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
  contigmalloc() now grabs Giant itself.
2003-07-27 13:52:10 +00:00
Marcel Moolenaar
e2fe99a2e0 Remove prototype of ia64_pa_access(). The function has been moved to
mem.c where it's been made static.
2003-07-26 10:13:30 +00:00
Marcel Moolenaar
4f373ec187 Avoid using __aligned(16). Instead define the jmp_buf in terms of
long doubles. This gives us 16-byte alignment. Add a CTASSERT for
the size of the jmp_buf to detect ABI breakages.
2003-07-26 08:03:43 +00:00
Marcel Moolenaar
dc539f3ee0 Unbreak ia64 builds now -Werror is enabled again. Avoid obsolete
memory operand construct.
2003-07-26 07:23:25 +00:00
Marcel Moolenaar
938b878e45 Revert previous commit. We don't use setjmp()/longjmp() for context
switching anymore, so there's no need to save and restore GP. This
change breaks threaded applications linked against libc_r. Pull the
tier 2 card again: relink. This will link against libthr instead.
2003-07-25 22:36:48 +00:00
Alan Cox
059358675e MFi386 revision 1.416
Add vm object locking to pmap_prefault().

Note: powerpc and sparc64 do not implement this function.
2003-07-25 18:58:39 +00:00
Marcel Moolenaar
c1e97bb458 Remove __aligned(16) from the definition of struct _ia64_fpreg. It's
a non-standard construct. Instead, redefine struct _ia64_fpreg as a
union and put a long double in it. On ia64 and for LP64, this is
defined by the ABI to have 16-byte alignment. For ILP32 a long double
has 4-byte alignment, but we don't support ILP32.

Note that the in-memory image of a long double does not match the in-
memory image of spilled FP registers. This means that one cannot use
the fpr_flt field to interpet the bits. For this reason we continue
to use an aggregate type.
2003-07-25 08:02:24 +00:00
Marcel Moolenaar
cd7e5d6eb4 Remove INVARIANT* and WITNESS. This makes the simulator much more
pleasant to use.
2003-07-25 07:52:20 +00:00
Marcel Moolenaar
076f523998 Move ia64_pa_access() from machdep.c to mem.c and declare it static.
It's only used in mem.c and cannot accidentally be used elsewhere
this way.
2003-07-25 05:37:13 +00:00
Marcel Moolenaar
c5262d75ed Disable the single-step trap on a debug related trap, including of
course the single-step trap itself.
2003-07-25 00:11:14 +00:00
Marcel Moolenaar
793e17ba11 We sloppily created an array for the high FP registers (f32-f127),
but this just created a weird inconsistency when porting gdb(1).
Instead, we name each high FP register seperately, like we do for
all the other registers.
2003-07-23 03:08:34 +00:00
Marcel Moolenaar
e90153536e Rename thread_siginfo to cpu_thread_siginfo. 2003-07-15 04:43:33 +00:00
Marcel Moolenaar
ed4ee6b2af Enable the high FP registers when we call the FPSWA handler and disable
them again afterwards. This fixes a disabled FP fault while in the FPSWA
handler.
While here, merge the FP fault and FP trap handling code to reduce code
duplication. Where code was different, it was not sure it should be.

Trigger case: ports/math/atlas
2003-07-13 04:08:16 +00:00
Marcel Moolenaar
480d3dd2ea Add logic to trace across/over a trapframe. We have ABI markers in
our unwind information for functions that are entry points into the
kernel. When stepping to the next frame, the unwinder will let us
know when sych a marker was encountered. We use this to stop the
current unwind session, query the trapframe and restart a new
unwind session based on the new trapframe.

The implementation is a bit sloppy, but at this time there are
bigger fish to fry.
2003-07-12 04:35:09 +00:00
Marcel Moolenaar
2c75b47793 Add a body directive before the first instruction in epc_syscall().
This results in a zero length prologue and a body that covers the
whole function. This is more correct.
2003-07-11 08:52:48 +00:00
Marcel Moolenaar
67f79f5a15 Remove a gratuitous align directive after the endp directive for
IVT entries.
2003-07-11 08:49:26 +00:00
Marcel Moolenaar
290245ea4c Don't call malloc() and free() while in the debugger and unwinding
to get a stacktrace. This does not work even with M_NOWAIT when we
have WITNESS and is generally a bad idea (pointed out by bde@). We
allocate an 8K heap for use by the unwinder when ddb is active. A
stack trace roughly takes up half of that in any case, so we have
some room for complex unwind situations. We don't want to waste too
much space though. Due to the nature of unwinding, we don't worry
too much about fragmentation or performance of unwinding while in
the debugger. For now we have our own heap management, but we may
be able to leverage from existing code at some later time.

While here:
o  Make sure we actually free the unwind environment after unwinding.
   This fixes a memory leak.
o  Replace Doug's license with mine in unwind.c and unwind.h. Both
   files don't have much, if any, of Doug's code left since the EPC
   syscall overhaul and the import of the unwinder.
o  Remove dead code.
o  Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.
2003-07-05 23:21:58 +00:00
Alan Cox
1f78f902a8 Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults.  In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects.  Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter().  On amd64 and i386, this commit only
amounts to code rearrangement.  On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations.  (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.)  On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.
2003-07-03 20:18:02 +00:00
Ruslan Ermilov
0be33d3321 The .s files were repo-copied to .S files.
Approved by:	marcel
Repocopied by:	joe
2003-07-02 12:57:07 +00:00
Marcel Moolenaar
c55b999c72 The use of SYSINIT requires the inclusion of <sys/kernel.h> 2003-07-02 01:22:29 +00:00
Maxime Henrion
75f9bf73ec Make this even closer to other busdma backends. 2003-07-01 21:21:45 +00:00
Maxime Henrion
02681c8bc2 Sync bounce pages support with the alpha backend. More precisely:
o use a mutex to protect the bounce pages structure.
	o use a SYSINIT function to initialize the bounce pages structures
	  and thus avoid a race condition in alloc_bounce_pages().
	o add support for the BUS_DMA_NOWAIT flag in bus_dmamap_load().
	o remove obsolete splhigh()/splx() calls.
	o remove printf() about incorrect locking in busdma_swi() and sync
	  busdma_swi() with the one of the alpha backend.
	o use __FBSDID.
2003-07-01 18:08:05 +00:00
Maxime Henrion
4813f72a9b Honor the boundary of the busdma tag when allocating bounce pages.
This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and
was never fixed in other busdma backends using bounce pages.
2003-07-01 16:54:54 +00:00
Scott Long
f6b1c44d1f Mega busdma API commit.
Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma.  At the moment, this is used for the
asynchronous busdma_swi and callback mechanism.  Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg.  dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create().  The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms.  The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by:	tmm, gibbs
2003-07-01 15:52:06 +00:00
Alan Cox
dca96f1adc - Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
   objects.  pmap_enter_quick() is implemented via pmap_enter() on sparc64
   and powerpc.
 - Correct a mismatch between pmap_object_init_pt()'s prototype and its
   various implementations.  (I plan to keep pmap_object_init_pt() as
   the MD hook for device-backed objects on i386 and amd64.)
 - Correct an error in ia64's pmap_enter_quick() and adjust its interface
   to match the other versions.  Discussed with: marcel
2003-06-29 21:20:04 +00:00
Alan Cox
269acda954 - Remove the calls to pmap_install() from pmap_object_init_pt(); they are
redundant.  Discussed with: marcel
 - MFi386: Add vm object locking to pmap_object_init_pt().
2003-06-29 06:10:32 +00:00
Marcel Moolenaar
d9a4740f18 Implement cpu_set_upcall_kse(). Elementary testing shows that this
function behaves correctly in principle, but is not expected to be
100% complete. In any case, with this commit we have KSE ported
enough to start runtime testing with threaded applications and fix
whatever bugs or omissions we encounter. Yay!
2003-06-28 09:22:25 +00:00
David Xu
b8f480ab94 Add a machine depended function thread_siginfo, SA signal code
will use the function to construct a siginfo structure and use
the result to export to userland.

Reviewed by: julian
2003-06-28 06:34:08 +00:00
Scott Long
3eaffdf7e0 Do the first and mostly mechanical step of adding mutex support to the
bus_dma async callback scheme.  Note that sparc64 does not seem to do
async callbacks.  Note that ia64 callbacks might not be MPSAFE at the
moment.  Note that powerpc doesn't seem to do async callbacks due to
the implementation being incomplete.

Reviewed by:	mostly silence on arch@
2003-06-27 08:31:48 +00:00
Marcel Moolenaar
e2905ce3a0 Add TLS related relocation. 2003-06-19 06:51:43 +00:00
Alan Cox
40ebf3e43a Fix a performance bug in all of the various implementations of
uma_small_alloc(): They always zeroed the page regardless of what the
caller requested.
2003-06-18 02:57:38 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
Alan Cox
49a2507bd1 Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM.  At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES.  The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES.  To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new.  In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
2003-06-14 23:23:55 +00:00
Alan Cox
89f4fca265 Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm.  They were
all identical.
2003-06-14 06:20:25 +00:00
Marcel Moolenaar
222a7e518c Remove kernel event tracing. The overhead is significant when running
under ski.
2003-06-14 00:01:24 +00:00