Commit Graph

918 Commits

Author SHA1 Message Date
marcel
2c3af6b0c7 Revamp of the syscall path, exception and context handling. The
prime objectives are:
o  Implement a syscall path based on the epc instruction (see
   sys/ia64/ia64/syscall.s).
o  Revisit the places where we need to save and restore registers
   and define those contexts in terms of the register sets (see
   sys/ia64/include/_regset.h).

Secondary objectives:
o  Remove the requirement to use contigmalloc for kernel stacks.
o  Better handling of the high FP registers for SMP systems.
o  Switch to the new cpu_switch() and cpu_throw() semantics.
o  Add a good unwinder to reconstruct contexts for the rare
   cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o  The EPC syscall doesn't preserve registers it does not need
   to preserve and places the arguments differently on the stack.
   This affects libc and truss.
o  The address of the kernel page directory (kptdir) had to
   be unstaticized for use by the nested TLB fault handler.
   The name has been changed to ia64_kptdir to avoid conflicts.
   The renaming affects libkvm.
o  The trapframe only contains the special registers and the
   scratch registers. For syscalls using the EPC syscall path
   no scratch registers are saved. This affects all places where
   the trapframe is accessed. Most notably the unaligned access
   handler, the signal delivery code and the debugger.
o  Context switching only partly saves the special registers
   and the preserved registers. This affects cpu_switch() and
   triggered the move to the new semantics, which additionally
   affects cpu_throw().
o  The high FP registers are either in the PCB or on some
   CPU. Context switching for them is done lazily. This affects
   trap().
o  The mcontext has room for all registers, but not all of them
   have to be defined in all cases. This mostly affects signal
   delivery code now. The *context syscalls are as of yet still
   unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticeable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be noticed as an improvement if it's noticed at all.

Approved by: re@ (jhb)
2003-05-16 21:26:42 +00:00
marcel
4742b3abf1 o In pmap_install, don't prevent switching the pmap if we're
switching to kernel_pmap. The pmap is not special enough.
o  Clear the active bit on the pmap we're switching out.
o  Fix some nearby style(9) bugs.

Approved by: re@
2003-05-16 07:57:44 +00:00
marcel
e18b3f977b Indent a comment. This makes 1.100.
Still approved by: re@ (blanket)
2003-05-16 07:05:08 +00:00
marcel
f6ab86d828 Turn pmap_growkernel() into a critical section. While here, initialize
kernel_vm_end in pmap_bootstrap. Don't delay the initialization until
we need to grow the kernel VM space. This BTW happens twice before
we enter either single- or multi-user mode. Don't adjust kernel_vm_end
while growing based on whether the KPT contains a non-NULL entry. We
trust kernel_vm_end to be correct and we make sure it's still correct
after growing.
Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS
and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.
2003-05-16 07:03:15 +00:00
marcel
6a36805952 Revamp the RID allocation code:
o  Limit the size of the region ID map to 64KB. This gives a bitmap
   that is large enough to keep track of 2^19 numbers. The minimal map
   size is 32KB. The reason we limit the map size is that processor
   models may have implemented a 24-bit region ID, which would give
   a 2MB bitmap while the maximum number of allocations is always
   less than PID_MAX*5, which is less than 2^19.
o  Allocate all region IDs up-front. The slight downside of reserving
   more RIDs than a process needs (3 for ia64 native and 1 for ia32)
   is preferable over the call to pmap_ensure_rid() where RIDs are
   allocated on demand. On SMP systems this may lead to a race
   condition.
o  When allocating a region ID, don't use arc4random(). We're not
   interested in randomness or uniform distribution across the
   spectrum. We only need uniqueness. Random numbers may easily
   collide when the number of allocated RIDs is high, creating a
   possibly unbounded retry rate.
2003-05-16 06:40:40 +00:00
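
The RID allocation scheme in the commit above (a fixed 64KB bitmap, searched sequentially rather than probed with random numbers) can be sketched roughly as follows. This is a simplified userland illustration, not the pmap code: the names ex_rid_map, ex_rid_alloc and ex_rid_free are hypothetical, and locking and any reserved kernel RIDs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. A 64KB bitmap tracks 64 * 1024 * 8 = 2^19 IDs. */
#define EX_RID_MAP_BYTES	(64u * 1024u)
#define EX_RID_COUNT		(EX_RID_MAP_BYTES * 8u)

static uint64_t ex_rid_map[EX_RID_MAP_BYTES / sizeof(uint64_t)];
static uint32_t ex_rid_next;	/* where the next search starts */

/* Return a free region ID found by linear search, or -1 if none is left. */
static int32_t
ex_rid_alloc(void)
{
	for (uint32_t i = 0; i < EX_RID_COUNT; i++) {
		uint32_t rid = (ex_rid_next + i) % EX_RID_COUNT;
		uint64_t bit = 1ULL << (rid % 64);

		if ((ex_rid_map[rid / 64] & bit) == 0) {
			ex_rid_map[rid / 64] |= bit;
			ex_rid_next = (rid + 1) % EX_RID_COUNT;
			return ((int32_t)rid);
		}
	}
	return (-1);
}

/* Release a previously allocated region ID. */
static void
ex_rid_free(uint32_t rid)
{
	uint64_t bit = 1ULL << (rid % 64);

	assert(rid < EX_RID_COUNT && (ex_rid_map[rid / 64] & bit) != 0);
	ex_rid_map[rid / 64] &= ~bit;
}
```

Unlike a random probe, the search visits every slot at most once per call, so the cost is bounded even when most RIDs are in use.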
marcel
d3715e0039 Move the conditional definition of KSTACK_MAX_PAGES up ahead where
it's more visible.

Approved by: re@ (blanket)
2003-05-16 06:17:34 +00:00
marcel
7a98b54102 This file creates register sets based on the runtime specification.
The advantage of using register sets is that you don't focus on each
register separately, but instead introduce a level of abstraction.
This reduces the chance of errors, and also simplifies the code.
The register sets form the basis of everything register.
The sets in this file are:

struct _special
contains all of the control related registers, such as instruction
pointer and stack pointer. It also contains interrupt specific registers
like the faulting address. The set is roughly split in 3 groups. The
first contains the registers that define a context or thread. This is
the only group that the kernel needs to switch threads.  The second group
contains the registers needed, in addition to the first group, to switch
userland threads. This group contains the thread pointer and the FP control
register. The third group contains those registers we need for exception
handling and are used on top of the first two groups.

struct _callee_saved, struct _callee_saved_fp
These sets contain the preserved registers, including the NaT after
spilling. The general registers (including branch registers) are
separated from the FP registers for ptrace(2).

struct _caller_saved, struct _caller_saved_fp
These sets contain the scratch registers based on SDM 2.1. This means that
both ar.csd and ar.ccd are included here, even though they contain ia32
segment register descriptions. We keep separate NaT bits for scratch and
preserved registers, because they are never saved/restored at the same
time.

struct _high_fp
The upper 96 FP registers that can be enabled/disabled separately on
the CPU from the lower 32 FP registers. Due to the size of this set,
we treat them specially, even though they are defined as scratch
registers.

2003-05-15 08:36:03 +00:00
marcel
26cfa9fe2e This file contains elementary context related functions used to
save and restore "sets" of registers in various places.
The restorectx and swapctx functions are used by cpu_switch()
and deal with the special registers, as well as the preserved
registers.
The *callee_saved* functions are used to save and restore the
preserved registers (integer and floating-point). They are
useful for signal delivery and ptrace support.
The save_high_fp and restore_high_fp functions are used to
"load" and "unload" to and from the CPU as part of lazy context
switching.
The ia32 specific context functions have been kept with the ia32
code.

Approved by: re@ (blanket)
2003-05-15 08:08:32 +00:00
marcel
b176577e02 This file contains the code that implements the syscall path based
on the epc instruction. The epc instruction, given the permissions
of the page in which the epc is located, allows the privilege level
to be increased with little or no overhead. The previous privilege
level is recorded in the current frame marker and is restored by
a regular (function) return.
Since the epc instruction has to live in a page with non-standard
properties, we hardwire a "gateway" page in the address space. The
address of the gateway page is exported to userland in ar.k7. This
allows us to rewire the page without breaking the ABI.
The syscall stubs in libc are regular function calls that slightly
differ from the normal runtime. The difference is mostly to simplify
the stubs themselves by moving some of the logic to the kernel.
The libc stubs call into the gateway page (offset 0), from where the
kernel trampolines to the code that sets up a minimal trapframe and
arranges to execute from the kernel stack.
The way back is basically the same. The kernel returns to the gateway
page, whereby privilege is dropped, and jumps back to the syscall
stub.
Only the special registers are saved in the trapframe. None of the
scratch registers are preserved and since the kernel follows the
same runtime model, none of the preserved registers are saved.
Future enhancements can include the implementation of lightweight
syscalls, where kernel functions are performed without setting up
a trapframe. Good candidates are the *context syscalls for example.

Now that there's a gateway page from which code can be executed in
a non-privileged context, we also have the ideal place to put the
signal trampolines. By moving the signal trampolines from the user
stack to the gateway page, we open up the doors to unexecutable
stacks. The gateway page contains signal trampolines for both the
"legacy" break-based syscall code and the new and improved epc-
based syscall code.

Approved: re@ (blanket)
2003-05-15 07:51:22 +00:00
jhb
f0272107fb - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
kan
7ff03aee33 Style fixes.
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.
2003-05-04 22:13:04 +00:00
marcel
9bab15512c Fix c99 victim: the accepted character '0 must now be typed as '0'. 2003-05-03 23:05:16 +00:00
marcel
4359b7238f Option KADB does not exist. It came from alpha, where it still exists. 2003-05-02 20:34:15 +00:00
marcel
80cd0e6d1f Kill MID_MACHINE, it's a.out specific, the only platform that supports
it is i386. All of the other platforms should remove it too.
	-- peter@
2003-04-30 23:16:33 +00:00
jhb
48eb8eab8c Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by:	pho
2003-04-30 17:59:27 +00:00
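
The range check described in the commit above amounts to validating the index before the table lookup. A minimal sketch, assuming a small stand-in table (the real syscallnames[] array is generated and much larger; the ex_ names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in table for illustration only. */
static const char *ex_syscallnames[] = { "syscall", "exit", "fork", "read" };

#define EX_NSYSCALLNAMES \
	(sizeof(ex_syscallnames) / sizeof(ex_syscallnames[0]))

/* Return the name for a syscall number, or "?" if it is out of range. */
static const char *
ex_syscall_name(long code)
{
	if (code < 0 || (size_t)code >= EX_NSYSCALLNAMES)
		return ("?");
	assert(ex_syscallnames[code] != NULL);
	return (ex_syscallnames[code]);
}
```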
kan
d7b605c280 Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
marcel
fc30bfe794 Revamp the newbus functions:
o  do not use the in* and out* functions. These functions are used by
   legacy drivers and thus must have ia32 compatible behaviour. Hence,
   they  need to have fences. Using these functions for newbus would
   then pessimize performance.
o  remove the conditional compilation of PIO and/or MEMIO support. It's
   a PITA without having any significant benefit. We always support them
   both. Since there are no I/O ports on ia64 (they are simulated by the
   chipset by translating memory mapped I/O to predefined uncacheable
   memory regions) the only difference between PIO and MEMIO is in the
   address calculation. There should be enough ILP that can be exploited
   here that making these computations compile-time conditional is not
   worth it. We now also don't use the read* and write* functions.
o  Add the missing *_8 variants. They were missing, although not missed.
   It's for completeness.
o  Do not add the fences that were present in the low-level support
   functions here. We're using uncacheable memory, which means that
   accesses are in program order. Change the barrier implementation
   to not only do a memory fence, but also an acceptance fence. This
   should more reliably synchronize drivers with the hardware. The
   memory fence enforces ordering, but does not imply visibility (ie
   the access does not necessarily have happened). This is what the
   acceptance deals with.

cpufunc.h cleanup:
o  Remove the low-level memory mapped I/O support functions. They are
   not used. Keep the low-level I/O port access functions for legacy
   drivers and add fences to ensure ia32 compatibility.
o  Remove the syscons specific functions now that we have moved the
   proper definitions where they belong.
o  Replace the ia64_port_address() and ia64_memory_address() functions
   with macros. There's a bigger chance inline functions get inlined
   when there aren't function calls and the calculations are simple
   enough to do it with macros.

Replace the one reference to ia64_memory address in mp_machdep.c to
use the macro.
2003-04-29 09:50:03 +00:00
jhb
7d7a41d0f4 - Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
  invalid.
2003-04-25 20:04:02 +00:00
jhb
21de071f52 Regen. 2003-04-25 15:59:44 +00:00
jhb
e493fd3faa Oops, the thr_* and jail_attach() syscall entries should be NOPROTO rather
than STD.
2003-04-25 15:59:18 +00:00
deischen
4b37b6e450 Add an argument to get_mcontext() which specifies whether the
syscall return values should be cleared.  The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets.  Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code.  The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by:  jake
Reviewed by:    bde (months ago)
2003-04-25 01:50:30 +00:00
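
Both points of the get_mcontext() commit above can be illustrated with a small sketch: a flag that clears the saved return value, and the pointer mistake of copying from the address of a pointer field instead of from what it points to. The structures and the ex_ names below are hypothetical, heavily reduced stand-ins; slot 0 of tf_regs is assumed to hold the syscall return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct ex_trapframe {
	long tf_regs[4];	/* slot 0: assumed syscall return value */
};

struct ex_thread {
	struct ex_trapframe *tf_frame;	/* a pointer, as the commit notes */
};

/* Copy the saved frame out of a thread, optionally clearing the return. */
static void
ex_get_frame(const struct ex_thread *td, struct ex_trapframe *dst,
    int clear_ret)
{
	assert(td->tf_frame != NULL);
	/*
	 * Copying from &td->tf_frame here would read the thread structure
	 * itself; the frame lives where the pointer points.
	 */
	memcpy(dst, td->tf_frame, sizeof(*dst));
	if (clear_ret)
		dst->tf_regs[0] = 0;
}
```

Only the copy is modified when clear_ret is set; the live frame keeps its return value.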
jhb
c30db0c24f Regen. 2003-04-24 20:50:57 +00:00
jhb
91308d46a4 Fix the thr_create() entry by adding a trailing \. Also, sync up the
MP safe flag for thr_* with the main table.
2003-04-24 20:49:46 +00:00
kan
b073b4daca Add a new sys/limits.h file which in turn depends on machine/_limits.h
to get actual constant values. This is in preparation for machine/limits.h
retirement.

Discussed on:	standards@
Submitted by:	Craig Rodrigues <rodrigc@attbi.com>  (*)
Modified by:	kan
2003-04-23 21:41:59 +00:00
jhb
f3d7052aea - Replace inline implementations of sigprocmask() with calls to
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
  sigprocmask() that used the stackgap with calls to the corresponding
  kern_sig*() functions instead without using the stackgap.
2003-04-22 18:23:49 +00:00
davidxu
c3b8b61056 Remove the single-threading detection code. This code really should
be replaced by thread_user_enter(), but currently we don't want to enable
this in trap.
2003-04-22 03:17:41 +00:00
marcel
efc0c38e40 Don't use the tpa instruction to implement pmap_kextract. The tpa
instruction requires that a translation is present in the TC. This
may trigger a TLB miss and a subsequent call to vm_fault().
This implementation is deliberately non-inline for debugging and
profiling purposes. Partial or full inlining should eventually be
done.

Valuable insights by: jake
2003-04-22 01:48:43 +00:00
simokawa
8674e42fa6 Add FireWire drivers to GENERIC. 2003-04-21 16:44:05 +00:00
jhb
27fa8b59bd Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().
2003-04-18 20:20:00 +00:00
marcel
3df9a5196e Add the EHCI host controller. 2003-04-16 01:29:08 +00:00
mux
f81b8b1670 I deserve a big pointy hat for having missed all those references
to bus_dmasync_op_t in my last commit.
2003-04-10 23:50:06 +00:00
mux
41c3ac60b2 Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags.  This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.
2003-04-10 23:03:33 +00:00
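
Why flags allow several operations per call, where an enum does not, can be sketched as below. The EX_DMASYNC_* names and values are illustrative assumptions, not the actual BUS_DMASYNC_* definitions.

```c
#include <assert.h>

/* Illustrative values only; the real BUS_DMASYNC_* constants may differ. */
#define EX_DMASYNC_PREREAD	0x1
#define EX_DMASYNC_POSTREAD	0x2
#define EX_DMASYNC_PREWRITE	0x4
#define EX_DMASYNC_POSTWRITE	0x8
#define EX_DMASYNC_ALL		0xf

/* Count how many sync operations a single call requests. */
static int
ex_dmasync_count_ops(int op)
{
	int n = 0;

	assert((op & ~EX_DMASYNC_ALL) == 0);	/* reject unknown bits */
	for (; op != 0; op &= op - 1)		/* clear lowest set bit */
		n++;
	return (n);
}
```

With distinct bits, a caller can pass EX_DMASYNC_PREREAD | EX_DMASYNC_PREWRITE in one call and the callee can test each bit independently.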
mike
ee5efe23ec o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
  fields (pr_path for reporting and pr_root vnode instance) to store
  the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
  vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
  to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
  struct xprison instances that represent a snapshot of active jails.

Reviewed by:	rwatson, tjr
2003-04-09 02:55:18 +00:00
des
93c2d21808 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
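
An assertion macro of the kind the commit above describes might look like the following sketch. The structure, the flag value and the ex_/EX_ names are assumptions for illustration; the kernel macro uses the kernel's own assertion facility rather than assert(3).

```c
#include <assert.h>
#include <stddef.h>

#define EX_M_PKTHDR	0x0002	/* illustrative flag value */

struct ex_mbuf {
	int m_flags;
	int m_pktlen;	/* only meaningful when EX_M_PKTHDR is set */
};

/* Assert that the mbuf is non-NULL and carries a packet header. */
#define EX_M_ASSERTPKTHDR(m) \
	assert((m) != NULL && ((m)->m_flags & EX_M_PKTHDR) != 0)

/* Example consumer: replaces a hand-rolled check with the macro. */
static int
ex_pkt_len(const struct ex_mbuf *m)
{
	EX_M_ASSERTPKTHDR(m);
	return (m->m_pktlen);
}
```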
marcel
af39be00aa Remove COMPAT_FREEBSD4. It's impossible because FreeBSD 4 does not
run on ia64 at all.
2003-04-08 08:32:00 +00:00
marcel
6baa93cddc Remove the 32KB VHPT section from the kernel image. We don't really
use it because we allocate a VHPT based on the size of the physical
memory and even if the allocated VHPT is 32KB, we don't use the in-
image section for it. Since the VHPT must be naturally aligned, we
save 48K on average (due to alignment).
Consequently, we start off with the VHPT disabled (it is assumed
the VHPT is disabled because the EFI loader runs without memory
address translation and thus has no need to setup the VHPT). It's
probably a good idea to explicitly disable the VHPT if we make the
use of the VHPT optional.
2003-04-06 21:31:26 +00:00
marcel
2b120d8952 Also set the access bit in the PTE when we get a data dirty bit fault.
This avoids an immediate access bit fault after we have serviced the dirty
bit fault in case the access bit is unset. This typically happens for
newly allocated memory that's being zeroed and is thus very common.
2003-04-06 05:55:36 +00:00
marcel
35b1bbbb4c Include <geom/geom_disk.h> and stop including <sys/disk.h>. The
former gives us 'struct disk'.
2003-04-05 21:14:05 +00:00
des
bf10676408 Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector.  Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.
2003-04-04 17:29:55 +00:00
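
Defining ovbcopy() as a macro over an overlap-safe bcopy(), as the commit above describes, can be sketched in userland like this. Here memmove() stands in for the kernel's bcopy(), and the ex_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* bcopy()-style argument order (src, dst, len); memmove() handles overlap. */
#define ex_bcopy(src, dst, len)		((void)memmove((dst), (src), (len)))
#define ex_ovbcopy(src, dst, len)	ex_bcopy((src), (dst), (len))

/* Shift the first n bytes of buf right by k; ranges overlap when k < n. */
static void
ex_shift_right(char *buf, size_t buflen, size_t n, size_t k)
{
	assert(n + k <= buflen);
	ex_ovbcopy(buf, buf + k, n);
}
```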
phk
ee395e078c Use bioq_flush() to drain a bio queue with a specific error code.
Retain the mistake of not updating the devstat API for now.

Spell bioq_disksort() consistently with the remaining bioq_*().

#include <geom/geom_disk.h> where this is more appropriate.
2003-04-01 15:06:26 +00:00
jeff
c44b6b488c - Add thr and umtx system calls. 2003-04-01 01:15:56 +00:00
jeff
fde71359bc - Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space.  It will be used as the basic primitive
   for a kernel supported user space lock implementation.
 - Implement this function in x86's support.s
 - Provide stubs that return -1 in all other architectures.  Implementations
   will follow along shortly.

Reviewed by:	jake
2003-04-01 00:18:55 +00:00
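
The compare-and-set primitive described in the commit above can be approximated in portable C11 as follows. This is only a sketch of the semantics: the signature is an assumption, and the real function must also cope with faults on the user address (the stubs mentioned above simply return -1), which is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare-and-set on a pointer-sized word. Returns the value found in
 * *p; the store took place only if that value equals oldval.
 */
static intptr_t
ex_casuptr(_Atomic intptr_t *p, intptr_t oldval, intptr_t newval)
{
	intptr_t expected = oldval;

	assert(p != NULL);
	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(p, &expected, newval);
	return (expected);
}
```

A lock built on this would treat a return value equal to oldval as a successful acquisition and anything else as contention.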
jeff
3e36051ca6 - Add a placeholder for sigwait 2003-03-31 23:36:40 +00:00
jeff
3946316f71 - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
jeff
e81bb84595 - Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.
2003-03-31 22:02:38 +00:00
jeff
ff354db2d8 - Use sigexit() instead of twiddling the signal mask, catch, ignore, and
action bits to allow SIGILL to work as expected.  This brings this file in
   line with other architectures.
2003-03-31 21:40:47 +00:00
das
3fda8d8ac7 Correct LDBL_* constants based on values from i386. 2003-03-27 20:38:22 +00:00
jake
a780914035 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses are larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
ru
5f25fa151f Remove bitrot associated with `maxusers'.
Submitted by:	bde
2003-03-22 14:18:23 +00:00
mux
09448722ff Use atomic operations to increment and decrement the refcount
in busdma tags.  There are currently no tags shared across
different drivers so this isn't needed at the moment, but it
will be required when we have a proper newbus method to get
the parent busdma tag.
2003-03-20 19:45:26 +00:00
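
Atomic reference counting of the kind the commit above introduces can be sketched with C11 atomics. The structure and function names are hypothetical, and the kernel uses its own atomic(9) primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a busdma tag. */
struct ex_dma_tag {
	atomic_int ref_count;
};

/* Take an additional reference on the tag. */
static void
ex_tag_hold(struct ex_dma_tag *t)
{
	atomic_fetch_add(&t->ref_count, 1);
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool
ex_tag_release(struct ex_dma_tag *t)
{
	int prev = atomic_fetch_sub(&t->ref_count, 1);

	assert(prev > 0);
	return (prev == 1);
}
```

Because the decrement returns the previous value atomically, exactly one releasing caller observes the transition to zero and can free the tag.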