This is a low-functionality change that makes the kernel access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there,
but this removes all the code that assumes so, in preparation for the next
commit, which will actually move it out.
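A minimal sketch of the access pattern, assuming simplified stand-in
structures (struct proc, struct thread, p_threads, p_thread0 and
first_thread_in_proc() are illustrative names here, not the kernel's actual
definitions). The thread storage is still embedded, but callers only ever
reach it through the list head:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct proc;

struct thread {
	struct thread *td_next;   /* next thread in the owning process */
	struct proc   *td_proc;   /* back pointer to the owning process */
	int            td_id;
};

struct proc {
	struct thread *p_threads; /* head of the per-process thread list */
	struct thread  p_thread0; /* storage still embedded, for now */
};

/* Set up a process whose only thread lives in the embedded storage. */
void
proc_init(struct proc *p, int id)
{
	p->p_thread0.td_next = NULL;
	p->p_thread0.td_proc = p;
	p->p_thread0.td_id = id;
	p->p_threads = &p->p_thread0;
}

/* Callers go through the list head instead of naming p_thread0. */
struct thread *
first_thread_in_proc(struct proc *p)
{
	return (p->p_threads);
}
```

Once no caller names the embedded member directly, moving the storage out
only requires changing proc_init() in this model.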
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.
ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub for ia64.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.
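The difference between the two return paths can be sketched as follows.
This is a user-space model under stated assumptions: struct thread_model and
both function names are hypothetical, and the only claim taken from the text
above is that the nosys() path generates SIGSYS in addition to the ENOSYS
error.

```c
#include <errno.h>
#include <signal.h>

/* Hypothetical model of a thread: only the pending-signal slot matters. */
struct thread_model {
	int pending_sig;	/* 0 when no signal is pending */
};

/* What the stub did before: report the error, raise nothing. */
int
stub_return_enosys(struct thread_model *td)
{
	(void)td;
	return (ENOSYS);
}

/* nosys()-like behaviour: post SIGSYS, then report ENOSYS. */
int
stub_nosys(struct thread_model *td)
{
	td->pending_sig = SIGSYS;
	return (ENOSYS);
}
```

Both variants hand the same error back to the caller; only the second one
matches what a process sees for a syscall number missing from the table.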
powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.
sparc64: copy the cleaned up ia64 stub, since there was no stub before.
cpu(s) into the kernel, and sync-ing them up to "kernel" mode so we can
send them ipis, which also work.
Thanks to John Baldwin for providing me with access to the hardware
that made this possible.
Parts obtained from: bsd/os
Call critical_enter/critical_exit around (fast) interrupt handlers. All
non-threaded interrupts are fast, and the threaded interrupt scheduler is
itself a fast interrupt.
Assert that an interrupt handler we are about to call is non-zero.
Be paranoid about restoring the user's global registers. Do it as the
last thing before switching to alternate globals (when we magically get
our preloaded registers back), and do it with interrupts disabled. Any
kind of kernel trap when the globals are not setup properly is bad news.
Don't save and restore the kernel g6, it invariably points to the current
pcb now.
data word in an interrupt packet is non-zero, it points to code to execute
to handle the ipi, so jump to it instead of enqueueing the packet. It
is unclear if we will need queued ipis.
Interrupt g7 now points to pcpu, instead of to the per-cpu interrupt queue
itself, so use that instead. Interrupt g6 is no longer reserved.
parameters needed for smp support.
If we are not the boot processor, jump to the smp startup code instead.
Implement a per-cpu panic stack, which is used for bootstrapping both
primary and secondary processors and during faults on the kernel stack.
Arrange the per-cpu page like the pcb, with the struct pcpu at the end
of the page and the panic stack before it.
Use the boot processor's panic stack for calling sparc64_init.
Split the code to set preloaded global registers and to map the kernel
tsb out into functions, which non-boot processors can call.
Allocate the kstack for thread0 dynamically in pmap_bootstrap, and give
it a guard page too.
to the current pcb.
Remove interrupt global defines; they use PCPU_REG now.
Move ATOMIC_INC_INT here from exception.s, add ATOMIC_DEC_INT.
Add a KASSERT macro for use in assembler.
substantial fraction of the number of entries of tte's in the tsb
would need to be looked up, traverse the tsb instead. This is crucial
in some places, e.g. when swapping out a process, where a certain
pmap_remove() call would take a very long time to complete without this.
2. Implement pmap_qenter_flags(), which will be used later.
3. Reactivate the instruction cache flush done when mapping as executable.
This is required e.g. when executing files via NFS, but is known to
cause problems on UltraSPARC-IIe CPUs. If you have such a CPU, you
will need to comment this call out for now.
Submitted by: jake (3)
struct ofw_nexus_reg. Implement UPA device memory management in the
nexus driver.
Adapt the psycho driver to these changes, and do some minor cleanup work
while being there.
for certain user pages, stores to kernel pages would not update the
affected cache lines, which would sometimes cause the wrong data to be
returned for loads from kernel pages. This was especially fatal when
the addresses affected held the kernel stack pointer, and a random
value was loaded into it.
Fix a harmless off by one error in a dcache_inval_phys call.
Fix a potential race in setting up the per-cpu pointer if the special
restore on return to user mode fails and we need to trap back
into the kernel to fault in more stack.
Remove debug code.
an efficient way for the kernel to bounce certain mundane traps back to
userland for handling there. A user trap handler returns directly to the
trapping user code, rather than going through the kernel again. Only a
handful of instructions are actually executed in kernel mode.
Implement sysarch(SPARC_UTRAP_INSTALL).
Add code to handle sharing of the user trap table across forks and unsharing
at exec.
This can be used to implement efficient tracking of floating point register
usage in userland, e.g. by a thread library, and to handle alignment fault
fixups and instruction emulation in userland, for which the code may need
to be different for 32bit and 64bit binaries.
something wrong with the kernel stack.
Add code to check the kernel stack pointer in various important places
and try hard not to go down in flames if it's wrong.
Add fields to md_page for tracking virtual page color, and pv entry
lists.
Fix pmap_track_modified to work for non-kernel pmaps. This is due to
kernel virtual addresses potentially overlapping with userland addresses.
much less magic, fragile, broken. Use ttes rather than sttes.
We still use the replacement scheme used by the original code, which
is pretty cool.
Many crucial bug fixes from: tmm
of sttes. Also removes many differences between this and the other pmaps.
Reserve the kva space used by the openfirmware translations.
Use physical addresses directly in pmap_zero_page and pmap_copy_page, now
that we have the cache line shooting support.
Add code to track the virtual cachability of mapped pages. The dmmu
requires that multiple mappings of the same physical address have the
same virtual address bits up to a colour boundary. Violating this
requires all mappings to be mapped uncacheable. We do not yet handle
the case of a badly aliased mapping becoming cacheable again.
Many crucial bug fixes from: tmm
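The colour rule can be illustrated with a small sketch. The constants are
assumptions for illustration only (8K pages and two cache colours, i.e. a
16K virtually indexed cache), and va_color() and mappings_cacheable() are
hypothetical helpers, not functions from pmap.c:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT	13	/* assumed 8K pages */
#define DCACHE_COLORS	2	/* assumed: 16K cache / 8K page */

/* Virtual colour: the page-number bits that also index the cache. */
unsigned
va_color(uint64_t va)
{
	return ((unsigned)(va >> PAGE_SHIFT) & (DCACHE_COLORS - 1));
}

/*
 * Given every virtual address that maps one physical page, report whether
 * the page may stay cacheable: all mappings must agree on the colour.
 */
bool
mappings_cacheable(const uint64_t *vas, int n)
{
	for (int i = 1; i < n; i++)
		if (va_color(vas[i]) != va_color(vas[0]))
			return (false);
	return (true);
}
```

With these assumptions, mappings at 0x10000 and 0x24000 share colour 0 and
may stay cacheable, while 0x10000 and 0x12000 differ in colour, so all
mappings of that page would have to be made uncacheable.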
the registers so we don't uselessly save them over and over again for
each context switch until another floating point instruction is executed.
Use a non-specific tlb slot for the tsb, which needs to have a locked
entry.
Remove overly verbose traces.
Add macros to atomically increment an integer variable in the data
section and to atomically set a bit in a tte. Note that the latter
does not return the new value.
Rewrite RESUME_SPILLFILL_MAGIC to use more sensical calculations, and
to preserve all alternate globals religiously. Must now be called on
alternate globals.
Defer switching to the kernel stack until inside the syscall, trap,
interrupt wrappers. Splitting the windows is all that's really urgent.
Adapt to new trap types.
Add %xcc where appropriate in order to not use v8 opcodes inadvertently
(which work fine).
Modify the low level tlb fault handlers to operate on a tsb made up of
ttes, not sttes. This effectively makes the tsb twice as large.
After atomically updating tte bits in memory, also set the bit in the
register that holds the data which will be loaded into the tlb. The
macro returns the old value.
Use the preloaded mmu global which holds the address of the current
user tsb.
Add back a low level protection fault handler instead of just punting
into the vm system. This effectively saves a soft fault per COW fault.
Add a trace to intr_enqueue.
Pass arguments to the trap, interrupt, syscall wrappers in the out
registers instead of some on the stack, some in registers.
Use the preloaded alternate global pcb register.
2. Make trap_pfault more like it is on other architectures.
3. Fix a bug in syscall() which caused system calls with more than
six arguments that are called through the wild card syscall to
have their arguments scrambled. This affected mmap due to the
(bogus) wrapper in libc.
Submitted by: tmm (3)
Add some traces that can be useful but are also very loud.
Use defines for offsets into jmpbuf instead of magic numbers.
Fix a style bug.
Fixup comments.
specified by the sparc abi. We use numerically higher values for all
internal kernel types.
Remove soft trap types which need to be exposed to userland. They will
move to utrap.h.
their duration. This is still only effective as long as they are
only used in the static kernel. Code in modules may cause instruction
faults which makes these break in different ways anyway.
2. Add a load bearing membar #Sync.
3. Add an inline for demapping an entire context.
Submitted by: tmm (1, 2)
Bloat trapframe with many extra fields so we don't need extra structures.
Use small data types where possible.
Remove second copy of TF_DONE.
Remove mmuframe.
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().
Tested on: i386, alpha
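The nesting scheme described above can be sketched in user space. This is a
model under stated assumptions: fake_intr_state stands in for the real MD
interrupt state, struct thread_model for struct thread, and the global
curthread pointer for however the kernel finds the current thread.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real MD hooks mask interrupts. */
typedef int critical_t;

struct thread_model {
	int        td_critnest;	/* nesting depth */
	critical_t td_savecrit;	/* state saved at the outermost entry */
};

int fake_intr_state = 1;	/* 1 = interrupts enabled (model only) */
struct thread_model thread0;
struct thread_model *curthread = &thread0;

critical_t
cpu_critical_enter(void)
{
	critical_t old = fake_intr_state;

	fake_intr_state = 0;
	return (old);
}

void
cpu_critical_exit(critical_t saved)
{
	fake_intr_state = saved;
}

/* MI wrappers: no arguments, no return value, state kept in the thread. */
void
critical_enter(void)
{
	if (curthread->td_critnest == 0)
		curthread->td_savecrit = cpu_critical_enter();
	curthread->td_critnest++;
}

void
critical_exit(void)
{
	assert(curthread->td_critnest > 0);
	if (--curthread->td_critnest == 0)
		cpu_critical_exit(curthread->td_savecrit);
}
```

Only the outermost enter/exit pair touches the MD state, which is why the
saved state no longer has to live in the spin mutex itself.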
- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.
Tested on: alpha, i386
Reviewed by: peter, jake
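A rough sketch of why a macro gives the flatter interface. The names
PCPU_MD_FIELDS, pc_my_md_field, cpu0_pcpu and pcpup are assumptions made for
this example, and the accessors are simplified to a plain pointer
dereference:

```c
/* Hypothetical MD header: fields are supplied as a macro, not a struct. */
#define PCPU_MD_FIELDS \
	int pc_my_md_field;	/* illustrative MD field */

/* MI header: the MD fields are spliced into the same struct. */
struct pcpu {
	int pc_cpuid;		/* MI field */
	PCPU_MD_FIELDS
};

struct pcpu cpu0_pcpu;
struct pcpu *pcpup = &cpu0_pcpu;	/* assumed way to find this CPU's data */

/* Accessors paste the pc_ prefix, so MI and MD fields look alike. */
#define PCPU_GET(member)	(pcpup->pc_ ## member)
#define PCPU_SET(member, val)	(pcpup->pc_ ## member = (val))
```

With a nested struct mdpcpu the same access would have needed the extra
md. path component that the text above mentions.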
o Hide nonstandard functions and types in <netinet/in.h> when
_POSIX_SOURCE is defined.
o Add some missing types (required by POSIX.1-200x) to <netinet/in.h>.
o Restore vendor ID from Rev 1.1 in <netinet/in.h> and make use of new
__FBSDID() macro.
o Fix some miscellaneous issues in <arpa/inet.h>.
o Correct final argument for the inet_ntop() function (POSIX.1-200x).
o Get rid of the namespace pollution from <sys/types.h> in
<arpa/inet.h>.
Reviewed by: fenner
Partially submitted by: bde
in asm files.
2. Temporarily cause subnormal operands in floating point operations
to be treated as zeros so that completion of the operation does not
need to be emulated.
3. Catch fp_exception_other and correctly skip over the unfinished
instruction, but basically ignore them. Emulating the instruction
is not yet supported.
4. Zero td_retval[1] as well in syscall().
Submitted by: tmm (2, 3)
2. Add a TF_DONE macro, which fiddles a trapframe to make the retry on
return from traps act like a done (advance past the trapping
instruction instead of re-executing).
3. Flush the windows before entering the debugger, since it is no
longer done in the breakpoint trap vector.
4. Print a warning if trace <pid> is attempted; it is not yet implemented.
5. Print traps better and decode system calls in traces.
Submitted by: rwatson (4)
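The effect of item 2 can be sketched as below. The trapframe here is a
hypothetical two-field subset, and the macro body is an assumption about
what "act like a done" means (resume at the old next pc, with fixed 4-byte
instructions), not a copy of the committed macro:

```c
#include <stdint.h>

/* Hypothetical subset of a sparc64 trapframe. */
struct trapframe_model {
	uint64_t tf_tpc;	/* pc of the trapping instruction */
	uint64_t tf_tnpc;	/* next pc */
};

/*
 * Make a later "retry" behave like "done": resume at the old next pc
 * and step the next pc one (4-byte) instruction further.
 */
#define TF_DONE(tf) do {			\
	(tf)->tf_tpc = (tf)->tf_tnpc;		\
	(tf)->tf_tnpc += 4;			\
} while (0)
```

For a frame with tpc 0x1000 and tnpc 0x1004, applying the macro leaves tpc
at 0x1004 and tnpc at 0x1008, so the trapping instruction is skipped.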
to determine if a process is using floating point. in order to avoid
sign extending a 13 bit immediate.
2. We don't need to context switch cwp anymore, it is better to just
fiddle the saved tstate on return from traps. See exception.s 1.10
and 1.12.
3. Completely remove pcb_cwp.
4. Implement vmapbuf, vunmapbuf and vm_fault_quick. Completely remove
TODOs from vm_machdep.c (yay!).
Submitted by: tmm (1, 3, 4)
Obtained from: existing archs (4)
space from kernel space and from an alternate address space to kernel
space.
2. Remove the unused and unprototyped physcopy() and physzero() and replace
with the more versatile ascopy() and aszero(), inspired by the above.
These can be used to copy and zero physical pages of memory without mapping
them into kernel space first.
3. Use magic numbers for the offsets in the jmpbuf structure like other
platforms.
4. Use SET.
Submitted by: tmm (1, 4)
in the window trap vectors were mixed up. All this did was cause unnecessary
traps and look weird in traces. Superfluous traps happen a lot in normal
operation, so we are rather good at recovering from them.
2. Store the arguments for a ktr trace in the right place.
3. Use a generic trap vector for breakpoints. It should not be special.
4. Save the frame pointer in the trap frame for kernel traps if DDB is compiled
in, otherwise we don't save the out registers for kernel traps and stack
traces can't go through nested traps.
5. Apply the same fix to the return from kernel mode trap code as for user
mode traps. Ensure that the window we're returning to is the same one
that we restore to by fiddling the cwp in the saved tstate. This requires
that we transfer the values loaded from the trap frame into alternate
globals before restore-ing, but doing so is not very expensive and not
worth worrying about. Not changing the saved cwp can result in the register
values magically changing on return from traps if we happen to have slept
and the windows don't work out exactly the same. Fix the trace just before
the retry to account for different register usage.
6. Use a SET macro for loading address constants rather than a variation of
set and setx. set only works for 32 bit constants, while setx works for
64 bit constants as well, but produces bloated code when unnecessary.
Gas always generates the canonical 2 register, 6 instruction form, even
when it could be optimized; set uses 1 register and 2 instructions. At
the moment we assume that the kernel binary is below 4GB so set is
always sufficient, but the macro allows it to be configured. Note that
this has nothing to do with 32 vs. 64 bit address space, it only applies
to addresses of symbols which are known at compile/link time.
Submitted by: tmm (6)
this case, the firmware trap table needs to be restored). Make use of
it in cpu_halt() and cpu_reset(), and make cpu_reset() reboot the kernel
that was used previously instead of behaving like cpu_halt().
Add a shutdown_final event handler that turns the power off if requested.
unaligned accesses, and instr.h, which contains definitions for the
sparc64 instruction set (partly from NetBSD).
Make use of some definitions from instr.h in db_disasm.c.
o Make <stdint.h> a symbolic link to <sys/stdint.h>.
o Move most of <sys/inttypes.h> into <sys/stdint.h>, as per C99.
o Remove <sys/inttypes.h>.
o Adjust includes in sys/types.h and boot/efi/include/ia64/efibind.h
to reflect new location of integer types in <sys/stdint.h>.
o Remove previously symbolically linked <inttypes.h>, instead create a
new file.
o Add MD headers <machine/_inttypes.h> from NetBSD.
o Include <sys/stdint.h> in <inttypes.h>, as required by C99; and
include <machine/_inttypes.h> in <inttypes.h>, to fill in the
remaining requirements for <inttypes.h>.
o Add additional integer types in <machine/ansi.h> and
<machine/limits.h> which are included via <sys/stdint.h>.
Partially obtained from: NetBSD
Tested on: alpha, i386
Discussed on: freebsd-standards@bostonradio.org
Reviewed by: bde, fenner, obrien, wollman
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.
Tested on: x86 (SMP), alpha, sparc64
{set,fill}_{,fp,db}regs() fixup:
- Add dummy {set,fill}_dbregs() on architectures that don't have them.
- KSEfy the powerpc versions (struct proc -> struct thread).
- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.
These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.
Run-tested on i386, build-tested on Alpha, untested on other platforms.
was used. This resulted in bogus bad window traps (invalid wstate).
Add a trace to sfsr traps (alignment among other things).
Use KTR_TRAP instead of KTR_CT1.
Use the right registers when storing the values of various
mmu registers into the trap frame. This fixes a bug where sometimes
the context number reported by a fault would be garbage. Sometimes
it would be zero for faults on user address space so the kernel would
wrongly think that it was a fault on kernel address space and fail.
Use the preloaded registers in the vectored interrupt trap instead
of reading pointers from memory. Remove traces due to register
pressure and excess verbosity. We can probably still sneak in one
trace. Remove some debug code.
Go back to using the tsb register during kernel page table lookups.
This is the best way to not have to have the address of the kernel tsb be
a compile time constant. We lie and say we have a 1 page tsb when really
it's much larger. This way the hardware provides bits 13-21 of the
virtual address (the lower 9 bits of the virtual page number) in the
form of the address of the tte corresponding to the fault address in
the (1 page) kernel tsb. With some clever arithmetic we can then get
bits 22 and up from the tte tag and add them to the tte address in
order to index massive tsbs (basically unlimited).
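The arithmetic can be sketched as follows, assuming 8K pages and 16-byte
ttes (so a 1 page tsb holds 512 entries and the hardware supplies the low 9
bits of the virtual page number). The function names are hypothetical, and
for simplicity the high bits are taken from the address itself rather than
from the tte tag as the real handler does:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	13	/* assumed 8K pages */
#define TTE_SHIFT	4	/* assumed 16-byte ttes */
#define HW_BITS		9	/* index bits supplied for a 1 page tsb */

/* Offset the hardware would form inside the pretend 1 page tsb. */
uint64_t
hw_tte_offset(uint64_t va)
{
	return (((va >> PAGE_SHIFT) & ((UINT64_C(1) << HW_BITS) - 1))
	    << TTE_SHIFT);
}

/* Extend that offset with higher bits to index a 2^tsb_bits entry tsb. */
uint64_t
big_tte_offset(uint64_t va, unsigned tsb_bits)
{
	uint64_t high;

	assert(tsb_bits >= HW_BITS);
	high = (va >> (PAGE_SHIFT + HW_BITS)) &
	    ((UINT64_C(1) << (tsb_bits - HW_BITS)) - 1);
	return (hw_tte_offset(va) + (high << (HW_BITS + TTE_SHIFT)));
}

/* Reference: index the large tsb directly by virtual page number. */
uint64_t
direct_tte_offset(uint64_t va, unsigned tsb_bits)
{
	return (((va >> PAGE_SHIFT) & ((UINT64_C(1) << tsb_bits) - 1))
	    << TTE_SHIFT);
}
```

The composed offset agrees with the direct computation for any tsb size of
at least one page, which is what lets the tsb grow without making its
address a compile time constant.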
Add traps for physical address hardware watchpoints.
Don't try to pass the window state from the trap table entry point
all the way down to the common trap code. It's too easy to clobber
and reading it again doesn't cost much.
Fixup some traces.
Fiddle the cwp bits on return from the kernel to user mode so that
the window we are returning to is always the same as the one we
restore to in the trap code. Strictly speaking this is not necessary,
it only affects return from fork and exec, but setting up the windows
right would require hard coding the right cwp values in cpu_fork and
setregs, basically hard coding the number of frames between syscall and
tl0_ret. The result of getting it wrong is usually a spill to an invalid
stack pointer; either 0 or pointing into kernel space. This should also
alleviate the need to context switch the cwp.
Transfer the trap state from locals to alternate globals in the trap
return code so that we can do a restore and rotate the windows before
reloading the trap registers. If the restore fails we'll trap back
into the kernel, so there's no point in loading the trap registers
beforehand. It is crucial that the window trap recovery code not
clobber the alternate globals.
boundary. It must be on at least an 8 byte boundary so that the length
of the signal code is a multiple of 8 (well aligned). The size is used
in the calculation of the address of the argument and environment vectors
on the user stack; getting it wrong results in the string pointers being
misaligned and causes alignment faults in getenv() among other things.
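A small sketch of why the length matters. The layout is a deliberately
simplified assumption (signal code directly below an aligned stack top, with
the strings starting right below it), and ROUNDUP2 and strings_top() are
illustrative names:

```c
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
#define ROUNDUP2(x, align) \
	(((x) + ((align) - 1)) & ~((uint64_t)(align) - 1))

/*
 * Hypothetical layout: the signal code sits just below the top of the
 * user stack, and the string area starts right below it.
 */
uint64_t
strings_top(uint64_t stack_top, uint64_t sigcode_size)
{
	return (stack_top - sigcode_size);
}
```

In this model a 44 byte signal code leaves the address below it misaligned,
while rounding the length up to 48 keeps it on an 8 byte boundary.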
Allocate a regular stack frame below the signal frame on the user stack
and join up the frame pointer to the previous frame. This fixes longjmp-ing
out of signal handlers. Longjmp traverses the stack upwards in order to
find the right frame to return to, so the frame pointers must join up
seamlessly. I thought this would just work, but obviously the frame
needs to be below the signal frame, not above it like before. Account
for the extra space in the signal code.
Preload pointers to interrupt data structures in interrupt globals.
This avoids the need to load the pointers from memory in the vectored
interrupt trap handler.
Transfer the first 2 out registers into td_retval in setregs. We use
the same registers for system call arguments as return values, so these
registers got clobbered by the system call return values on return from
execve. They now get clobbered by the right values. We must put the values
in both the out registers in the trapframe and in td_retval because init
calls exec but fails to transfer the return value into the out registers.
This fixes a bug where the first exec after init would pass junk to the
c runtime, instead of a pointer to the argument strings. A better solution
would be to return EJUSTRETURN on success from execve.
Adjust for change in pmap_bootstrap's prototype.
Map the message buffer after the trap table is setup. We will fault
on it immediately.
Don't use a hard coded address constant for the virtual address of the
kernel tsb. Allocate kernel virtual address space for the kernel tsb
at runtime.
Remove unused parameter to pmap_bootstrap.
Adapt pmap.c to use KVA_PAGES.
Map the message buffer too.
Add some traces.
Implement pmap_protect.
be used to index tables of counters.
Remove intr_dispatch() inline, it is implemented directly in tl*_intr now.
Count stray interrupts in a table of counters like intrcnt.
Disable interrupts briefly when setting up the interrupt vector table.
We must disable interrupts completely, not just raise the pil.
Pass pointers to the intr_vector structures rather than a vector number
to sched_ithd and intr_stray.
the existence of the __gnuc_va_list type[*] because our compiler is GCC.
[*] __gnuc_va_list is defined in the GCC ginclude/stdarg.h replacement
header which we don't use.
Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.
This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().
It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".
Handle overlap in bcopy.
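For reference, overlap handling in a byte copy is commonly done by choosing
the copy direction. This is a portable C sketch of that general technique,
not the committed assembler routine; bcopy_overlap() is a hypothetical name
that keeps bcopy's source-first argument order:

```c
#include <stddef.h>

/* bcopy-style argument order: source first, destination second. */
void
bcopy_overlap(const void *src, void *dst, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;

	if (d == s || len == 0)
		return;
	if (d < s || d >= s + len) {
		/* No harmful overlap: copy forwards. */
		while (len-- > 0)
			*d++ = *s++;
	} else {
		/* dst starts inside src: copy backwards. */
		s += len;
		d += len;
		while (len-- > 0)
			*--d = *--s;
	}
}
```

Copying backwards when the destination starts inside the source keeps bytes
from being overwritten before they have been read.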
Add routines for copying and zeroing pages using physical addresses
directly.
Remove all the hacks to account for calling the firmware on its own
trap table, we use the kernel trap table. There is still a problem
with OF_exit().
easier and hopefully this code is done changing radically.
Don't use the mmu tlb register to address the kernel page table, nor
the 8k pointer register. The hardware will do some of the page table
lookup by storing the base address in an internal register and
calculating the address of the tte in the table. However it is limited
to a 1 meg tsb, which only maps 512 megs. The kernel page table only
has one level, so it's easy to just do it by hand, which has the advantage
of supporting arbitrary amounts of kvm and only costs a few more instructions.
Increase kvm to 1 gig now that it's easy to do so and so we don't waste
most of a 4 meg page.
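The 1 meg / 512 meg figure follows from simple arithmetic, sketched here
under the assumption of 8K pages and 16-byte ttes (tsb_coverage() and the
constants are illustrative, not kernel definitions):

```c
#include <stdint.h>

#define TSB_PAGE_SIZE	(UINT64_C(1) << 13)	/* assumed 8K pages */
#define TSB_TTE_SIZE	UINT64_C(16)		/* assumed 16-byte ttes */

/* Address space covered by a single-level tsb of the given size. */
uint64_t
tsb_coverage(uint64_t tsb_bytes)
{
	return ((tsb_bytes / TSB_TTE_SIZE) * TSB_PAGE_SIZE);
}
```

A 1 meg tsb holds 65536 such entries, each mapping one 8K page, which gives
the 512 megs quoted above.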
Fix some traces. Fix more proc locking.
Call tsb_stte_promote if we get a soft fault on a mapping in the upper
levels of the tsb. If there is an invalid or unreferenced mapping
in the primary tsb, it will be replaced.
Immediately fail for faults occurring in {f,s}uswintr.
one 4 meg page can map both the kernel and the openfirmware mappings.
Add the openfirmware mappings to the kernel tsb so we can call the firmware
on the kernel trap table and access kernel memory normally.
Implement pmap_swapout_proc, pmap_swapin_proc, pmap_swapout_thread,
pmap_swapin_thread, pmap_activate, pmap_page_exists, and pmap_phys_address.
Add a guard page at the bottom of the kernel stack. It's unclear how easy
it will be to detect these faults and do something useful.
Setup the registers on exec how the c runtime expects.
Implement various {fill,set}_*regs.
Fix proc locking.
will be private to each CPU.
- Re-style(9) the globaldata structures. There really needs to be a MI
struct pcpu that has a MD struct mdpcpu member at some point.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.
XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.
Reviewed by: jake, tmm, dillon