- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
o Remove the clock interface. Not only does it conflict with the MI
version when device genclock is added to the kernel, it was also
not possible to have more than 1 clock device. This of course would
have been a problem if we actually had more than 1 clock device.
In short: we don't need a clock interface and if we do eventually,
we should be using the MI one.
o Rewrite inittodr() and resettodr() to take into account that:
1) We use the EFI interface directly.
2) time_t is 64-bit and we do need to make sure we can determine
leap years from year 2100 and on. Add a nice explanation of
where leap years come from and why.
3) This rewrite happened in 2005 so any date prior to 1/1/2005
(either M/D/Y or D/M/Y) is bogus. Reprogram the EFI clock with
1/1/2005 in that case.
4) The EFI clock has a high probability of being correct, so
only (further) correct the EFI clock when the file system time
is larger. That should never happen in a time-synchronised world.
Complain when EFI lost 2 days or more.
Replace the copyright notice now that I (pretty much) rewrote all of
this file.
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).
Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
place.
This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.
By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.
Submitted by: netchild
Reviewed by: various developers on arch@, some time ago
o implement double-extended and single precision loads and stores,
o implement double precision stores,
o replace the machdep.unaligned_print sysctl with debug.unaligned_print
and change the default value to 0,
o replace the machdep.unaligned_sigbus sysctl with debug.unaligned_test,
o Remmove the fillfd() function. The function is trvial enough for
inline assembly.
The debug.unaligned_test sysctl is used to test the emulation of
misaligned loads and stores. When PSR.ac is 0, the CPU will handle
misaligned memory accesses itselfi and we don't get an exception
for it. When PSR.ac is 1, the process needs to be signalled and we
should not emulate. The sysctl takes effect when PSR.ac is 1 and
tells us that we should emulate and not send a signal.
PR: 72268
MFC after: 1 week
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.
This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.
Tested on: alpha, amd64, i386, ia64, sparc64
o Remove a bogus comment that relates to alpha.
o s/u_int64_t/uint64_t/g
o Add bi_spare2 to make the internal padding explicit.
o Move BOOTINFO_MAGIC after the field it applies to.
old or previous value instead of void. This is not as is documented
in atomic(9), but is API (and ABI) compatible and simply makes sense.
This feature will primarily be used for atomic PTE updates in PMAP/ng.
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a functions implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be layed-out contiguously. This can not be achieved on ia64 and is
generally just bad programming.
The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.
This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...
Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386
of the MCOUNT_ENTER, MCOUNT_EXIT and MCOUNT_DECL defines. Also make
sure there's a prototype of _MCOUNT_DECL(). This allows us to build
a kernel. There are still unresolved symbols, so linking fails.
_mcount() stub when profiling is enabled. Emit this code sequence
for assembly routines as welli (MCOUNT definition in <machine/asm.h>.
We do not pass the GOT entry however as the 4th argument, because it's
not used. The _mcount() stub calls __mcount(), which does the actual
work. Define _MCOUNT_DECL to define __mcount. We do not have an
implementation of mcount(), so we define MCOUNT as empty, but have a
weak alias to _mcount() in _mcount.S.
Note that the _mcount() stub in the kernel is slightly different from
the stub in userland. This is because we do not have to worry about
nested routines in the kernel.
have been rush hour...
While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on
amd64. Consequently, it's unsafe to use the option in pcb.h. We now
unconditionally have the ia32 specific registers in the PCB.
This commit is untested.
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.
Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc
The hardware always gives read access for privilege level 0, which
means that we cannot use the hardware access rights and privilege
level in the PTE to test whether there's a change in protection. So,
we save the original vm_prot_t in the PTE as well.
Add pmap_pte_prot() to set the proper access rights and privilege
level on the PTE given a pmap and the requested protection.
The above allows us to compare the protection in pmap_extract_and_hold()
which was missing. While in pmap_extract_and_hold(), add pmap locking.
While here, clean up most (i.e. all but one) PTE macros we inherited
from alpha. They were either unused, used inconsistently, badly named
or simply weren't beneficial. We save the wired and managed state of
the PTE in distinct (bit) fields.
While in pte.h, s/u_int64_t/uint64_t/g
pmap locking obtained from: alc@
feedback & review by: alc@
related to breakpoints and single stepping into SIGTRAP so gdb(1) knows
why the remote target has stopped. In particular, gdb(1) needs to know
if the reason is something of its own doing.
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.
Name of the macro found by: jhb
Reviewed by: jhb
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.
Head nodding: jhb, bmilekic
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.
The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.
With this change, ia64 has support for breakpoints.
o ksym_start and ksym_end changed type to vm_offset_t.
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_enter() instead of breakpoint().
o Remove implementation of Debugger().
o Call kdb_trap() according to the new world order.
unwinder:
o s/db_active/kdb_active/g
o Various s/ddb/kdb/g
o Add support for unwinding from the PCB as well as the trapframe.
Abuse a spare field in the special register set to flag whether
the PCB was actually constructed from a trapframe so that we can
make the necessary adjustments.
md_var.h:
o Add RSE convenience macros.
o Add ia64_bsp_adjust() to add or subtract from BSP while taking
NaT collections into account.
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.
Reviewed by: marcel, bde, jhb
distinguish between debugger inserted breakpoints and fixed
breakpoints. While here, make sure the break instruction never
ends up in the last slot of a bundle by forcing it to be an
M-unit instruction. This makes it easier for use to skip over
it.
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.
With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leasure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)
Compile-tested on: i386
with a memory mapped I/O range that's immediately before it and is
not 256MB aligned. As a result, when an address is accessed in the
memory mapped range and a direct mapping is added for it, it overlaps
with the pre-mapped I/O port space and causes a machine check.
Based on a patch from: arun@
at it, use the ANSI C generic pointer type for the second argument,
thus matching the documentation.
Remove the now extraneous (and now conflicting) function declarations
in various libc sources. Remove now unnecessary casts.
Reviewed by: bde
as these ioctl's aren't MD. This also means they are installed in
/usr/include/dev/bktr now. Also provide compatability wrappers for
where these headers lived in 4.x.
flags. We now create asynchronous contexts or syscall contexts only.
Syscall contexts differ from the minimal ABI dictated contexts by
having the scratch registers saved and restored because that's where
we keep the syscall arguments and syscall return values.
Since this change affects KSE, have it use kse_switchin(2) for the
"new" syscall context.
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.
use set_mcontext() to restore the context in sigreturn(). Since we
put the syscall number and the syscall arguments in the trapframe
(we don't save the scratch registers for syscalls, which allows us
to reuse the space to our advantage), create a MD specific flag so
that we save the scratch registers even for syscalls. We would not
be able to restart a syscall otherwise.
The signal trampoline does not need to flush the regiters anymore,
because get_mcontext() already handles that. In fact, if we set up
the context correctly, we do not need to have a trampoline at all.
This change however only minimally changes the trampoline code. In
follow-up commits this can be further optimized.
Note that normally we preserve cfm and iip in the trapframe created
by the EPC syscall path when we restore a context in set_mcontext()
because those fields are not normally set for a synchronuous context.
The kernel puts the return address and frame info of the syscall
stub in there. By preserving these fields we hide this detail from
userland which allows us to use setcontext(2) for user created
contexts. However, sigreturn() is commonly called from the trampoline,
which means that if we preserve cfm and iip in all cases, we would
return to the trampoline after the sigreturn(), which means we hit
the safety net: we call exit(2). So, we do not preserve cfm and iip
when we have a synchronous context that also has scratch registers
(the uncommon context created by sendsig() only), under the assumption
that if such a context is created in userland, something special is
going on and the use of cfm and iip is then just another quirk. All
this is invisible in the common case.
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).
display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.
Removal if these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.
Tested with: buildworld, LINT
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.
Pointed out by: das@
Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than though
/dev/mem (possibly).
So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.
we had were bogus.
While here, reassign the copyright to the Project. There's nothing
in this files that originates from NetBSD, especially now that the
FreeBSD/alpha bits have been removed, but even then the amount of
inherited code that we actually used was nil.
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.
The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.
Port: lang/guile (depended of x11/gnome2)
about because we're still tier 2 and our current compiler, as well
as future compilers will not support varargs. This is mostly a
no-op in practice, because <sys/varargs.h> should already cause
compile failures.
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
but for CPL != 0. For some reason yet unknown it is possible for the
CPL to be 2. This would previously be counted as kernel mode, which
resulted in nasty panics. By changing the test it is now treated as
user mode, which is more correct. We still need to figure out how it
is possible that the privilege level can be 2 (or 1 for that matter),
because it's not used by us. We only use 3 (user mode) and 0 (kernel
mode).
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoids that we need to know about the trigger mode and polarity at
that time.
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.
The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.
To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.
Run into this: davidxu
ultimate trigger for the follow-up fixes in revisions 1.78, 1.80,
1.81 and 1.82 of trap.c. I was simply too pre-occupied with the
gateway page and how it blurs kernel space with user space and
vice versa that I couldn't see that it was all a load of bollocks.
It's not the IP address that matters, it's the privilege level that
counts. We never run in user space with lifted permissions and we
sure can not run in kernel space without it. Sure, the gateway page
is the exception, but not if you look at the privilege level. It's
user space if you run with user permissions and kernel space otherwise.
So, we're back to looking at the privilege level like it should be.
There's no other way.
Pointy hat: marcel
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.
ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).
alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.
powerpc: MD prototypes left in cpu.h. Comment added.
Suggested by: bde
Tested with: make universe (pc98 incomplete)
and the move to control register to avoid dependency violations when
these functions are used. Note that explicit data and instruction
serialization also need to be in a subsequent instruction group.
This too requires that we have an igrp break here.
PT_SETKSTACK. These requests allow the tracing process to access the
dirty registers of the traced process that are on the kernel stack.
Note that there's currently no way to access the rnat register for
those dirty registers that are not (yet) covered by a nat collection
point. The interface for this is still being slept on.
Also note that implied by these requests is the division of work:
The tracing process has to keep track of where registers are spilled
and is responsible to figure out where the NaT bit of the stacked
registers are at any time during the execution of the traced process.
The kernel provides the interfaces but will not abstract the fact
that the register stack can be split. This model does not follow
the approach taken in Linux where PT_PEEK and PT_POKE deals with
this automagically.
when we create contexts. The meaning of the flags are documented in
<machine/ucontext.h>. I only list them here to help browsing the
commit logs:
_MC_FLAGS_ASYNC_CONTEXT
_MC_FLAGS_HIGHFP_VALID
_MC_FLAGS_KSE_SET_MBOX
_MC_FLAGS_RETURN_VALID
_MC_FLAGS_SCRATCH_VALID
Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.
o Remove alpha specific timer code (mc146818A) and compiled-out
calibration of said timer.
o Remove i386 inherited timer code (i8253) and related acquire and
release functions.
o Move sysbeep() from clock.c to machdep.c and have it return
ENODEV. Console beeps should be implemented using ACPI or if no
such device is described, using the sound driver.
o Move the sysctls related to adjkerntz, disable_rtc_set and
wall_cmos_clock from machdep.c to clock.c, where the variables
are.
o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
bother faking a stathz that's 1/8 of that. Keep it simple: hz
defaults to HZ and stathz equals hz. This is also how it's done
for sparc64.
o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
to calculate ITC skew and corrections. On average, we adjust the
ITC match register once every ~1500 interrupts for a duration of
2 consequtive interruprs. This is to correct the non-deterministic
behaviour of the ITC interrupt (there's a delay between the match
and the raising of the interrupt).
o Add 4 debugging sysctls to monitor clock behaviour. Those are
debug.clock_adjust_edges, debug.clock_adjust_excess,
debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
counts the individual adjustment cycles (when the skew first
crosses the threshold), the second counts the number of times the
adjustment was excessive (any non-zero value is to be considered
a bug), the third counts lost clock interrupts and the last counts
the number of interrupts for which we applied an adjustment
(debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
avarage duration of an individual adjustment -- should be ~2).
While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.
switching anymore, so there's no need to save and restore GP. This
change breaks threaded applications linked against libc_r. Pull the
tier 2 card again: relink. This will link against libthr instead.
a non-standard construct. Instead, redefine struct _ia64_fpreg as a
union and put a long double in it. On ia64 and for LP64, this is
defined by the ABI to have 16-byte alignment. For ILP32 a long double
has 4-byte alignment, but we don't support ILP32.
Note that the in-memory image of a long double does not match the in-
memory image of spilled FP registers. This means that one cannot use
the fpr_flt field to interpet the bits. For this reason we continue
to use an aggregate type.
but this just created a weird inconsistency when porting gdb(1).
Instead, we name each high FP register seperately, like we do for
all the other registers.
them again afterwards. This fixes a disabled FP fault while in the FPSWA
handler.
While here, merge the FP fault and FP trap handling code to reduce code
duplication. Where code was different, it was not sure it should be.
Trigger case: ports/math/atlas
our unwind information for functions that are entry points into the
kernel. When stepping to the next frame, the unwinder will let us
know when sych a marker was encountered. We use this to stop the
current unwind session, query the trapframe and restart a new
unwind session based on the new trapframe.
The implementation is a bit sloppy, but at this time there are
bigger fish to fry.
to get a stacktrace. This does not work even with M_NOWAIT when we
have WITNESS and is generally a bad idea (pointed out by bde@). We
allocate an 8K heap for use by the unwinder when ddb is active. A
stack trace roughly takes up half of that in any case, so we have
some room for complex unwind situations. We don't want to waste too
much space though. Due to the nature of unwinding, we don't worry
too much about fragmentation or performance of unwinding while in
the debugger. For now we have our own heap management, but we may
be able to leverage from existing code at some later time.
While here:
o Make sure we actually free the unwind environment after unwinding.
This fixes a memory leak.
o Replace Doug's license with mine in unwind.c and unwind.h. Both
files don't have much, if any, of Doug's code left since the EPC
syscall overhaul and the import of the unwinder.
o Remove dead code.
o Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.
Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.
sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.
If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.
Reviewed by: tmm, gibbs
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.
Two details:
1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.
2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
always kernel space. It should be treated as user space when run with
user privileges (which is the case for the signal trampolines). This
fixes its only use in a KASSERT in subr_trap.c.
The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.
The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does supports coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.
This flag is the same as the one in NetBSD's Bus DMA.
Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.
Approved by: re (bmah)
PSR only to achieve setting PSR.i back to it's previous value. It
makes it impossible to change any of the 30+ other unrelated bits
when done between intr_disable() and intr_restore(). That's bad.
Instead have intr_disable() return 1 when interrupts were previously
enabled and 0 otherwise and only enable interrupts in intr_restore()
when given a non-0 value.
This change specifically disallows using intr_restore() to disable
interrupts. The reason is simple: interrupts only need to be restored
after they are being disabled, which means that intr_restore() is
called with interrupts disabled and we only need to enable them if
they were previously enabled.
This change does not fix any bugs, other than that it bugged me...
Approved by: re@ (blanket)
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.
We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies any any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)
For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.
Approved by: re@ (blanket)
On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.
The cleanup constitutes:
o Remove the unused arguments from ia64_init().
o Don't return from ia64_init(), but instead call mi_startup()
directly. This reduces the amount of muckery in assembly and
also allows for the next bullet:
o Save our currect context prior to calling mi_startup(). The
reason for this is that many threads are created from thread0
by cloning the PCB. By saving our context in the PCB, we have
something sane to clone. It also ensures that a cloned thread
that does not alter the context in any way will return to
the saved context, where we're ready for the eventuality with
a nice, user unfriendly panic().
The cleanup fixes at least the following bugs:
o Entering mi_startup() with the RSE in enforced lazy mode.
o Re-execution of ia64_init() in certain "lab" conditions.
While here, add proper unwind directives to __start() so that
the unwind knows it has reached the bottom of the (call) stack.
Approved by: re@ (blanket)
- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:
- Style fixes.
Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h
Approved by: re (blanket)
prime objectives are:
o Implement a syscall path based on the epc inststruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).
Secundairy objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)
Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.
Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.
Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.
This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.
Approved by: re@ (jhb)
The advantage of using register sets is that you don't focus on each
register seperately, but instead instroduce a level of abstraction.
This reduces the chance of errors, and also simplifies the code.
The register sers form the basis of everything register.
The sets in this file are:
struct _special
contains all of the control related registers, such as instruction
pointer and stack pointer. It also contains interrupt specific registers
like the faulting address. The set is roughly split in 3 groups. The
first contains the registers that define a context or thread. This is
the only group that the kernel needs to switch threads. The second group
contains registers needed in addition to the first group needed to switch
userland threads. This group contains the thread pointer and the FP control
register. The third group contains those registers we need for execption
handling and are used on top of the first two groups.
struct _callee_saved, struct _callee_saved_fp
These sets contain the preserved registers, including the NaT after
spilling. The general registers (including branch registers) are
seperated from the FP registers for ptrace(2).
struct _caller_saved, struct _caller_saved_fp
These sets contain the scratch registers based on SDM 2.1, This means that
both ar.csd and ar.ccd are included here, even though they contain ia32
segment register descriptions. We keep seperate NaT bits for scratch and
preserved registers, because they are never saved/restored at the same
time.
struct _high_fp
The upper 96 FP registers that can be enabled/disabled seperately on
the CPU from the lower 32 FP registers. Due to the size of this set,
we treat them specially, even though they are defined as scratch
registers.
CVS ----------------------------------------------------------------------
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.
o do not use the in* and out* functions. These functions are used by
legacy drivers and thus must have ia32 compatible behaviour. Hence,
they need to have fences. Using these functions for newbus would
then pessimize performance.
o remove the conditional compilation of PIO and/or MEMIO support. It's
a PITA without having any significant benefit. We always support them
both. Since there are no I/O ports on ia64 (they are simulated by the
chipset by translating memory mapped I/O to predefined uncacheable
memory regions) the only difference between PIO and MEMIO is in the
address calculation. There should be enough ILP that can be exploited
here that making these computations compile-time conditional is not
worth it. We now also don't use the read* and write* functions.
o Add the missing *_8 variants. They were missing, although not missed.
It's for completeness.
o Do not add the fences that were present in the low-level support
functions here. We're using uncacheable memory, which means that
accesses are in program order. Change the barrier implementation
to not only do a memory fence, but also an acceptance fence. This
should more reliably synchronize drivers with the hardware. The
memory fence enforces ordering, but does not imply visibility (ie
the access does not necessarily have happened). This is what the
acceptance deals with.
cpufunc.h cleanup:
o Remove the low-level memory mapped I/O support functions. They are
not used. Keep the low-level I/O port access functions for legacy
drivers and add fences to ensure ia32 compatibility.
o Remove the syscons specific functions now that we have moved the
proper definitions where they belong.
o Replace the ia64_port_address() and ia64_memory_address() functions
with macros. There's a bigger change inline functions get inlined
when there aren't function callsi and the calculations are simply
enough to do it with macros.
Replace the one reference to ia64_memory address in mp_machdep.c to
use the macro.
to get actual constant values. This is in preparation for machine/limits.h
retirement.
Discussed on: standards@
Submitted by: Craig Rodrigues <rodrigc@attbi.com> (*)
Modified by: kan
instruction requires that a translation is present in the TC. This
may trigger a TLB miss and a subsequent call to vm_fault().
This implementation is deliberately non-inline for debugging and
profiling purposes. Partial or full inlining should eventually be
done.
Valuable insights by: jake
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)
are machine dependent because they are not required to update the tlb when
mappings are added or removed, and doing so is machine dependent.
In addition, an implementation may require that pages mapped with pmap_kenter
have a backing vm_page_t, which is not necessarily true of all physical
pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the
physical address in order to make this requirement clear.
not save (restore) the global pointer (GP) in the jmpbuf in setjmp
(longjmp) because it's not needed in general. GP is considered a
scratch register at callsites and hence is always restored after a
call (when it's possible that the call resolves to a symbol in a
different loadmodule; otherwise GP does not have to be saved and
restored at all), including calls to setjmp/longjmp. There's just
one problem with this now that we use setjmp/longjmp for context
switching: A new context must have GP defined properly for the
thread's entry point. This means that we need to put GP in the
jmpbuf and consequently that we have to restore is in longjmp.
This automaticly requires us to save it as well.
When setjmp/longjmp isn't used for context switching, this can be
reverted again.
the J_SIG0 field. While here, rename J_SIG0 to J_SIGSET and
remove J_SIG1. The main reason for this change is that the
128-bit sigset_t is now aligned on a 16-byte boundary, which
allows us to use 16-byte atomic loads and stores on CPUs that
support it. The removal of J_SIG1 is done to avoid confusion:
it is never accessed and should not be. Renaming J_SIG0 to
J_SIGSET is the icing on the cake that's better done now than
later.
o Add a MD header private to libc called _fpmath.h; this header
contains bitfield layouts of MD floating-point types.
o Add a MI header private to libc called fpmath.h; this header
contains bitfield layouts of MI floating-point types.
o Add private libc variables to lib/libc/$arch/gen/infinity.c for
storing NaN values.
o Add __double_t and __float_t to <machine/_types.h>, and provide
double_t and float_t typedefs in <math.h>.
o Add some C99 manifest constants (FP_ILOGB0, FP_ILOGBNAN, HUGE_VALF,
HUGE_VALL, INFINITY, NAN, and return values for fpclassify()) to
<math.h> and others (FLT_EVAL_METHOD, DECIMAL_DIG) to <float.h> via
<machine/float.h>.
o Add C99 macro fpclassify() which calls __fpclassify{d,f,l}() based
on the size of its argument. __fpclassifyl() is never called on
alpha because (sizeof(long double) == sizeof(double)), which is good
since __fpclassifyl() can't deal with such a small `long double'.
This was developed by David Schultz and myself with input from bde and
fenner.
PR: 23103
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
(significant portions)
Reviewed by: bde, fenner (earlier versions)
and instead add platform, firmware and EFI stubs to the loader.
The net effect of this change is that besides a special console and
disk driver, the kernel has no knowledge of the simulator. This has
the following advantages:
o Simulator support is much harder to break,
o It's easier to make use of more feature complete simulators.
This would only need a change in the simulator specific loader,
o Running SMP kernels within the simulator. Note that ski at this
time does not simulate IPIs, so there's no way to start APs.
The platform, firmware and EFI stubs describe the following hardware:
o 4 CPU Itanium,
o 128 MB RAM within the 4GB address space,
o 64 MB RAM above the 4GB address space.
NOTE: The stubs in the skiloader describe a machine that should in
parts be defined by the simulator. Things like processor interrupt
block and AP wakeup vector cannot be choosen at random because they
require interpretation by the simulator. Currently the simulator is
ignorant of this.
This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS
which is ignored by the simulator.
Tested with: ski (version 0.943 for linux)
CLOCK_VECTOR and define it as 254, not 255. Vector 255 is already
in use as the AP wakeup vector on the HP rx2600.
This needs to be made more dynamic. The likelyhood of vector 254
being in use is pretty small, but we already have code to assign
vectors to IPIs (see sal.c) and it's preobably better to have a
centralized "vector manager" that hands out vectors based on
some imput (like priority).
handleclock itself is trivial.
While here, replace (itc_frequency+hz/2)/hz with itm_reload for
consistency. There's now a single place where we determine the
ITM reload value.
interrupt block). We use the previously hardcoded address as a
default only, but will otherwise use whatever ACPI tells us.
The address can be found in the MADT table header or in the
LAPIC override table entry.
space most of the time, but handles machines with lots of I/O
(S)APICs. We cannot make this more dynamic without breaking the
interface with vmstat. Hence, we need to fix the interface first.
devices aren't necessarily mapped within 4GB. I/O port addresses
are offsets into the memory mapped I/O port space, which is not
larger than 16MB. No need to convert those to 64 bit types.
The HCDP table is one (non-proprietary) way for the platform to
inform the OS about headless operation. This field would normally
hold the address as can be found by scanning the EFI system table,
which we also pass to the kernel. The apparent duplication allows
us to synthesize a HCDP table in the loader by whatever means we
can think of, including relocating the platform table into pre-
mapped address space. In short: it gives us more freedom.
Approved by: re (blanket)
Add function map_port_space() to map the memory mapped I/O port
range as uncacheable virtual memory and call it prior to probing
for a console. This removes the dependency on the loader to have
done this for us. Note that this change does not include doing
the same for APs.
Approved by: re (blanket)
to worry about ABI vs released systems yet. This is mostly transparent
since there is no significant exposure in the syscall interface. The
things that go wrong are mostly userland stuff - time(&intvariable).
Reviewed by: dfr, marcel
Approved by: re (jhb)
Don't force 16-byte alignment at run-time. Do it at compile-time.
This saves us the pointer fiddling by the setjmp functions and
reduces complexity. While here, increase the jmp_buf by 16 bytes
to an even 512 bytes. Coincidentally, due to the way alignment
was handled prior to this change, the jmp_buf has not changed in
size, but only in how the space is used. Prior to this change
the 16 bytes were reserved for enforcing alignment; now they are
reserved by us for future extensions.
Therefore, this ABI breaker is relatively save: the failure is
always an alignment trap.
have f16-f31 as part of the context. The PCB has been reorganized to
better match how we save and restore the (preserved) registers. This
commit also moves the context restoriation to its own function (named
pcb_restore), as we did with pcb_save.
Only minimal effort has been put in writing optimal assembly. The
expectation is that there will be more rounds of changes.
from all low-level bus space support functions. There's no need
to actually force the read/write to be accepted by the platform
before we can do anything else. We still have the mf instruction
there, which forces ordering. This too is not required given the
semantices of the bus space I/O functions, but it's not at all
clear to me if there are any poorly written device drivers that
depend on the strict ordering by the processor. The motto here is
to take small steps...
o Properly set the pointer to the counter for each interrupt and
update the intrnames table.
o Remove Alpha cruft from intrcnt.h.
o Create INTRNAME_LEN as the single entity that defines the width
of the names in the intrnames table (incl. terminatinf '\0').
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.
Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.
Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re
by using the linker hooks. Since these hooks are called for the
kernel as well, we don't need to deal with that with a special
SYSINIT. The initialization implicitly performed on the first
update of the unwind information is made explicit with a SYSINIT.
We now don't need the _ia64_unwind_{start|end} symbols.
as a trivial function that only calls ia64_tpa() and hence requires
the prototype of ia64_tpa(), but by defining pmap_kextract as
ia64_tpa. This solves the inclusion ordering issue in ddb/db_watch.c
expand to __attribute__((packed)) and __attribute__((aligned(x)))
respectively. Replace the handful of gcc-ism's that use
__attribute__((aligned(16))) etc around the kernel with __aligned(16).
There are over 400 __attribute__((packed)) to deal with, that can come
later. I just want to use __packed in new code rather than add more
gcc-ism's.
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.
Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.
Tested on: i386 (extensively), alpha
in the original hardwired sysctl implementation.
The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.
Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?
These types are unlikely to ever become very MD. They include:
clockid_t, ct_rune_t, fflags_t, intrmask_t, mbstate_t, off_t, pid_t,
rune_t, socklen_t, timer_t, wchar_t, and wint_t.
While moving them, make a few adjustments (submitted by bde):
o __ct_rune_t needs to be precisely `int', not necessarily __int32_t,
since the arg type of the ctype functions is int.
o __rune_t, __wchar_t and __wint_t inherit this via a typedef of
__ct_rune_t.
o Some minor wording changes in the comment blocks for ct_rune_t and
mbstate_t.
Submitted by: bde (partially)
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif
Concept by: bde
Reviewed by: jake, obrien
<stdint.h>. Previously, parts were defined in <machine/ansi.h> and
<machine/limits.h>. This resulted in two problems:
(1) Defining macros in <machine/ansi.h> gets in the way of that
header only defining types.
(2) Defining C99 limits in <machine/limits.h> adds pollution to
<limits.h>.
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
hardly MD, since all our platforms share the same macro. It's not
really compiler dependent either, but this helps in reducing
<machine/ansi.h> to only type definitions.
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().
Reviewed by: jake (in principle)
struct uuid defined in <sys/uuid.h>.
Use uuid/UUID instead of guid/GUID to emphasize that the
identifiers are DCE version 1 identifiers and also to avoid
inconsistencies as much a possible.
As a minor positive side-effect, code at -O0 is more optimal. As a
minor negative side-effect, certain boundary cases yield no better
code than non-boundary cases. For example, atomic_set_acq_32(p, 0)
does a useless logical OR with value 0. This was previously elimina-
ted as part of if/while optimizations. Non-boundary cases yield
identical code at -O1 and -O2.
- Don't include ia64_cpu.h and cpu.h
- Guard definitions by _NO_NAMESPACE_POLLUTION
- Move definition of KERNBASE to vmparam.h
o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h
o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h
o While here, remove some left-over Alpha references.
function to return the total number of CPUs and not the highest
CPU id.
o Define mp_maxid based on the minimum of the actual number of
CPUs in the system and MAXCPU.
o In cpu_mp_add, when the CPU id of the CPU we're trying to add
is larger than mp_maxid, don't add the CPU. Formerly this was
based on MAXCPU. Don't count CPUs when we add them. We already
know how many CPUs exist.
o Replace MAXCPU with mp_maxid when used in loops that iterate
over the id space. This avoids a couple of useless iterations.
o In cpu_mp_unleash, use the number of CPUs to determine if we
need to launch the CPUs.
o Remove mp_hardware as it's not used anymore.
o Move the IPI vector array from mp_machdep.c to sal.c. We use
the array as a centralized place to collect vector assignments.
Note that we still assign vectors to SMP specific IPIs in
non-SMP configurations. Rename the array from mp_ipi_vector to
ipi_vector.
o Add IPI_MCA_RENDEZ and IPI_MCA_CMCV. These are used by MCA.
Note that IPI_MCA_CMCV is not SMP specific.
o Initialize the ipi_vector array so that we place the IPIs in
sensible priority classes. The classes are relative to where
the AP wake-up vector is located to guarantee that it's the
highest priority (external) interrupt. Class assignment is
as follows:
class IPI notes
x AP wake-up (normally x=15)
x-1 MCA rendezvous
x-2 AST, Rendezvous, stop
x-3 CMCV, test
o Create pcb_save as the backend for savectx and cpu_switch.
o While here, use explicit bundling for pcb_save and optimize
for compactness (~87% density).
o Not part of the commit is a backend pcb_restore. restorectx()
still jumps halfway into cpu_switch().
only for exceptions.
While adding this to exception_save and exception_restore, it was hard
to find a good place to put the instructions. The code sequence was
sufficiently arbitrarily ordered that the density was low (roughly 67%).
No explicit bundling was used.
Thus, I rewrote the functions to optimize for density (close to 80% now),
and added explicit bundles and nop instructions. The immediate operand
on the nop instruction has been incremented with each instance, to make
debugging a bit easier when looking at recurring patterns. Redundant
stops have been removed as much as possible. Future optimizations can
focus more on performance. A well-placed lfetch can make all the
difference here!
Also, the FRAME_Fxx defines in frame.h were mostly bogus. FRAME_F10 to
FRAME_F15 were copied from FRAME_F9 and still had the same index. We
don't use them yet, so nothing was broken.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)
_BYTE_ORDER. These are far more useful than their non-underscored
equivalents as these can be used in restricted namespace environments.
Mark the non-underscored variants as deprecated.
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.
Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.
Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.
Reviewed by: jake
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.
Reviewed by: core
Approved by: core