a lock to prevent interspersed strings written from different CPUs
at the same time.
To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.
String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.
Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.
A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.
Reviewed by: scottl, jhb
MFC after: 2 weeks
argument in parentheses so these macros are safe to use and invocations
with an expression as the argument like __bswap32_const(42 << 23 | 13)
work as expected. Additionally, mask all the individually shifted bytes
as appropriate so the bytes which exceed the width of the respective
__bswapN_const() macro in invocations like __bswap16_const(0xdead600d)
are ignored like it's the case with the corresponding __bswapN_var()
function.
MFC after: 3 days
The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.
Submitted by: scottl@
except sun4v.
This change makes the transition from a default to an option more
transparent and is an attempt to head off all the compliants that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.
Submitted by: scottl@
: # Options for atkbd:
: options ATKBD_DFLT_KEYMAP # specify the built-in keymap
: makeoptions ATKBD_DFLT_KEYMAP=jp.106
[...]
: nooption ATKBD_DFLT_KEYMAP
: nomakeoption ATKBD_DFLT_KEYMAP
(Previously the option was inherited from MI NOTES.) So my tool in
rev. 1.26 reduced this to removing all "ATKBD_DFLT_KEYMAP" lines,
leaving the option effectively disabled as it was before, but since
it's actually supported on sparc64, turn it on now.
since they just duplicated the MI `reset' command. Instead of removing
them, make `reboot' an MI alias for `reboot' since this gives a better
way of killing the `r' alias for `reset'. Remove the `registers' command
that was used to kill the alias.
Turn the powerpc and sparc64 MD `halt' command into an MI command.
A copy of sparc64/db_interface.c grew in sun4v just after I found the
extra reboot commands. It has not been changed, and is now not
identical. Duplicated commands come out duplicated in ddb's online
help, but cause large problems when used (e.g., on i386's with 2 halt's
and an hwatch, typing h doesn' give the expected message about an
ambiguous command, but hangs like the halt command or a looping parseri
would).
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
stronger.
Suggested and reviewed by: dougb
Discussed on: developers
MFC after: 3 days
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.
Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.
pmap_protect(), and pmap_copy() have optimizations for regions
larger than PMAP_TSB_THRESH (which works out to 16MB). This
caused a panic in tsb_foreach for kernel mappings, since
pm->pm_tsb is NULL in that case. This fix teaches tsb_foreach
to use the kernel's tsb in that case.
Submitted by: Michael Plass
MFC after: 3 days
for a bit before retrying to resend an IPI in order to avoid
deadlocks if the other CPU is also trying to send one.
OpenSolaris uses a delay of 1 microsecond here but waiting 2
microseconds with interrupts enabled like Linux does shouldn't
hurt but is a bit safer.
MFC after: 1 day
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.
The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.
Reviewed by: marcel@ (ia64)
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.
MFC after: 2 weeks
system's machine-dependent and machine-independent layers. Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write(). Why? Both the use and implementation of
pmap_page_protect() in our virtual memory system has subtle errors,
specifically, the management of execute permission is broken on some
architectures. The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified." However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission. Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission. On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64. This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.
Discussed with: grehan@ and marcel@
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.
Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.
PR: kern/14584
Submitted by: babkin
an explicit comment that it's needed for the linuxolator. This is not the
case anymore. For all other architectures there was only a "KEEP THIS".
I'm (and other people too) running a COMPAT_43-less kernel since it's not
necessary anymore for the linuxolator. Roman is running such a kernel for a
for longer time. No problems so far. And I doubt other (newer than ia32
or alpha) architectures really depend on it.
This may result in a small performance increase for some workloads.
If the removal of COMPAT_43 results in a not working program, please
recompile it and all dependencies and try again before reporting a
problem.
The only place where COMPAT_43 is needed (as in: does not compile without
it) is in the (outdated/not usable since too old) svr4 code.
Note: this does not remove the COMPAT_43TTY option.
Nagging by: rdivacky
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.
Reviewed by: alc,jhb,peter,ps
moving the typedef of bus_space_tag_t from sys/sparc64/include/bus.h
to sys/sparc64/include/_bus.h. This brings sparc64 in sync with the
other platforms and fixes the compilation of drivers which include
<sys/rman.h> before <machine/bus.h> after sys/sys/rman.h rev. 1.34.
- Remove the definition of bus_type_t from sys/sparc64/include/_bus.h
as it's unused since sys/sparc64/include/bus.h rev. 1.6 and
sys/sparc64/sparc64/bus_machdep.c rev. 1.3.
- Remove some pointless comments.
the arm to compile without all the extras that don't appear, at least
not in the flavors of ARM I deal with. This helps us save about 100k.
If I've botched the available devices on a platform, please let me
know and I'll correct ASAP.
Map the device memory belonging to resources of type SYS_RES_MEMORY into
KVA upon activation so that rman_get_virtual() works as expected.
- In sbus_alloc_resource() checking whether toffs is 0 as an indication
that no applicable child range was found isn't appropriate as it's
perfectly valid for the requested SYS_RES_MEMORY resource to start at
the beginning of a child range. So check for the RMAN still being NULL
instead.
- As a minor runtime speed optimization break out of the loop where we
search for the applicable child range in sbus_alloc_resource() as soon
as it's found.
- Let sbus_setup_intr() return ENOMEM rather than 0 if it can't allocate
memory for the interrupt clearing info.
- Actually do what the comment in sbus_setup_intr() says and disable the
respective interrupt while fiddling with it.
- Remove some superfluous INTVEC() around inr, which already only contains
the interrupt vector, in sbus_setup_intr().
- While here, fix a style(9) bug in sbus_setup_intr() (don't use function
calls in initializers).
The first two changes are required for a CG6 driver.
MFC after: 2 weeks
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
above what's used for fast interrupts, only interrupts with the level of
the interrupt which led to calling intr_fast() (which is used with both
fast and ithread interrupts) are blocked while in that function. Thus
intr_fast() can be preempted by a fast interrupt (which are of a higher
level than ithread interrupts) while servicing an ithread interrupt. This
can lead to a stale pointer to the head of the active interrupt requests
list when back in the ithread interrupt invocation of intr_fast(), in turn
resulting in corruption of the interrupt request lists and consequently
in a panic. Solve this be turning off interrupts in intr_fast() before
reading the pointer to the head of the active list rather than after. [1]
- Add a KASSERT in intr_fast() which asserts that ir_func is non-zero before
calling it. [1]
- Increment interrupt stats after calling the handlers rather than before.
This reduces the delay until direct and fast handlers are serviced, in my
testings by 30% on average for the direct tick interrupt handler, in turn
resulting in less clock drift.
PR: 94778 [1]
Submitted by: Andrew Belashov [1]
MFC after: 2 weeks
PCI devices apparently was changed from a special deferred trap with TPC
pointing to the membar #Sync following the failing load/store instruction
to a precise trap with TPC pointing to the failing load/store instruction.
Thus remove the check the check whether TPC points to a membar #Sync in
case of a data access trap as it's off-by-one for USIII CPUs and it should
be sufficient to check whether the trap happend while in fasword*() to
properly detect traps caused by peeking/poking. This also corresponds to
what other OSs do. Note that also only the USIIi manual suggests to check
the TPC for such traps while the USII one doesn't (in the public USIII
manual device peeking/poking isn't mentioned at all).
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.
- Use FBSDID in trap.c
- Make the global trap_sig[] static as it's not used outside of trap.c.
- In sendsig() remove an unused variable.
- In trap() sync with the other archs; for fast data access MMU miss and
data access protection traps set ksi_addr to the SFAR reg which contains
the faulting address and otherwise to the TPC reg. Generally the TCP reg
contains the address of the instruction that caused the exception, except
for fast instruction access traps (and some others; more refinement may
be needed here) it also contains the faulting address.
Previously sendsig() always set si_addr to the SFAR reg which is wrong
for most traps.
- In sendsig() add support for FreeBSD old-style signals.
These changes are inspired by kmacy's sun4v changes and allow libsigsegv
to build on FreeBSD/sparc64, but it doesn't pass all checks and tests it
actually should, yet.
MFC after: 5 days
foreign per-CPU pages in cpu_ipi_send() in order to get the module IDs
of the other CPUs can cause a page fault. If this happens when doing a
TLB shootdown while dealing with another page fault this causes a panic
due to the recursive page fault. As I don't spot other code that assumes
or requires that accessing foreign per-CPU pages must not page fault
solve this by adding a statically allocated (and therefore locked in the
kernel pages) array which establishes a FreeBSD CPU ID -> module ID
relation and use that in cpu_ipi_selected() (instead of statically
allocating the per-CPU pages which would just waste memory on say a dual
CPU machine as sun4u theoretically supports up to 128 CPUs or wasting
dTLB slots for the foreign per-CPU pages). [1]
- Fix a potential race in cpu_ipi_send(); as we don't serialize the access
to cpu_ipi_selected() between MI and MD use (only MI-MI and MD-MD) we
might catch the NACK bit caused by sending another IPI. Solve this by
checking the NACK bit in the contents of the interrupt dispatch status
reg read while interrupts were still turned off instead of reading that
reg anew after interrupts were turned on again. This is also what the
CPU docs suggest to do.
- Add a workaround for the SpitFire erratum #54 bug (affecting interrupt
dispatch). While public info regarding what this CPU bug actually causes
is not available testing shows that with the workaround in place it's
less likely to get a "couldn't send ipi" panic, it doesn't solve these
panics entirely though. [2]
Reported by: kris [1]
Some clue from: kmacy [1]
Info from: Linux, OpenSolaris [2]
Additional testing by: kris
MFC after: 3 days
as we have to call tick_init() before cninit() in order to provide the
low-level console drivers with a working DELAY() which in turn means we
cannot use panic() in tick_init().
- s,to high, too high, in the panic string
Inspired by: kmacy's sun4v changes
MFC after: 3 days
when option DEBUG_LOCKS is used. Trap frames are determined by checking
whether the caller was one of the tl0_*() or tl1_*() asm functions via
a newly added pair of dummy symbols in exception.S which mark the begin
and end of these functions. The tl_trap_* pair marks those in the special
.trap section and the tl_text_* in the regular .text section. Because
of their performance penalty db_search_symbol()/db_symbol_values() and
linker_ddb_search_symbol()/linker_ddb_symbol_values() aren't used here
for determining the caller, with db_search_symbol()/db_symbol_values()
additionally not being reentrant.
- For consistency, change db_backtrace() to also use the new markers for
determining the tl0_*() and tl1_*() asm functions instead of bcmp()'ing
the symbol name.
- Use FBSDID in db_trace.c.
PR: 93226
Based on a patch by: Antoine Brodin <antoine.brodin@laposte.net>
Ok'ed by: jhb
pages, not a count of bytes. The sysctl handler for hw.realmem already
uses ctob() to convert realmem from pages to bytes. Thus, on archs that
were storing a byte count in the realmem variable, hw.realmem was inflated.
Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha)
MFC after: 3 days
Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.
Don't reach across CPUs in calcru().
Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).
Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.
Use 27MHz counter on i386/Geode.
Use TSC on amd64 & i386 if present.
Use tick counter on sparc64
Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.
Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.
Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.
For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)
On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.
in order to support the on-board LANCE in Ultra 1 and to the MI NOTES as
it should work just fine with the AMD PCnet family of chips on all archs
but is not yet meant to replace lnc(4). If a kernel includes all of le(4),
lnc(4) and pcn(4) precedence is given to lnc(4)/pcn(4) for now.
- Like lsi64854_scsi_intr() return -1 in case there was a DMA error so
the caller can distinguish it from a normal interrupt and leave the
reset of the DMA engine to the caller so we don't kill any state there.
- Move the static 'dodrain' flag to struct lsi64854_softc as there can
be more than one LSI64854 used for a LANCE in a system and reset it
again once draining the E-cache is done so we don't keep draining the
cache with every interrupt.
- Remove calling sc->sc_intrchain(), we will call lsi64854_enet_intr()
via sc->intr() in the interrupt handler of the LANCE driver and not
use it in chained mode.
o lsi64854_pp_intr():
- Like lsi64854_scsi_intr() return -1 in case there was a DMA error so
the caller can distinguish it from a normal interrupt.
o Remove the no longer used sc_intrchain* from struct lsi64854_softc.
o Make lsi64854_reset(), lsi64854_setup*() and lsi64854_*_intr() static
to lsi64854.c as we do and will only call them via the respective
function pointers in struct lsi64854_softc.
o While here fix style(9) bugs (variable definition inside a nested scope).
interrupt handler for the LANCE devices and remove dma_setup_intr(). We
just can't completely ignore the DMA engine in a LANCE driver anyway and
calling the DMA engine interrupt handler in the LANCE driver directly
allows to cover it by the LANCE driver lock.
and resume methods so these events propagate through the device driver
hierarchy.
- In dma(4) enable the chaining of the DMA engine interrupt handler for
the LANCE devices via a dma_setup_intr(). This was commented out before
as I was unsure whether I'd use it but this is probably cleaner than
fiddling with the DMA engine interrupt in the LANCE driver directly.
- In ebus_setup_dinfo() free 'intrs' instead of 'reg' twice in case
setting up a child fails due to routing one of its interrupts fails. [1]
Found by: Coverity Prevent [1]
MFC after: 3 days
operands are consumed so use the appropriate constraint modifier.
Before this change GCC used one register for both an input and an
unrelated output operand of in_addword(), causing the input to be
overwritten before it was consumed and thus breaking in_addword().
For in_cksum_hdr() and in_pseudo() this change is more or less
cosmetic.
- Fix a misspelling in a nearby comment.
Reported & tested by: yongari
MFC after: 1 week
to COMPAT_43TTY.
Add COMPAT_43TTY to NOTES and */conf/GENERIC
Compile tty_compat.c only under the new option.
Spit out
#warning "Old BSD tty API used, please upgrade."
if ioctl_compat.h gets #included from userland.
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)
MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.
PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.
Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)
to search for a specific extended capability. If the specified capability
is found for the given device, then the function returns success and
optionally returns the offset of that capability. If the capability is
not found, the function returns an error.
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
MFC after: 2 weeks
- Move vtophys() macros next to vtopte() where vtopte() exists to match
comments above vtopte().
- Remove references to the alternate address space in the comment above
vtopte(). amd64 never had the alternate address space, and i386 lost it
prior to PAE support being added.
- s/entires/entries/ in comments.
Reviewed by: alc
KTR_* class macros via genassym.c. Together with sys/sys/ktr.h
rev. 1.34 this has the desired side-effect of providing a default
value for KTR_COMPILE. Thus this fixes warnings from -Wundef
regarding KTR_COMPILE not being defined for .S files.
Requested by: ru
Reviewed by: ru
MACHINE_ARCH and MACHINE). Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.
from sys/sparc64/include/ofw_upa.h to sys/sparc64/pci/ofw_pci.h and
rename them to struct ofw_pci_ranges and OFW_PCI_RANGE_* respectively.
This ranges struct only applies to host-PCI bridges but no to other
bridges found on UPA. At the same time it applies to all host-PCI
bridges regardless of whether the interconnection bus is Fireplane/
Safari, JBus or UPA.
- While here rename the PCI_CS_* macros in sys/sparc64/pci/ofw_pci.h
to OFW_PCI_CS_* in order to be consistent and change this header to
use uintXX_t instead of u_intXX_t.