Commit Graph

1776 Commits

Author SHA1 Message Date
Marcel Moolenaar
c56153c577 Define curthread as an inline function that loads the thread pointer
directly from r13, the pcpu pointer. This guarantees correct behaviour
when the thread migrates to a different CPU.
2010-03-22 02:01:33 +00:00
Marcel Moolenaar
a5d64faeca Print MD fields in the pcpu to aid debugging. 2010-03-21 22:39:11 +00:00
Marcel Moolenaar
c50679660e Don't include <machine/_regset.h> when _MACHINE_REGSET_H_ in defined.
This is not for multiple inclusion purposes, because _regset.h already
handles this, but to enable inclusion of the MD header by cross-tools
on non-ia64 installations. The cross-tool can include _regset.h itself
before including MD headers that depend on it.
2010-03-21 22:33:09 +00:00
Marcel Moolenaar
a5cef7a1ce Don't check for boot_verbose in the environment. The loader does
that already and sets RB_VERBOSE. The loader has always done it.
2010-03-20 04:22:22 +00:00
Marcel Moolenaar
3804454ac0 Revamp the interrupt code based on the previous commit:
o   Introduce XIV, eXternal Interrupt Vector, to differentiate from
    the interrupts vectors that are offsets in the IVT (Interrupt
    Vector Table). There's a vector for external interrupts, which
    are based on the XIVs.

o   Keep track of allocated and reserved XIVs so that we can assign
    XIVs without hardcoding anything. When XIVs are allocated, an
    interrupt handler and a class is specified for the XIV. Classes
    are:
    1.  architecture-defined: XIV 15 is returned when no external
	interrupt are pending,
    2.  platform-defined: SAL reports which XIV is used to wakeup
	an AP (typically 0xFF, but it's 0x12 for the Altix 350).
    3.  inter-processor interrupts: allocated for SMP support and
	non-redirectable.
    4.  device interrupts (i.e. IRQs): allocated when devices are
	discovered and are redirectable.

o   Rewrite the central interrupt handler to call the per-XIV
    interrupt handler and rename it to ia64_handle_intr(). Move
    the per-XIV handler implementation to the file where we have
    the XIV allocation/reservation. Clock interrupt handling is
    moved to clock.c. IPI handling is moved to mp_machdep.c.

o   Drop support for the Intel 8259A because it was broken. When
    XIV 0 is received, the CPU should initiate an INTA cycle to
    obtain the interrupt vector of the 8259-based interrupt. In
    these cases the interrupt controller we should be talking to
    WRT to masking on signalling EOI is the 8259 and not the I/O
    SAPIC. This requires adriver for the Intel 8259A which isn't
    available for ia64. Thus stop pretending to support ExtINTs
    and instead panic() so that if we come across hardware that
    has an Intel 8259A, so have something real to work with.

o   With XIVs for IPIs dynamically allocatedi and also based on
    priority, define the IPI_* symbols as variables rather than
    constants. The variable holds the XIV allocated for the IPI.

o   IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV
    assigned to IPI_STOP is delivered.
2010-03-17 00:37:15 +00:00
Marcel Moolenaar
510e1af7cb Have cpu_throw() loop on blocked_lock as well. This bug has existed
a long time and has gone unnoticed just as long, because I kept
using sched_4bsd (due to sched_ule not working with preemption),
but GENERIC had sched_ule by default -- including SMP.

While here, remove unused inclusion of <machine/clock.h>, remove
totally bogus inclusion of <i386/include/specialreg.h>.
2010-03-15 16:53:09 +00:00
Ed Schouten
338f1debcd Remove COMPAT_43TTY from stock kernel configuration files.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
2010-03-13 09:21:00 +00:00
Nathan Whitehorn
da4e34909f Accidentally committed test code. Remove it.
Big pointy hat:	me
2010-03-11 14:54:54 +00:00
Nathan Whitehorn
841c0c7ec7 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
Marcel Moolenaar
3d8de82c72 Remove inclusion of <i386/include/psl.h>
While here move inclusion of <sys/lock.h> in a better place.
2010-03-09 02:08:02 +00:00
Marcel Moolenaar
01422bafc6 Remove support for SYS_RES_DRQ. 2010-03-09 02:05:01 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Marcel Moolenaar
d1d5b9c5a6 Interrupt related cleanups:
o  Assign vectors based on priority, because vectors have
   implied priority in hardware.
o  Use unordered memory accesses to the I/O sapic and use
   the acceptance form of the mf instruction.
o  Remove the sapicreg.h and sapicvar.h headers. All definitions
   in sapicreg.h are private to sapic.c and all definitions in
   sapicvar.h are either private or interface functions. Move the
   interface functions to intr.h.
o  Hide the definition of struct sapic.
2010-02-27 18:55:43 +00:00
Marcel Moolenaar
93e184c12e Prefer I-units and M-units for nop instructions. This works around
McKinley flaws. It also avoids using the F-unit in the kernel for
no reason.
2010-02-22 01:23:41 +00:00
Marcel Moolenaar
6ed45c0269 Normalize nop instructions: Only use 0 for the immediate operand. 2010-02-21 23:41:59 +00:00
Marcel Moolenaar
26ce74e3c5 Remove pm_active from struct pmap as it serves no purpose.
MFC after:	1 week
2010-02-21 23:10:13 +00:00
Attilio Rao
c1210a7d97 Adjust style (following the already existing rules) for the newly
introduced option DEADLKRES.

Reported by:	danfe, julian, avg
2010-02-15 23:44:48 +00:00
Marcel Moolenaar
438e84ae72 Some code cleanups:
o   s/u_int32_t/uint32_t/g
o   Add multiple-inclusion protection.
o   Break long lines.
2010-02-14 17:03:20 +00:00
Marcel Moolenaar
26279767e4 Some code churn:
o   Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
    by calling either bus_space_map() or pmap_mapdev().
o   Implement bus_space_map() in terms of pmap_mapdev() and implement
    bus_space_unmap() in terms of pmap_unmapdev().
o   Have ia64_pib hold the uncached virtual address of the processor interrupt
    block throughout the kernel's life and access the elements of the PIB
    through this structure pointer.

This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.

With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.
2010-02-14 16:56:24 +00:00
Attilio Rao
88cbfa852e Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.

Sponsored by:	Sandvine Incorporated
Requested by:	emaste
Discussed with:	kib
2010-02-10 16:30:04 +00:00
Marcel Moolenaar
58ce165dfd Fix single-stepping when the kernel was entered through the EPC syscall
path. When the taken branch leaves the kernel and enters the process,
we still need to execute the instruction at that address. Don't raise
SIGTRAP when we branch into the process, but enable single-stepping
instead.
2010-02-06 20:46:14 +00:00
Marcel Moolenaar
e59faa5014 In pci_cfgregread() and pci_cfgregwrite(), validate the arguments and check
that the alignment matches the width of the read or write.
2010-01-28 04:50:09 +00:00
Marcel Moolenaar
9d908720aa In cpu_switch(), use an atomic operation to set the td_lock
of the old thread to the mutex that's passed.

Pointed out by: attilio, jhb
2010-01-27 02:32:07 +00:00
Marcel Moolenaar
5111f97cde Remove cpu_boot() and call efi_reset_system() directly from
cpu_reset().
2010-01-23 23:16:50 +00:00
Marcel Moolenaar
646420c8dc Add ioctl requests to /dev/io on ia64 for reading and writing
EFI variables. The primary reason for this is that it allows
sysinstall(8) to add a boot menu item for the newly installed
FreeBSD image.
2010-01-14 02:48:39 +00:00
Marcel Moolenaar
684b17831f Fix previous commitr:. efi_var_set() was copied from efi_var_get(),
but wasn't actually changed.
2010-01-14 02:38:46 +00:00
Marcel Moolenaar
259565ce68 Add wrappers for the RT Variable Services. While here, translate the
EFI status into a standard errno value and change efi_set_time() to
return a standard error.

MFC after:	1 week
2010-01-14 02:14:21 +00:00
Marcel Moolenaar
409a390c33 Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after:	1 week
2010-01-11 18:10:13 +00:00
Warner Losh
87948dfdf2 Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.
# This is the resolution of removing it from DEFAULTS...

MFC after:	5 days
2010-01-10 17:44:22 +00:00
Bjoern A. Zeeb
193171b7f5 In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT.  This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.

This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with:	arch, jhb (as part of the LINT-VIMAGE thread)
MFC after:		1 month
2010-01-08 18:57:31 +00:00
Warner Losh
56eff2143f Revert 200594. This file isn't intended for these sorts of things. 2010-01-04 21:30:04 +00:00
Brooks Davis
9efde58392 Add vlan(4) to all GENERIC kernels.
MFC after:	1 week
2010-01-03 20:40:54 +00:00
Marcel Moolenaar
f7afeafebb Change BUS_SPACE_MAXADDR from 2^32-1 to 2^64-1. 2^32-1 is representative
for its origin, more than for its accuracy.

MFC after:	1 week
2010-01-02 00:37:00 +00:00
Marcel Moolenaar
938026e334 Revamp bus_space access functions:
o   Optimize for memory mapped I/O by making all I/O port acceses function
    calls and marking the test for the IA64_BUS_SPACE_IO tag with
    __predict_false(). Implement the I/O port access functions in a new
    file, called bus_machdep.c.
o   Change the bus_space_handle_t for memory mapped I/O to the virtual
    address rather than the physical address. This eliminates the PA->VA
    translation for every I/O access. The handle for I/O port access is
    still the port number.
o   Move inb(), outb(), inw(), outw(), inl(), outl(), and their string
    variants from cpufunc.h and define them in bus.h. On ia64 these are
    not CPU functions at all. In bus.h they are merely aliases for the
    new I/O port access functions defined in bus_machdep.h.
o   Handle the ACPI resource bug in nexus_set_resource(). There we can
    do it once so that we don't have to worry about it whenever we need
    to write to an I/O port that is really a memory mapped address.

The upshot of this change is that the KBI is better defined and that I/O
port access always involves a function call, allowing us to change the
actual implementation without breaking the KBI. For memory mapped I/O the
virtual address is abstracted, so that we can change the VA->PA mapping
in the kernel without causing an KBI breakage. The exception at this time
is for bus_space_map() and bus_space_unmap().

MFC after:	1 week.
2009-12-30 18:15:25 +00:00
Robert Noland
cfd7bacef2 Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by:	jhb@
MFC after:	Not in this lifetime...
2009-12-29 21:51:28 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Marcel Moolenaar
76e42b3a0c Use unordered memory loads and stores for the in* and out*
family of functions.
2009-12-26 22:22:09 +00:00
Marcel Moolenaar
2191712fd1 Export the bus, cpu and itc frequencies under the hw.freq sysctl node.
The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The
frequencies are rounded to the nearest whole MHz.

While here, rename and re-type bus_frequency, processor_frequency and
itc_frequency to bus_freq, cpu_freq and itc_freq and make them static.
As unsigned integers, the hw.freq.cpu sysctl can more easily be made
generic (across all architectures) making porting easier.

MFC after:	3 days
2009-12-23 04:48:42 +00:00
Marcel Moolenaar
30fd085c78 Add a bit definition for invalid timestamp in the record header. 2009-12-23 04:39:05 +00:00
Doug Barton
f1bdf073c1 Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS
2009-12-16 02:17:43 +00:00
Marcel Moolenaar
c4dbd41f5d In exception_save, write-back ar.rnat after switching the backing-
store. Writing to ar.bspstore is defined to leave ar.rnat undefined.

PR:		ia64/120315
MFC after:	3 days
2009-12-08 00:44:23 +00:00
Marcel Moolenaar
58a0206d63 Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id
excluded, as it's used by MI code) and mode the sysctl variables from
pcpu_stats to pcpu_md.
Adjust all references accordingly.

While nearby, change the PCPU sysctl tree so that they match the CPU
device sysctl tree -- they are now children of a static node called
"machdep.cpu" and are named only with their cpu ID.
2009-12-07 06:41:27 +00:00
Marcel Moolenaar
4dbb79b42d Allocate the VHPT for each CPU in cpu_mp_start(), rather than
allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU
without memory waste -- MAXCPU is now 32 for SMP kernels.

This change also eliminates the VHPT scaling based in the total
memory in the system. It's the workload that determines the best size
of the VHPT. The workload can be affected by the amount of memory,
but not necessarily. For example, there's no performance difference
between VHPT sizes of 256KB, 512KB and 1MB when building the LINT
kernel. This was observed with a system that has 8GB of memory.
By default the kernel will allocate a 1MB VHPT. The user can tune the
system with the "machdep.vhpt.log2size" tunable.
2009-12-07 00:54:02 +00:00
Marcel Moolenaar
4827e0cd5c Make sure bus space accesses use unorder memory loads and stores.
Memory accesses are posted in program order by virtue of the
uncacheable memory attribute.
Since GCC, by default, adds acquire and release semantics to
volatile memory loads and stores, we need to use inline assembly
to guarantee it. With inline assembly, we don't need volatile
pointers anymore.

Itanium does not support semaphore instructions to uncacheable
memory.
2009-12-03 04:06:48 +00:00
Marcel Moolenaar
6852bd671f Move the sysctl related fields to the end of the structure and
make them conditional upon _KERNEL. libkvm includes <sys/pcpu.h>
and <sys/sysctl.h> does not expose the structure definitions to
userland.
2009-11-29 20:17:50 +00:00
Marcel Moolenaar
1011cc260e Eliminate teh use of MAXCPU in static arrays of interrupt counters by
adding statistics counters to the PCPU structure. Export the counters
through sysctl by giving each PCPU structure its own sysctl context.

While here, fix cnt.v_intr by not just having it count clock interrupts,
but every interrupt and add more counters for each interrupt source.
2009-11-28 21:01:15 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Marcel Moolenaar
9c6a6bc422 Improve upon revision 196196 by removing the newly added comment
in the wrong place and instead add a KASSERT in the right place.
2009-11-24 01:35:21 +00:00
Marcel Moolenaar
65e962fb76 Revert previous commit. The problem was not related to overrunning
the kernel stack at all. The new USB stack simply caused a change
in timing that triggered a firmware bug more often. The addition
of PRINTF_BUFR_SIZE apparently triggered the same firmware bug
even more reliably.

But even with KSTACK_PAGES=5, one instance of the firmware bug
remained: booting with a CD inserted. This problem was run into
by accident after installing Debian and having to boot FreeBSD
to fixup the GPT partitioning (Thanks... not). After bumping
KSTACK_PAGES to 5, it was pretty unbelievable that the stack was
still being too small.

After updating the firmware we could boot with a CD inserted and
KSTACK_PAGES could be lowered back to 4 pages without problems.

Note: It is believed to be a timing related firmware bug, because
the machine check information showed access to the serial console
on one CPU and access to the EHCI HCD on the other CPU. Since
both are devices on the management unit and thus virtualized in
some way, any execution trace that does not include concurrent
access to the BMC from both CPUs is fine.

Note also that it's not understood exactly how increasing the
kernel stack avoided hitting the firmware bug. A change in page
faults does change timing, but it's not known if that's what's
happening here.

In any case: the problem is being monitored. Reverting back to
4 pages for the kernel stack is preferred, because it makes it
easier to switch to 16K pages (double the page size) without
wasting too much memory by not being able to half the number of
pages...
2009-11-23 21:09:23 +00:00
Marcel Moolenaar
1c8a163c8b No need to include opt_kstack_pages.h, because KSTACK_PAGES is
already defined through genassym.c
2009-11-20 07:40:02 +00:00
Marcel Moolenaar
02b5a86f38 Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled. Use a token rather than the actual address
of the restart point to avoid the need for the movl instruction.
The token is arbitrary. For the drummers: it's based on a single
paradiddle.
2009-11-20 03:14:54 +00:00
Marcel Moolenaar
bcaf1959ec opt_* headers are included using the quoted form. 2009-11-19 01:27:22 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Marcel Moolenaar
8d077f48f0 Reimplement the lazy FP context switching:
o   Move all code into a single file for easier maintenance.
o   Use a single global lock to avoid having to handle either
    multiple locks or race conditions.
o   Make sure to disable the high FP registers after saving
    or dropping them.
o   use msleep() to wait for the other CPU to save the high
    FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronuously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.
2009-10-31 22:27:31 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Marcel Moolenaar
59085a35e8 Add PRINTF_BUFR_SIZE=128, since we have SMP by default.
While here, fix tabulation.
2009-10-24 20:35:34 +00:00
Marcel Moolenaar
b475d22d67 A 32KB kernel stack is not quite enough. The new USB stack is a bit
more stack hungry as compared to the old one that my RX2660 gets
a machine check and spontaneously reboots at the time the USB DVD
drive is found and attached to CAM as a mass storage device. This
doesn't happen always, but definitely varies per kernel build.
Likewise when using a 128-byte printf buffer. The additional 128
bytes that printf needs seems to be enough to have the memory stack
and register stack collide and causing a machine check.

Thus: Bump KSTACK_PAGES from 4 to 5.
2009-10-24 20:28:42 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Marcel Moolenaar
d79186defa o Align function on a 32-byte boundary so that the core's front-end
can deliver 2 bundles per cycle to the back-end.
o   Mark syscall stubs with a special unwind ABI tag so that unwind
    libraries know how to unwind.
2009-10-21 18:09:48 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Bjoern A. Zeeb
52bf2041ac Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by:	kib
MFC after:	1 month
2009-10-03 11:57:21 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Marcel Moolenaar
97e84697ae Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be
sparse, which causes a kernel assert.

Approved by:	re (kensmith)
2009-08-16 01:43:08 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Marcel Moolenaar
1ed01448fb On exec(2), when loading the ELF image, pmap_enter_object() is
called to prefault pages. This is an obvious place for making
sure the I-cache is coherent. It was missing though. As such,
execution over NFS and ZFS file systems was failing. NFS was
fixed the wrong way (by flushing the D-cache as part of the
NFS code) in a previous commit. ZFS problems were encountered
after that and indicated that something else was wrong...

Approved by:	re (kib)
2009-07-11 22:27:20 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
Ed Schouten
89fe4c0a2b Enable POSIX semaphores on all non-embedded architectures by default.
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by:	miwi
Patch by:	Florian Smeets <flo kasimir com>
Approved by:	re (kib)
2009-07-02 18:24:37 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Marcel Moolenaar
a1b6466f8f Drop the high FP state of an exiting thread in cpu_thread_exit() and
not in cpu_exit(). The latter is called after td_md.md_highfp_mtx
has been destroyed, which results in a race condition when another
thread wants to use the high FP registers on the CPU that still has
the high FP registers in question.
2009-06-20 05:36:53 +00:00
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Robert Watson
bd875f5f13 Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel.  No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2009-06-02 18:31:08 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Ed Schouten
c5e30cc02b Last minute TTY API change: remove mutex argument from tty_alloc().
I don't want people to override the mutex when allocating a TTY. It has
to be there, to keep drivers like syscons happy. So I'm creating a
tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex()
should eventually be removed.

The advantage of this approach, is that we can just remove a function,
without breaking the regular API in the future.
2009-05-29 06:41:23 +00:00
Rink Springer
5c6a5b0200 ia64: Move MCA information retrieval to a per-CPU kthread
Once AP's are launched, their MCA state information is stored and later obtainable using a sysctl. Since the size of the MCA state information is unknown, it will be malloc'ed as needed. However, when 'ia64_ap_startup' runs, it's not yet safe to call malloc and this may cause 'panic: blockable sleep lock (sleep mutex) 8192 @ /usr/src/sys/vm/uma_core.c'. This commit avoids this issue by scheduling a separate kthread to obtain this information, which immediately terminates afterwards.
2009-05-27 18:12:27 +00:00
Marcel Moolenaar
f1c12cd66d Rename ia64_invalidate_icache() to ia64_sync_icache(). We're
not invalidating anything.
2009-05-18 18:44:54 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Jun Kuriyama
b3b17597ea - Use "device\t" and "options \t" for consistency. 2009-05-10 00:00:25 +00:00
Marcel Moolenaar
23815def34 Remove isa_irq_pending(). It's not used. 2009-04-24 03:43:20 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
John Baldwin
842f11bef6 Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set.  Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by:	avg
Reviewed by:	scottl
2009-04-17 13:22:18 +00:00
Konstantin Belousov
3feb57a0a8 The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by:	Jason Harmening <jason.harmening gmail com> (amd64 version)
PR:	amd64/133592
Reviewed by:	scottl (original patch), jhb
MFC after:	2 weeks
2009-04-13 19:20:32 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Konstantin Belousov
e564825182 Add trivial implementation for the freebsd32_sysarch on ia64.
Fix comapt32 and LINT build on ia64.

Discussed with:	jhb
2009-04-01 19:23:07 +00:00
Konstantin Belousov
a4f2b2b0c6 Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by:	kan
2009-03-17 12:50:16 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
Andrew Thompson
c89d41e5ff Change over the usb kernel options to the new stack (retaining existing
naming). The old usb stack can be compiled in my prefixing the name with 'o'.
2009-02-23 18:34:56 +00:00
Andrew Thompson
e31a070263 Add uslcom to the build too.
Reminded by:	Michael Butler
2009-02-15 23:40:29 +00:00
Andrew Thompson
e4edc14efd Switch over GENERIC kernels to USB2 by default.
Tested by:	make universe
2009-02-15 22:33:44 +00:00
Marcel Moolenaar
5d1df4b56d Mark the BSP as being awake. This supresses the message
that not all usable CPUs could be woken up...
2009-02-10 20:29:57 +00:00
Warner Losh
047e5fdabc When bouncing pages, allow a new option to preserve the intra-page
offset.  This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on.  This solution doesn't break anything that doesn't ask for it
directly.  The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes.  I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this changes turns out not to be sufficient.

Submitted by:	alfred@ from Hans Peter Selasky's original
2009-02-08 22:54:58 +00:00
Wojciech A. Koszek
eb0a4a4b80 Don't forget to create opt_agp.h on ia64, which also uses agp(4). 2009-02-07 09:57:14 +00:00
John Baldwin
148a5cf9e8 Tweak the ia64 machine check handling code to not register new sysctl nodes
while holding a spin mutex.  Instead, it now shoves the machine check
records onto a queue that is later drained to add sysctl nodes for each
record.  While a routine to drain the queue is present, it is not currently
called.

Reviewed by:	marcel
2009-02-04 18:44:29 +00:00
Alan Cox
2dad52b0c5 Correct an error in revision 1.170 of this file. When get_pv_entry() is
forced to reclaim pv entries, the one pv entry that it returns should not
be freed.
2009-01-18 08:00:55 +00:00