Commit Graph

1785 Commits

Author SHA1 Message Date
Marcel Moolenaar
58a0206d63 Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id
excluded, as it's used by MI code) and mode the sysctl variables from
pcpu_stats to pcpu_md.
Adjust all references accordingly.

While nearby, change the PCPU sysctl tree so that they match the CPU
device sysctl tree -- they are now children of a static node called
"machdep.cpu" and are named only with their cpu ID.
2009-12-07 06:41:27 +00:00
Marcel Moolenaar
4dbb79b42d Allocate the VHPT for each CPU in cpu_mp_start(), rather than
allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU
without memory waste -- MAXCPU is now 32 for SMP kernels.

This change also eliminates the VHPT scaling based in the total
memory in the system. It's the workload that determines the best size
of the VHPT. The workload can be affected by the amount of memory,
but not necessarily. For example, there's no performance difference
between VHPT sizes of 256KB, 512KB and 1MB when building the LINT
kernel. This was observed with a system that has 8GB of memory.
By default the kernel will allocate a 1MB VHPT. The user can tune the
system with the "machdep.vhpt.log2size" tunable.
2009-12-07 00:54:02 +00:00
Marcel Moolenaar
4827e0cd5c Make sure bus space accesses use unorder memory loads and stores.
Memory accesses are posted in program order by virtue of the
uncacheable memory attribute.
Since GCC, by default, adds acquire and release semantics to
volatile memory loads and stores, we need to use inline assembly
to guarantee it. With inline assembly, we don't need volatile
pointers anymore.

Itanium does not support semaphore instructions to uncacheable
memory.
2009-12-03 04:06:48 +00:00
Marcel Moolenaar
6852bd671f Move the sysctl related fields to the end of the structure and
make them conditional upon _KERNEL. libkvm includes <sys/pcpu.h>
and <sys/sysctl.h> does not expose the structure definitions to
userland.
2009-11-29 20:17:50 +00:00
Marcel Moolenaar
1011cc260e Eliminate teh use of MAXCPU in static arrays of interrupt counters by
adding statistics counters to the PCPU structure. Export the counters
through sysctl by giving each PCPU structure its own sysctl context.

While here, fix cnt.v_intr by not just having it count clock interrupts,
but every interrupt and add more counters for each interrupt source.
2009-11-28 21:01:15 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Marcel Moolenaar
9c6a6bc422 Improve upon revision 196196 by removing the newly added comment
in the wrong place and instead add a KASSERT in the right place.
2009-11-24 01:35:21 +00:00
Marcel Moolenaar
65e962fb76 Revert previous commit. The problem was not related to overrunning
the kernel stack at all. The new USB stack simply caused a change
in timing that triggered a firmware bug more often. The addition
of PRINTF_BUFR_SIZE apparently triggered the same firmware bug
even more reliably.

But even with KSTACK_PAGES=5, one instance of the firmware bug
remained: booting with a CD inserted. This problem was run into
by accident after installing Debian and having to boot FreeBSD
to fixup the GPT partitioning (Thanks... not). After bumping
KSTACK_PAGES to 5, it was pretty unbelievable that the stack was
still being too small.

After updating the firmware we could boot with a CD inserted and
KSTACK_PAGES could be lowered back to 4 pages without problems.

Note: It is believed to be a timing related firmware bug, because
the machine check information showed access to the serial console
on one CPU and access to the EHCI HCD on the other CPU. Since
both are devices on the management unit and thus virtualized in
some way, any execution trace that does not include concurrent
access to the BMC from both CPUs is fine.

Note also that it's not understood exactly how increasing the
kernel stack avoided hitting the firmware bug. A change in page
faults does change timing, but it's not known if that's what's
happening here.

In any case: the problem is being monitored. Reverting back to
4 pages for the kernel stack is preferred, because it makes it
easier to switch to 16K pages (double the page size) without
wasting too much memory by not being able to half the number of
pages...
2009-11-23 21:09:23 +00:00
Marcel Moolenaar
1c8a163c8b No need to include opt_kstack_pages.h, because KSTACK_PAGES is
already defined through genassym.c
2009-11-20 07:40:02 +00:00
Marcel Moolenaar
02b5a86f38 Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled. Use a token rather than the actual address
of the restart point to avoid the need for the movl instruction.
The token is arbitrary. For the drummers: it's based on a single
paradiddle.
2009-11-20 03:14:54 +00:00
Marcel Moolenaar
bcaf1959ec opt_* headers are included using the quoted form. 2009-11-19 01:27:22 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Marcel Moolenaar
8d077f48f0 Reimplement the lazy FP context switching:
o   Move all code into a single file for easier maintenance.
o   Use a single global lock to avoid having to handle either
    multiple locks or race conditions.
o   Make sure to disable the high FP registers after saving
    or dropping them.
o   use msleep() to wait for the other CPU to save the high
    FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronuously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.
2009-10-31 22:27:31 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Marcel Moolenaar
59085a35e8 Add PRINTF_BUFR_SIZE=128, since we have SMP by default.
While here, fix tabulation.
2009-10-24 20:35:34 +00:00
Marcel Moolenaar
b475d22d67 A 32KB kernel stack is not quite enough. The new USB stack is a bit
more stack hungry as compared to the old one that my RX2660 gets
a machine check and spontaneously reboots at the time the USB DVD
drive is found and attached to CAM as a mass storage device. This
doesn't happen always, but definitely varies per kernel build.
Likewise when using a 128-byte printf buffer. The additional 128
bytes that printf needs seems to be enough to have the memory stack
and register stack collide and causing a machine check.

Thus: Bump KSTACK_PAGES from 4 to 5.
2009-10-24 20:28:42 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Marcel Moolenaar
d79186defa o Align function on a 32-byte boundary so that the core's front-end
can deliver 2 bundles per cycle to the back-end.
o   Mark syscall stubs with a special unwind ABI tag so that unwind
    libraries know how to unwind.
2009-10-21 18:09:48 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Bjoern A. Zeeb
52bf2041ac Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by:	kib
MFC after:	1 month
2009-10-03 11:57:21 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Marcel Moolenaar
97e84697ae Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be
sparse, which causes a kernel assert.

Approved by:	re (kensmith)
2009-08-16 01:43:08 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Marcel Moolenaar
1ed01448fb On exec(2), when loading the ELF image, pmap_enter_object() is
called to prefault pages. This is an obvious place for making
sure the I-cache is coherent. It was missing though. As such,
execution over NFS and ZFS file systems was failing. NFS was
fixed the wrong way (by flushing the D-cache as part of the
NFS code) in a previous commit. ZFS problems were encountered
after that and indicated that something else was wrong...

Approved by:	re (kib)
2009-07-11 22:27:20 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
Ed Schouten
89fe4c0a2b Enable POSIX semaphores on all non-embedded architectures by default.
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by:	miwi
Patch by:	Florian Smeets <flo kasimir com>
Approved by:	re (kib)
2009-07-02 18:24:37 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Marcel Moolenaar
a1b6466f8f Drop the high FP state of an exiting thread in cpu_thread_exit() and
not in cpu_exit(). The latter is called after td_md.md_highfp_mtx
has been destroyed, which results in a race condition when another
thread wants to use the high FP registers on the CPU that still has
the high FP registers in question.
2009-06-20 05:36:53 +00:00
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Robert Watson
bd875f5f13 Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel.  No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2009-06-02 18:31:08 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Ed Schouten
c5e30cc02b Last minute TTY API change: remove mutex argument from tty_alloc().
I don't want people to override the mutex when allocating a TTY. It has
to be there, to keep drivers like syscons happy. So I'm creating a
tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex()
should eventually be removed.

The advantage of this approach, is that we can just remove a function,
without breaking the regular API in the future.
2009-05-29 06:41:23 +00:00
Rink Springer
5c6a5b0200 ia64: Move MCA information retrieval to a per-CPU kthread
Once AP's are launched, their MCA state information is stored and later obtainable using a sysctl. Since the size of the MCA state information is unknown, it will be malloc'ed as needed. However, when 'ia64_ap_startup' runs, it's not yet safe to call malloc and this may cause 'panic: blockable sleep lock (sleep mutex) 8192 @ /usr/src/sys/vm/uma_core.c'. This commit avoids this issue by scheduling a separate kthread to obtain this information, which immediately terminates afterwards.
2009-05-27 18:12:27 +00:00
Marcel Moolenaar
f1c12cd66d Rename ia64_invalidate_icache() to ia64_sync_icache(). We're
not invalidating anything.
2009-05-18 18:44:54 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Jun Kuriyama
b3b17597ea - Use "device\t" and "options \t" for consistency. 2009-05-10 00:00:25 +00:00
Marcel Moolenaar
23815def34 Remove isa_irq_pending(). It's not used. 2009-04-24 03:43:20 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
John Baldwin
842f11bef6 Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set.  Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by:	avg
Reviewed by:	scottl
2009-04-17 13:22:18 +00:00
Konstantin Belousov
3feb57a0a8 The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by:	Jason Harmening <jason.harmening gmail com> (amd64 version)
PR:	amd64/133592
Reviewed by:	scottl (original patch), jhb
MFC after:	2 weeks
2009-04-13 19:20:32 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Konstantin Belousov
e564825182 Add trivial implementation for the freebsd32_sysarch on ia64.
Fix comapt32 and LINT build on ia64.

Discussed with:	jhb
2009-04-01 19:23:07 +00:00
Konstantin Belousov
a4f2b2b0c6 Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by:	kan
2009-03-17 12:50:16 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
Andrew Thompson
c89d41e5ff Change over the usb kernel options to the new stack (retaining existing
naming). The old usb stack can be compiled in my prefixing the name with 'o'.
2009-02-23 18:34:56 +00:00
Andrew Thompson
e31a070263 Add uslcom to the build too.
Reminded by:	Michael Butler
2009-02-15 23:40:29 +00:00
Andrew Thompson
e4edc14efd Switch over GENERIC kernels to USB2 by default.
Tested by:	make universe
2009-02-15 22:33:44 +00:00
Marcel Moolenaar
5d1df4b56d Mark the BSP as being awake. This supresses the message
that not all usable CPUs could be woken up...
2009-02-10 20:29:57 +00:00
Warner Losh
047e5fdabc When bouncing pages, allow a new option to preserve the intra-page
offset.  This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on.  This solution doesn't break anything that doesn't ask for it
directly.  The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes.  I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this changes turns out not to be sufficient.

Submitted by:	alfred@ from Hans Peter Selasky's original
2009-02-08 22:54:58 +00:00
Wojciech A. Koszek
eb0a4a4b80 Don't forget to create opt_agp.h on ia64, which also uses agp(4). 2009-02-07 09:57:14 +00:00
John Baldwin
148a5cf9e8 Tweak the ia64 machine check handling code to not register new sysctl nodes
while holding a spin mutex.  Instead, it now shoves the machine check
records onto a queue that is later drained to add sysctl nodes for each
record.  While a routine to drain the queue is present, it is not currently
called.

Reviewed by:	marcel
2009-02-04 18:44:29 +00:00
Alan Cox
2dad52b0c5 Correct an error in revision 1.170 of this file. When get_pv_entry() is
forced to reclaim pv entries, the one pv entry that it returns should not
be freed.
2009-01-18 08:00:55 +00:00
Warner Losh
db3cd725a5 AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.
Reviewed by:	peter
2008-12-17 06:56:58 +00:00
Ed Schouten
bfba40a452 Remove "[KEEP THIS!]" from COMPAT_43TTY. It's not really that important.
Sgtty is a programming interface that has been replaced by termios over
the years. In June we already removed <sgtty.h>, which exposes the
ioctl()'s that are implemented by this interface. The importance of this
flag is overrated right now.
2008-12-02 19:09:08 +00:00
Konstantin Belousov
b4cf0e62f4 Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with:	dchagin, imp, jhb, peter
2008-11-22 12:36:15 +00:00
Marcel Moolenaar
ff8c51cf3e Define mb(), rmb() and wmb() for real. 2008-11-22 06:56:49 +00:00
Kip Macy
db7f0b974f - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marcel Moolenaar
1800080e68 Atomically increment the number of awoken APs as all APs will
be unleashed here.

Pointed out by: christian.kandeler@hob.de
2008-10-19 20:14:48 +00:00
Peter Wemm
e6592ee55c Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment.  Textdump magic can be passed in.
2008-10-01 22:08:53 +00:00
Marius Strobl
6f04e7b9aa Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by:	jhb
Reviewed by:	arch, grehan, jhb
2008-09-28 18:34:14 +00:00
Ed Schouten
6bfa9a2d66 Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
Konstantin Belousov
a8d403e102 Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from:	jhb
MFC after:	1 month
2008-09-24 10:14:37 +00:00
David E. O'Brien
ae72afe0f2 The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing.  'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical.  Where as,
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation.  Therefore bcmp(9) should be preferred over memcmp(9).
2008-09-23 14:45:10 +00:00
Ed Schouten
bc093719ca Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
John Baldwin
70d12a18f2 Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after:	1 week
2008-08-19 19:53:52 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Alan Cox
36e6513df5 Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by:	scottl
2008-07-15 03:34:49 +00:00
Xin LI
dbd47f1592 Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out
of the box.
2008-07-07 22:55:11 +00:00
Marcel Moolenaar
d3fc9d46d4 Add inline function ia64_fc_i() to abstract inline assembly.
Use the new inline function in ia64_invalidate_icache().
While there, add proper synchronization so that we know
the fc.i instructions have taken effect when we return.
2008-07-07 17:43:56 +00:00
Ed Schouten
721351876c Remove the unused major/minor numbers from iodev and memdev.
Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by:	philip (mentor)
2008-06-25 07:45:31 +00:00
Marcel Moolenaar
f9d9182d64 Work-around a compiler optimization bug, that broke libthr. Massive
inlining resulted in constant propagation to the extend that cmpval
was known to the compiler to be URWLOCK_WRITE_OWNER (= 0x80000000U).
Unfortunately, instead of zero-extending the unsigned constant, it
was sign-extended. As such, the cmpxchg instruction was comparing
0x0000000080000000LU to 0xffffffff80000000LU and obviously didn't
perform the exchange.
But, since the value returned by cmpxhg equalled cmpval (when zero-
extended), the _thr_rtld_lock_release() function thought the exchange
did happen and as such returned as if having released the lock. This
was not the case. Subsequent locking requests found rw_state non-zero
and the thread in question entered the kernel and block indefinitely.

The work-around is to zero-extend by casting to uint64_t.
2008-05-28 16:41:02 +00:00
Marcel Moolenaar
2cddc3d722 Account for IPI_PREEMPT. We don't want to call sched_preempt() with
interrupts disabled or with td_intr_nesting_level non-zero.
2008-05-23 19:53:50 +00:00
Alan Cox
d1fdd63483 The VM system no longer uses setPQL2(). Remove it and its helpers. 2008-05-23 04:03:54 +00:00
Marcel Moolenaar
c1e0811ea3 Create the bucket mutexes with MTX_NOWITNESS. There's now a
hard limit of 512 pending mutexes in the witness code and
we can easily have 1 million bucket mutexes initialized before
witness is up and running. Bumping the limit from 512 to 1M
is not really an option here...
2008-05-22 06:27:46 +00:00
Marcel Moolenaar
0fbd447b92 We can call ia64_flush_dirty() when the corresponding process is
locked or not. As such, use PROC_LOCKED() to determine which case
it is and lock the process when not.
2008-05-21 05:15:27 +00:00
Alan Cox
1ec1304bdb Retire pmap_addr_hint(). It is no longer used. 2008-05-18 04:16:57 +00:00
Alan Cox
2d17f90775 Add a stub for pmap_align_superpage() on machines that don't (yet)
implement pmap-level support for superpages.
2008-05-09 23:31:42 +00:00
Marcel Moolenaar
fe39c042ca Unbreak previous commit. While here, refactor the code a bit. 2008-04-25 16:09:03 +00:00
Jeff Roberson
6c47aaae12 - Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
 - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
   suspended in cpu specific states.  This function can fail and cause the
   scheduler to fall back to another mechanism (ipi).
 - Implement support for mwait in cpu_idle() on i386/amd64 machines that
   support it.  mwait is a higher performance way to synchronize cpus
   as compared to hlt & ipis.
 - Allow selecting the idle routine by name via sysctl machdep.idle.  This
   replaces machdep.cpu_idle_hlt.  Only idle routines supported by the
   current machine are permitted.

Sponsored by:	Nokia
2008-04-25 05:18:50 +00:00
Poul-Henning Kamp
9b4a8ab7ba Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
Poul-Henning Kamp
0051271e12 Make genclock standard on all platforms.
Thanks to: grehan & marcel for platform support on ia64 and ppc.
2008-04-21 10:09:55 +00:00
Marcel Moolenaar
fca1689378 Sanitize the malloc types: M_PMAP is not used in pmap.c, so don't
define it there. Don't use M_PMAP in mp_machdep.c; define M_SMP
instead.
2008-04-19 04:56:16 +00:00
Marcel Moolenaar
6bdf667b51 Remove cruft we got from Alpha, which was probably inherited
from NetBSD. I.e. make it more like a FreeBSD header.
2008-04-18 02:21:11 +00:00
Marcel Moolenaar
22cc9ba0f0 Use genclock for RTC handling. This eliminates the MD versions for
inittodr() and resettodr(). Have nexus double as the clock device,
because it's the firmware that provides RTC services. We could
create a special (pseudo-) device for it, but that wasn't superior
enough to actually do it. Maybe later...

Requested by: phk
2008-04-15 17:02:23 +00:00
Marcel Moolenaar
495168ba8d Support and switch to the ULE scheduler:
o  Implement IPI_PREEMPT,
o  Set td_lock for the thread being switched out,
o  For ULE & SMP, loop while td_lock points to blocked_lock for
   the thread being switched in,
o  Enable ULE by default in GENERIC and SKI,
2008-04-15 05:02:42 +00:00
Marcel Moolenaar
23080c0bd3 Revision 1.9 changes the delivery mode from the magic constant 0
(i.e. fixed delivery) to SAPIC_DELMODE_LOWPRI. While the commit
log doesn't mention the change in behaviour, it is believed to be
deliberate. In the last 5.5 years this hasn't been a problem. Nor
do I think did it make any difference, but who knows. However, I
do know that it break SMP support for Montecito-based machines.
Switch back to fixed-CPU delivery so that SMP works again. This
gives me some time to look more closely at the problem, as well
as make sure the I-cache validation as it's implemented currently
is sufficient in SMP configurations...
2008-04-14 20:34:45 +00:00
Jeff Roberson
d13829f04a - Pass the irq and not the vector to intr_event_create().
Reviewed by:	marcel
2008-04-11 23:10:39 +00:00
Jeff Roberson
9b33b154b5 - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
Marcel Moolenaar
34aec6b9f8 Unbreak after removal of SI_SUB_MOUNT_ROOT. 2008-04-09 03:32:48 +00:00
John Baldwin
1ee1b68792 Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
  'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
  Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
  scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
  Also, don't bother unmasking the interrupt in the post_filter case if
  we never masked it.  The INTR_FILTER case had been doing this by having
  arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
  They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
  both the 'post_filter' and 'post_ithread' hooks to match what the
  non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
  'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
  designed to do even in 5.x).

Glanced at by:	piso
Reviewed by:	marius
Requested by:	marius [1], [5]
Tested on:	amd64, i386, arm, sparc64
2008-04-05 19:58:30 +00:00
Marcel Moolenaar
b81b7f0a7d Better implement I-cache invalidation. The previous implementation
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.

MFC after: 2 weeks
2008-03-30 23:09:14 +00:00
Doug Rabson
fa9d9930ca Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
2008-03-27 11:54:20 +00:00
John Birrell
e483943791 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
Poul-Henning Kamp
e465985885 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
John Baldwin
6d2d1c044f Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
  and collapse down to one intr_event_create() routine.  The disable and
  eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
  routines since to trim one extra indirection.

Compiled on:	{arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on:	{amd64,i386} x {FILTER, !FILTER}
2008-03-17 22:42:01 +00:00
Pawel Jakub Dawidek
6eb4157ffc Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Warner Losh
dffa4a85ac BUS_DMA_ISA is left over from Alpha, and is not used in the tree at
all.  The reference in ia64 code is due to cutNpaste in its history
and can safely be removed.

Revired by: cognet, raj, marcel, jhb and maybe one other whom I'm forgetting
2008-03-15 06:44:45 +00:00
John Baldwin
eaf86d1678 Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
John Baldwin
5217af301c Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines.  The existing code already handles
having two platforms: ACPI and legacy.  However, the existing approach was
rather hardcoded and difficult to extend.  These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device).  This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
  legacy platform.  It's probe routine now returns BUS_PROBE_GENERIC so it
  can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
  resource managers so that subclassed nexus(4) drivers can invoke it from
  their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
  attach routine.
- The ACPI driver no longer contains an new-bus identify method.  Instead
  it exposes a public function (acpi_identify()) which is a probe routine
  that the MD nexus(4) drivers can use to probe for ACPI.  All of the
  probe logic in acpi_probe() is now moved into acpi_identify() and
  acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
  acpi_identify() and claims the nexus0 device if the probe succeeds.  It
  then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
  This matches the previous behavior where the old acpi_identify() would
  fail to add an acpi0 device again leaving you with no devices.

Discussed with:	imp
Silence on:	arch@
2008-03-13 20:39:04 +00:00
Jeff Roberson
eab82b2ebe - Fix build breakage; there was a reference to a removed syscall in
a KASSERT().  Attempt to cleanup the comment to reflect reality.
2008-03-12 22:14:14 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Jeff Roberson
81aa71755b - Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a seperate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
Marcel Moolenaar
aeafe92a61 Re-sort options. While here:
o  remove COMPAT_FREEBSD5
o  add INVARIANTS
o  add WITNESS
2008-02-16 18:30:58 +00:00
Marcel Moolenaar
7a1f364c7d On Montecito processors, the instruction cache is in fact not
coherent with the data caches. Implement a quick fix to allow
us to boot on Montecito, while I'm working on a better fix in
the mean time.

Commit made on Montecito-based Itanium...
2008-02-14 18:46:50 +00:00
Marcel Moolenaar
8bd9e9f2df Allocate a stack for thread0 and switch to it before calling
mi_startup(). This frees up kstack for static PAL/SAL calls
and double-fault handling.
2008-02-04 02:21:33 +00:00
Ruslan Ermilov
007b1b7bae Add a wrapper function that bound checks writes to the dump device. 2008-01-28 19:04:07 +00:00
John Baldwin
5965c4b71c Add COMPAT_FREEBSD7 and enable it in configs that have COMPAT_FREEBSD6. 2008-01-07 21:40:11 +00:00
Alan Cox
eb2a051720 Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.
2008-01-03 07:34:34 +00:00
Warner Losh
cd093614f3 Use correct function name in panic message 2008-01-03 06:44:12 +00:00
Warner Losh
e2888dfc26 Fix obsolete comment. pmap_remove_all is the function we're in. 2008-01-03 06:35:04 +00:00
Alan Cox
b8e7fc24fe Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.
2007-12-27 16:45:39 +00:00
Robert Watson
3de213cc00 Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
Joseph Koshy
0da7aa7a7d Add stubs to unbreak LINT. 2007-12-07 13:45:47 +00:00
Marcel Moolenaar
5aaa8fefdf Add a BSD disklabel backend to g_part:
o  Disklabels can have between 8 and 20 partitions (inclusive).
o  No device special file is created for the raw partition.
o  Switch ia64 to use this backend.
o  No support for boot code yet.
2007-12-06 02:32:42 +00:00
Robert Watson
3c90d1ea74 Break out stack(9) from ddb(4):
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
  definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
  defined, or also if "options DDB" is defined to provide compatibility
  with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to.  It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested:	amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested:	amd64 (rwatson), arm (cognet), i386 (rwatson)
2007-12-02 20:40:35 +00:00
John Baldwin
23d34db956 Remove the 'needbounce' variable from the _bus_dmamap_load_buffer()
routine.  It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.

MFC after:	1 week
Reviewed by:	scottl
2007-11-27 17:28:12 +00:00
Jason Evans
8af8e94855 Define atomic_readandclear_ptr. 2007-11-27 06:34:15 +00:00
Scott Long
8611774e5e Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step.  Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled.  Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state.  The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler.  Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example.  This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported.  Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby
2007-11-21 04:03:51 +00:00
Alan Cox
59677d3c0e Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed.  Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed.  However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count.  This can be a mistake.  Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings.  Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed.  Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired.  The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed.  To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks
2007-11-17 22:52:29 +00:00
Marcel Moolenaar
0c3967e7fe o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o  Add cpu_thread_free() which is called from thread_free()
   to counter-act cpu_thread_alloc().

i386:	Have cpu_thread_free() call cpu_thread_clean() to
	preserve behaviour.
ia64:	Have cpu_thread_free() call mtx_destroy() for the
	mutex initialized in cpu_thread_alloc().

PR: ia64/118024
2007-11-14 20:21:54 +00:00
Julian Elischer
431f890614 generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
Marcel Moolenaar
c7373ab346 Set PTE_ACCESSED in the PTE and before inserting it in the VHPT.
This avoids back-to-back faults for all TLB misses. This can be
improved further in the future by also setting PTE_DIRTY for TLB
misses for write accesses.

MFC after: 1 week
2007-10-16 03:20:32 +00:00
Marcel Moolenaar
b4431d3218 The flushrs instruction must be the first in an instruction
group. GNU as(1) already made sure of that, but it's better
to actually have the code right.

MFC after: 1 week
2007-10-16 03:07:56 +00:00
Marcel Moolenaar
f04c3a5908 Print instruction stops to improve analysis of dependency
violations.

MFC after: 1 week
2007-10-16 02:59:03 +00:00
Marcel Moolenaar
b17249b1ec Fix disassembly of the invala, itc, itr and hint instructions
by fixing the opcode ordering.

MFC after: 1 week
2007-10-16 02:49:40 +00:00
Christian Brueffer
4fabde5686 Use the correct expanded name for SCTP.
PR:		116496
Submitted by:	koitsu
Reviewed by:	rrs
Approved by:	re (kensmith)
2007-09-26 20:05:07 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Alan Cox
6bce07ae73 It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren.  They should
be.  This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc().  I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago.  This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)
2007-09-15 18:47:02 +00:00
Marcel Moolenaar
ec2af96ad1 Clear pending interrupts before we enable external interrupts.
Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.

While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.

Approved by: re (blanket)
2007-08-06 05:15:57 +00:00
Marcel Moolenaar
78afae27e5 Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)
2007-08-06 05:11:01 +00:00
Marcel Moolenaar
e54994f990 In ia64_set_rr(), don't perform data serialization. This allows
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().

Approved by: re (blanket)
2007-08-05 18:19:38 +00:00
Marcel Moolenaar
f5a9fc710a Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.

Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.

Approved by: re (blanket)
2007-08-04 19:52:10 +00:00
Marcel Moolenaar
7c31469f67 Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.

Approved by: re (blanket)
2007-08-04 19:36:14 +00:00
Marcel Moolenaar
09363c3636 Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.

Approved by: re (blanket)
2007-08-04 19:33:27 +00:00
Marcel Moolenaar
9d662e5c9d Add required data-serialization after writing to cr.itm and cr.itv.
Approved by: re (blanket)
2007-08-04 19:28:19 +00:00
Marcel Moolenaar
855218fbd1 Add ia64_srlz_d() and ia64_srlz_i() functions to aid in serialization.
Approved by: re (blanket)
2007-08-04 19:26:42 +00:00
Marcel Moolenaar
cf681ceef5 o Switch to physical addressing before dereferencing the VHPT
bucket pointer. The virtual mapping may not be present in the
  translation cache. This will result in a nested TLB fault at
  a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
  its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
  a rfi. Behaviour is undefined otherwise.

Approved by: re (blanket)
2007-07-30 22:52:52 +00:00
Marcel Moolenaar
ea5e2a02af Add option EXCEPTION_TRACING, which enables KTR-like functionality
for processor interruptions. This is especially useful to track
unexpected nested TLB faults.

Approved by: re (blanket)
2007-07-30 22:42:33 +00:00
Marcel Moolenaar
fe1c66b9d7 Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o  Save a pointer to the sapic structure and IRQ for every vector,
   so that we can quickly EOI, mask and unmask the interrupt.
o  Add locking to the sapic code now that we can reprogram a
   sapic on multiple CPUs at the same time.
o  Use u_int for the vector and IRQ. We only have 256 vectors, so
   using a 64-bit type for it is rather excessive.
o  Properly handle concurrent registration of a handler for the
   same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blacket)
2007-07-30 22:29:33 +00:00
Marcel Moolenaar
8a2a70cb02 Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)
2007-07-30 22:12:53 +00:00