Commit Graph

1802 Commits

Author SHA1 Message Date
Marius Strobl
c10f5c0eb1 Update a branch missed in r207537.
MFC after:	3 days
2010-06-13 20:29:55 +00:00
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
Alan Cox
85a71b2578 Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.
Correct a typo in a nearby comment on sparc64.
2010-06-05 18:20:09 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Marius Strobl
4461491b3e Change ad_firmware_geom_adjust() to operate on a struct disk * only and
hook it up to ada(4) also. While at it, rename *ad_firmware_geom_adjust()
to *ata_disk_firmware_geom_adjust() etc now that these are no longer
limited to ad(4).

Reviewed by:	mav
MFC after:	3 days
2010-05-20 12:46:19 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Marius Strobl
3eb0900b32 - Enable DMA write parity error interrupts on Schizo with a working
implementation.
- Revert the Sun Fire V890 WAR of r205254. Instead let schizo_pci_bus()
  only panic in case of fatal errors as the interrupt triggered by the
  error the firmware of these and also Sun Fire 280R with version 7
  Schizo caused may happen as late as using the HBA and not only prior
  to touching the PCI bus (in the former case the actual error still is
  fatal but we clear it before touching the PCI bus).
  While at it count and export non-fatal error interrupts via sysctl(9).
- Remove unnecessary locking from schizo_ue().
2010-05-14 20:00:21 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
7024db1d40 Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free().  Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.
2010-05-06 16:39:43 +00:00
Alan Cox
d43708c049 Use an OBJT_PHYS object and thus PG_UNMANAGED pages to implement the TSB.
The TSB is not a pageable structure, so there is no point in using managed
pages.

Reviewed by:	kib
2010-05-05 07:47:40 +00:00
Marius Strobl
5a8336816e Add support for SPARC64 V (and where it already makes sense for other
HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the
MMU and cache handling, it doesn't add pmap optimizations possible with
these CPU, yet, though.
With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250
and likely also other models based on SPARC64 V like 450, 650 and 850.
Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.
2010-05-02 19:38:17 +00:00
Marius Strobl
7e9aef235a Add a hack for SPARC64 V CPUs, which set some undocumented bits in the
first data word.
2010-05-02 12:08:15 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
1332aaf9ed MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
  PG_REFERENCED flag in pmap_protect() can't really be justified, so
  don't do it.  Moreover, on ia64, don't set the page's dirty field
  unless pmap_protect() is removing write access.
2010-04-29 15:47:31 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Marius Strobl
b641222476 Don't bother enabling interrupts before we're ready to handle them. This
prevents the firmware of Fujitsu Siemens PRIMEPOWER250, which both causes
stray interrupts and erroneously enables interrupts at least when calling
SUNW,set-trap-table, in the foot.
2010-04-26 20:19:49 +00:00
Marius Strobl
e2f198273c Add OF_getscsinitid(), a helper similar to OF_getetheraddr() but for
obtaining the initiator ID to be used for SPI controllers from the
Open Firmware device tree.
2010-04-26 19:13:10 +00:00
Marius Strobl
cbafc5aff6 - Add a missing const.
- Map the NS16550 found in Fujitsu Siemens PRIMEPOWER250 to PNP0501 as well.
2010-04-26 18:49:06 +00:00
Marius Strobl
239bcd0237 Skip the pseudo-devices found in Fujitsu Siemens PRIMEPOWER250. 2010-04-26 18:44:30 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Andrew Thompson
b850ecc180 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after:	1 week
2010-04-22 21:31:34 +00:00
Marius Strobl
3c7ae7bf67 Update for UltraSPARC-IV{,+} and SPARC64 V, VI, VII and VIIIfx CPUs. 2010-04-11 15:35:17 +00:00
Marius Strobl
89fcb8cfa7 Add missing copyright shebang. 2010-04-10 12:10:11 +00:00
Marius Strobl
d5dba21cf6 Add sbbc(4), a driver for the BootBus controller found in Serengeti and
StarCat systems which provides time-of-day services for both as well as
console service for Serengeti, i.e. Sun Fire V1280. While the latter is
described with a device type of serial in the OFW device tree, it isn't
actually an UART. Nevertheless the console service is handled by uart(4)
as this allowed to re-use quite a bit of MD and MI code. Actually, this
idea is stolen from Linux which interfaces the sun4v hypervisor console
with the Linux counterpart of uart(4).
2010-04-10 11:52:12 +00:00
Marius Strobl
5679850859 Correct the DCR_IPE macro to refer to the right bit. Also improve the
associated comment as besides US-IV+ these bits are only available with
US-III++, i.e. the 1.2GHz version of the US-III+.
2010-04-10 11:13:51 +00:00
Marius Strobl
368dedb6b6 Unlike the sun4v variant, the sun4u version of SUNW,set-trap-table
actually only takes one argument.
2010-04-10 10:56:59 +00:00
Marius Strobl
15edea2038 Do as the comment suggests and determine the bus space based on the last
bus we actually mapped at rather than always based on the last bus we
encountered while moving upward in the tree. Otherwise we might use the
wrong bus space in case the bridge directly underneath the nexus doesn't
require mapping, i.e. was skipped as it's the case for ssm(4) nodes.
2010-04-10 10:44:41 +00:00
Marius Strobl
f9b669080e - Try do deal gracefully with correctable ECC errors.
- Improve the reporting of unhandled kernel and user traps.
2010-04-02 10:36:40 +00:00
Marius Strobl
d7a0fc4dca Use device_get_nameunit(9) rather than device_get_name(9) so one can
identify the reporting bridge in machines with multiple PCI domains.
2010-03-31 22:32:56 +00:00
Marius Strobl
789a702ad3 Don't re-implement device_get_nameunit(9). 2010-03-31 22:27:33 +00:00
Marius Strobl
4a717bcedc - Take advantage of the INTCLR_* macros.
- Right-justify the backslashes as per style(9).
2010-03-31 22:19:00 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Marius Strobl
07e6e81e2f - The firmware of Sun Fire V1280 has a misfeature of setting %wstate to
7 which corresponds to WSTATE_KMIX in OpenSolaris whenever calling into
  it which totally screws us even when restoring %wstate afterwards as
  spill/fill traps can happen while in OFW. The rather hackish OpenBSD
  approach of just setting the equivalent of WSTATE_KERNEL to 7 also is
  no option as we treat %wstate as a bit field. So in order to deal with
  this problem actually implement spill/fill handlers for %wstate 7 which
  just act as the WSTATE_KERNEL ones except of theoretically also handling
  32-bit, turn off interrupts completely so we don't even take IPIs while
  in OFW which should ensure we only take spill/fill traps at most and
  restore %wstate after calling into OFW once we have taken over the trap
  table. While at it, actually set WSTATE_{,PROM}_KMIX before calling into
  OFW just like OpenSolaris does, which should at least help testing this
  change on non-V1280.
- Remove comments referring to the %wstate usage in BSD/OS.
- Remove the no longer used RSF_ALIGN_RETRY macro.
- Correct some trap table addresses in comments.
- Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table.
- Ensure PSTATE_AM is off when entering or exiting to OFW as well as that
  interrupts are also completely off when exiting to OFW as the firmware
  trap table shouldn't be used to handle our interrupts.
2010-03-21 13:09:54 +00:00
Marius Strobl
c613313c10 Improve the KVA space sizing of 186682; on machines with large dTLBs we
can actually use all of the available lockable entries of the tiny dTLB
for the kernel TSB. With this change the KVA space sizing happens to be
more in line with the MI one so up to at least 24GB machines KVA doesn't
need to be limited manually. This is just another stopgap though, the
real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs
providing it so we don't need to lock the kernel TSB pages into the dTLB
in the first place.
2010-03-20 23:00:43 +00:00
Marius Strobl
ddcc3ff59e o Add support for UltraSparc-IV+:
- Swap the configuration of the first and second large dTLB as with
    US-IV+ these can only hold entries of certain page sizes each, which
    we happened to chose the non-working way around.
  - Additionally ensure that the large iTLB is set up to hold 8k pages
    (currently this happens to be a NOP though).
  - Add a workaround for US-IV+ erratum #2.
  - Turn off dTLB parity error reporting as otherwise we get seemingly
    false positives when copying in the user window by simulating a
    fill trap on return to usermode. Given that these parity errors can
    be avoided by disabling multi issue mode and the problem could be
    reproduced with a second machine this appears to be a silicon bug of
    some sort.
  - Add a membar #Sync also before the stores to ASI_DCACHE_TAG. While
    at it, turn of interrupts across the whole cheetah_cache_flush() for
    simplicity instead of around every flush. This should have next to no
    impact as for cheetah-class machines we typically only need to flush
    the caches a few times during boot when recovering from peeking/poking
    non-existent PCI devices, if at all.
  - Just use KERNBASE for FLUSH as we also do elsewhere as the US-IV+
    documentation doesn't seem to mention that these CPUs also ignore the
    address like previous cheetah-class CPUs do. Again the code changing
    LSU_IC is executed seldom enough that the negligible optimization of
    using %g0 instead should have no real impact.

  With these changes FreeBSD runs stable on V890 equipped with US-IV+
  and -j128 buildworlds in a loop for days are no problem. Unfortunately,
  the performance isn't were it should be as a buildworld on a 4x1.5GHz
  US-IV+ V890 takes nearly 3h while on a V440 with (theoretically) less
  powerfull 4x1.5GHz US-IIIi it takes just over 1h. It's unclear whether
  this is related to the supposed silicon bug mentioned above or due to
  another issue. The documentation (which contains a sever bug in the
  description of the bits added to the context registers though) at least
  doesn't mention any requirements for changes in the CPU handling besides
  those implemented and the cache as well as the TLB configurations and
  handling look fine.
o Re-arrange cheetah_init() so it's easier to add support for SPARC64
  V up to VIIIfx CPUs, which only require parts of this initialization.
2010-03-17 22:45:09 +00:00
Marius Strobl
bc11f2d90f Add macros for the VER.impl of SPARC64 II to VIIIfx. 2010-03-17 21:00:39 +00:00
Marius Strobl
319efdb1cc - Add TTE and context register bits for the additional page sizes supported
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
  which just is the complement of TLB_CXR_CTX_MASK instead of trying to
  assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
  least debugging purposes.
2010-03-17 20:23:14 +00:00
Marius Strobl
cb2f0c8ce1 - Add quirk handling for Sun Fire V1280. The firmware of these machines
provides no ino-bitmap properties so forge them using the default set
  of controller interrupts and let schizo_setup_intr() take care of the
  children, hoping for non-fancy routing.
- Add quirk handling for Sun Fire V890. When booting these machines from
  disk a Schizo comes up with PCI error residing which triggers as soon
  as we register schizo_pci_bus() even when clearing it from all involved
  registers (it's no longer indicated once we're in schizo_pci_bus()
  though). Thus make PCI bus errors non-fatal until we actually touch the
  bus. With this change schizo_pci_bus() typically triggers once during
  attach in this case. Obviously this approach isn't exactly race free
  but it's about the best we can do about this problem as we're not
  guaranteed that the interrupt will actually trigger on V890 either, as
  it certainly doesn't when for example netbooting them.
2010-03-17 20:01:01 +00:00
Ed Schouten
338f1debcd Remove COMPAT_43TTY from stock kernel configuration files.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
2010-03-13 09:21:00 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Marius Strobl
ba96c16ae8 Some machines can not only consist of CPUs running at different speeds
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.

This file was missed in r204152.
2010-02-21 09:25:53 +00:00
Marius Strobl
c273f8a878 Starting with UltraSPARC IV CPUs the CPU caches are described with different
OFW properties.
2010-02-20 23:42:24 +00:00
Marius Strobl
9b824f84d5 Some machines can not only consist of CPUs running at different speeds
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
2010-02-20 23:24:19 +00:00
Marius Strobl
a675da796f Predict KASSERTs to be true. 2010-02-13 19:17:06 +00:00
Marius Strobl
f00d6a0cc2 Add ssm(4), which serves as a glue device allowing devices beneath the
scalable shared memory node, which is used in large UltraSPARC III based
machines to group snooping-coherency domains together, like schizo(4) to
be treated like nexus(4) children.
2010-02-13 19:05:34 +00:00
Marius Strobl
527eebfeeb - Add the 'cmp' and 'core' pseudo-busses which are used to group CPU cores
to the exclusion lists as the CPU nodes aren't handled as regular devices
  either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
  derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
  lists.
2010-02-13 18:51:49 +00:00
Marius Strobl
ac144cadbc Resurrect nexusvar.h from r167307. 2010-02-13 18:18:45 +00:00
Marius Strobl
1c05d13e3f Style fixes 2010-02-13 17:05:57 +00:00
Marius Strobl
c61b6da840 - Search the whole OFW device tree instead of only the children of the
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
  nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
  or combinations thereof. Also in large UltraSPARC III based machines
  the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
  group snooping-coherency domains together instead of directly from the
  nexus.
  It would be great if we could use newbus to deal with the different ways
  the 'cpu' devices can hang off of pseudo ones but unfortunately both
  cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
  probing.
- Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these
  are multi-core each CPU has two Fireplane config registers and thus the
  module/target ID has to be determined differently so the one specific
  to a certain core is used. Similarly, starting with UltraSPARC IV the
  individual cores use a different property in the OFW device tree to
  indicate the CPU/core ID as it no longer is in coincidence with the
  shared slot/socket ID.
  This involves changing the MD KTR code to not directly read the UPA
  module ID either. We use the MID stored in the per-CPU data instead of
  calling cpu_get_mid() as a replacement in order prevent clobbering any
  registers as side-effect in the assembler version. This requires CATR()
  invocations from mp_startup() prior to mapping the per-CPU pages to be
  removed though.
  While at it additionally distinguish between CPUs with Fireplane and
  JBus interconnects as these also use slightly different sizes for the
  JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
  machdep.c.
2010-02-13 16:52:33 +00:00
Marius Strobl
dfade4c0a4 - At least the trap table of the Sun Fire V1280 firmware apparently has
no cleanwindows handler so just remove trying to trigger it from _start
  and the AP trampoline code as that leads to a crash there. This should
  be okay as leaking data from the OFW via the CPU registers on start of
  the kernel should be no real concern.
- Make the comments of _start and the AP trampoline code regarding the
  initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
  refer to the right macro.
2010-02-13 15:36:33 +00:00
Marius Strobl
4f607e8ef2 - Assert that HEAPSZ is a multiple of PAGE_SIZE as at least the firmware
of Sun Fire V1280 doesn't round up the size itself but instead lets
  claiming of non page-sized amounts of memory fail.
- Change parameters and variables related to the TLB slots to unsigned
  which is more appropriate.
- Search the whole OFW device tree instead of only the children of the
  root nexus device for the BSP as starting with UltraSPARC IV the 'cpu'
  nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
  or combinations thereof. Also in large UltraSPARC III based machines
  the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
  group snooping-coherency domains together instead of directly from the
  nexus.
- Add support for UltraSPARC IV and IV+ BSPs. Due to the fact that these
  are multi-core each CPU has two Fireplane config registers and thus the
  module/target ID has to be determined differently so the one specific
  to a certain core is used. Similarly, starting with UltraSPARC IV the
  individual cores use a different property in the OFW device tree to
  indicate the CPU/core ID as it no longer is in coincidence with the
  shared slot/socket ID.
  While at it additionally distinguish between CPUs with Fireplane and
  JBus interconnects as these also use slightly different sizes for the
  JBus/agent/module/target IDs.
- Check the return value of init_heap(). This requires moving it after
  cons_probe() so we can panic when appropriate. This should be fine as
  the PowerPC OFW loader uses that order for quite some time now.
2010-02-13 14:13:39 +00:00
Attilio Rao
88cbfa852e Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.

Sponsored by:	Sandvine Incorporated
Requested by:	emaste
Discussed with:	kib
2010-02-10 16:30:04 +00:00
Marius Strobl
6d709b9a44 Implement handling of the third argument of cpu_switch(). This unbreaks
sparc64 after r202889.

PR:		143215
MFC after:	1 week
2010-01-30 14:04:21 +00:00
Marius Strobl
376a67cf64 - Zero the MSI/MSI-X queue argument, otherwise mtx_init(9) can panic
indicating an already initialized lock.
- Check for an empty MSI/MSI-X queue entry before asserting that we have
  received a MSI/MSI-X message in order to not panic in case of stray MSI/
  MSI-X queue interrupts which may happen in case of using an interrupt
  handler rather than a filter.

MFC after:	3 days
2010-01-27 20:30:14 +00:00
Marius Strobl
29b6820f87 Merge r202882 from amd64/i386:
For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

In syscall(), reread syscall number and arguments after ptracestop(),
if debugger modified anything in the process environment. Since procfs
stopevent requires number of syscall arguments in p_xstat, this cannot
be solved by moving stop/trace point before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reviewed by:	kib
2010-01-23 22:11:18 +00:00
John Baldwin
13c18821fa Move the examples for the 'hints' and 'env' keywords from various GENERIC
kernel configs into NOTES.

Reviewed by:	imp
2010-01-19 17:20:34 +00:00
Marius Strobl
0424a37bea Add epic(4) also here.
MFC after:	3 days
2010-01-18 20:25:29 +00:00
Marius Strobl
b10fd7e9cc When setting up MSIs with a filter ensure that the event queue interrupt
is cleared as it might have triggered before and given we supply NULL
as ic_clear, inthand_add() won't do this for us in this case.
2010-01-10 18:39:29 +00:00
Warner Losh
87948dfdf2 Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.
# This is the resolution of removing it from DEFAULTS...

MFC after:	5 days
2010-01-10 17:44:22 +00:00
Marius Strobl
319570f9c4 Add epic(4), a driver for the front panel LEDs in Sun Fire V215/V245.
It's named after the driver doing the same job in OpenSolaris.
2010-01-10 15:44:48 +00:00
Marius Strobl
0e34fcfba8 - According to OpenSolaris it's sufficient to align the MSIs of a
device in the table based on the count rather than the maxcount.
  Also the previous code didn't work properly as it would have been
  necessary to reserve the entire maxcount range in order keep later
  requests from filling the spare MSIs between count and maxcount,
  which would be complicated to unreserve in fire_release_msi().
- For MSIs with filters rather than handlers only don't clear the
  event queue interrupt via fire_intr_clear() since given that these
  are executed directly would clear it while we're still processing
  the event queue, which in turn would lead to lost MSIs.
- Save one level of indentation in fire_setup_intr().
- Correct a bug in fire_teardown_intr() which prevented it from
  correctly restoring the MSI in the resource, causing allocation of
  a resource representing an MSI to fail after the first pass when
  repeatedly loading and unloading a driver module.
2010-01-10 15:12:15 +00:00
Bjoern A. Zeeb
193171b7f5 In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT.  This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.

This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with:	arch, jhb (as part of the LINT-VIMAGE thread)
MFC after:		1 month
2010-01-08 18:57:31 +00:00
Pyun YongHyeon
fa82d01504 Enable ste(4). ste(4) should work on all architectures. 2010-01-08 02:46:34 +00:00
Warner Losh
56eff2143f Revert 200594. This file isn't intended for these sorts of things. 2010-01-04 21:30:04 +00:00
Brooks Davis
9efde58392 Add vlan(4) to all GENERIC kernels.
MFC after:	1 week
2010-01-03 20:40:54 +00:00
Marius Strobl
9a41d9ef93 Exclude options COMPAT_FREEBSD4 now that the MD freebsd4_sigreturn()
is gone since r201396 and which is also in line with the fact that
FreeBSD 4 didn't supported sparc64.
2010-01-03 02:58:43 +00:00
Marius Strobl
be70411f82 - Demapping unused kernel TLB slots has proven to work reliably so move
the associated debugging under bootverbose.
- Remove freebsd4_sigreturn(); given that FreeBSD 4 didn't supported
  sparc64 this only ever served as a transition aid prior to FreeBSD
  5.0 and is unused by default since COMPAT_FREEBSD4 was removed from
  GENERIC in r143072 nearly 5 years ago.
2010-01-02 15:44:16 +00:00
Marius Strobl
c936816f6d - Preserve the PROM IOMMU in order to allow OFW drivers to continue to
work.
- Sanity check the parameters passed to the implementations of the
  pcib_{read,write}_config() methods. Using illegal values can cause
  no real harm but it doesn't hurt to avoid unnecessary data error
  traps requiring to flush and re-enable the level 1 caches.
2010-01-02 15:19:33 +00:00
Marius Strobl
209817b163 Fix botches in r201005:
- Actually use the newly introduced sc_res in the front-end.
- Remove a whitespace glitch in mk48txx_gettime().
2010-01-01 22:47:53 +00:00
Marius Strobl
c64b2eba8a - Remove a redundant variable and an unnecessary cast.
- Fix whitespace.
2009-12-29 14:06:36 +00:00
Marius Strobl
7d027c4ff2 - Prefer i and j over i and n for temporary integer variables.
- Wrap/shorten too long lines.
- Remove a redundant variable and an unnecessary cast in schizo(4).
2009-12-29 14:03:38 +00:00
Marius Strobl
4c6587a6cf Account for firmware versions which include the CDMA interrupts in
the OFW device tree.

MFC after:	3 days
2009-12-28 14:16:40 +00:00
Marius Strobl
5cb5104246 Add a driver for the `Fire' JBus to PCIe bridges found in at least
the Sun Fire V215/V245 and Sun Ultra 25/45 machines. This driver also
already includes all the code to support the `Oberon' Uranus to PCIe
bridges found in the Fujitsu-Siemens based Mx000 machines but due to
lack of access to such a system for testing, probing of these bridges
is currently disabled.
Unfortunately, the event queue mechanism of these bridges for MSIs/
MSI-Xs matches our current MD and MI interrupt frameworks like square
pegs fit into round holes so for now we are generous and use one event
queue per MSI, which limits us to 35 MSIs/MSI-Xs per Host-PCIe-bridge
(we use one event queue for the PCIe error messages). This seems
tolerable as long as most devices just use one MSI/MSI-X anyway.
Adding knowledge about MSIs/MSI-Xs to the MD interrupt code should
allow us to decouple the 1:1 mapping at the cost of no longer being
able to bind MSIs/MSI-Xs to specific CPUs as we currently have no
reliable way to quiesce a device during the transition of its MSIs/
MSI-Xs to another event queue. This would still require the problem
of interrupt storms generated by devices which have no one-shot
behavior or can't/don't mask interrupts while the filter/handler is
executed (like the older PCIe NICs supported by bge(4)) to be solved
though.

Committed from:	26C3
2009-12-27 16:55:44 +00:00
Marius Strobl
de6a190429 Style changes
Obtained from:	NetBSD (mc146818reg.h)
2009-12-25 22:53:46 +00:00
Marius Strobl
033a5c8ce1 - Take advantage of bus_{read,write}_*(9).
- Set dow = -1 in mk48txx_gettime() because some drivers (for example
  the NetBSD and OpenBSD mk48txx(4)) don't set it correctly.
2009-12-25 21:53:20 +00:00
Marius Strobl
10a290af39 - Hook up the default implementations of the MSI/MSI-X pcib_if methods
so requests may bubble up to a host-PCI bridge driver.
- Distinguish between PCI and PCIe bridges in the device description
  so it's a bit easier to follow what hangs off of what in the dmesg.
  Unfortunately we can't also tell PCI and PCI-X apart based on the
  information provided in the OFW device tree.
- Add quirk handling for the ALi M5249 found in Fire-based machines
  which are used as a PCIe-PCIe bridge there. These are obviously
  subtractive decoding as as they have a PCI-ISA bridge on their
  secondary side (and likewise don't include the ISA I/O range in
  their bridge decode) but don't indicate this via the class code.
  Given that this quirk isn't likely to apply to all ALi M5249 and
  I have no datasheet for these chips so I could implement a check
  using the chip specific bits enabling subtractive decoding this
  quirk handling is added to the MD code rather than the MI one.
2009-12-25 15:03:05 +00:00
Marius Strobl
3438bdc53e Merge from amd64/i386:
Implement support for interrupt descriptions.
2009-12-24 15:43:37 +00:00
Marius Strobl
b1c13fc739 Add missing locking in intr_bind(). 2009-12-24 15:40:08 +00:00
Marius Strobl
d92734d25a - Don't check for a valid interrupt controller on every interrupt
in intr_execute_handlers(). If we managed to get here without an
  associated interrupt controller we have way bigger problems.
  While at it predict stray vector interrupts as false as they are
  rather unlikely.
- Don't blindly call the clear function of an interrupt controller
  when adding a handler in inthand_add() as interrupt controllers
  like the one driven by upa(4) are auto-clearing and thus provide
  NULL instead.
2009-12-24 12:27:22 +00:00
Marius Strobl
c39f5f5af4 - By re-arranging the code in OF_decode_addr() somewhat and accepting
a bit of a detour we can just iterate through the banks array instead
  of having to calculate every offset. This change is inspired by the
  powerpc version of this function.
- Add support for the JBus to EBus bridges which hang off of nexus(4).
2009-12-23 22:25:23 +00:00
Marius Strobl
ee57d839bb Style changes. 2009-12-23 22:11:33 +00:00
Marius Strobl
6ed76228c1 - Add support for the IOMMUs of Fire JBus to PCIe and Oberon Uranus
to PCIe bridges.
- Add support for talking the PROM mappings over to the kernel IOTSB
  just like we do with the kernel TSB in order to allow OFW drivers
  to continue to work.
- Change some members, parameters and variables to unsigned where
  more appropriate.
2009-12-23 22:02:34 +00:00
Marius Strobl
46c9b5d9bd Fix whitespace according to style(9). 2009-12-23 21:51:41 +00:00
Marius Strobl
926f4e43c6 - Add quirk handling for ALi M5229, mainly setting the magic "force
enable IDE I/O" bit which prevents data access traps with revision
  0xc8 in Fire-based machines when pci(4) enables PCIM_CMD_PORTEN.
- Like for sun4v also don't add the PCI side of host-PCIe bridges to
  the bus on sun4u as they don't have configuration space implement
  there either.
2009-12-23 21:38:59 +00:00
Marius Strobl
9cc21da577 - Sort the prototypes.
- Add macros to ease the access of device configuration space in
  ofw_pcibus_setup_device().
2009-12-23 21:25:16 +00:00
Marius Strobl
36fd650f09 Add structures for OFW MSI/MSI-X support. These are identical for
both sun4u and sun4v.
2009-12-23 21:07:49 +00:00
Marius Strobl
7bb518f115 Don't probe the bq4802 variant found in Ultra 25 and 45 for now as
this chip isn't MC146818 compatible and requires different handlers
(but which I can't test due to lack of such hardware).
2009-12-23 20:42:14 +00:00
Marius Strobl
4d116858df Don't use an out register to hold the vector number across the call
of the interrupt handler in intr_fast() as the handler might clobber
it (no in-tree handler currently does but an upcoming one will).
While at it, tidy the register usage in the interrupt counting code.
2009-12-23 20:23:04 +00:00
Marcel Moolenaar
d9c849eceb Calculate the average CPU clock frequency and export that through
the hw.freq.cpu sysctl variable. This can be used by ports that
need to know "the" CPU frequency.
2009-12-23 06:52:12 +00:00
Marius Strobl
b748710c34 - Correct an off-by-one error when calculating the end of a child
range.
- Spell the PCI TLA in uppercase.
2009-12-22 21:53:19 +00:00
Marius Strobl
63cd5b6d38 - Add support for the JBus to EBus bridges which hang off of nexus(4)
and are found in sun4u and sun4v machines based on the Fire ASIC.
- Initialize the configuration space of the PCI to EBus variant the
  same way as OpenSolaris does.
2009-12-22 21:49:53 +00:00
Marius Strobl
8bf72e61a7 - Add macros for the states of the interrupt clear registers.
- Change INTMAP_VEC() to take an INO as its second argument rather
  than an INR. The former is what I actually intended with this
  macro and how it's currently used.
2009-12-22 21:48:18 +00:00
Marius Strobl
ea77e7bb3f Make these constants unsigned which is more appropriate. 2009-12-22 21:42:54 +00:00
Marius Strobl
1bba41a506 Enroll these drivers in multipass probing. The motivation behind this
is that the JBus to EBus bridges share the interrupt controller of a
sibling JBus to PCIe bridge (at least as far as the OFW device tree
is concerned, in reality they are part of the same chip) so we have to
probe and attach the latter first. That happens to be also the case
due to the fact that the JBus to PCIe bridges appear first in the OFW
device tree but it doesn't hurt to ensure the right order.
2009-12-22 21:02:46 +00:00
Marius Strobl
6a963456fd Add missing module dependency information. 2009-12-21 21:41:33 +00:00
Marius Strobl
75b02cac5a Provide and consume missing module dependency information. 2009-12-21 21:29:16 +00:00
Doug Barton
f1bdf073c1 Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS
2009-12-16 02:17:43 +00:00
Marius Strobl
259100de20 Add additional checks of the kernel stack addresses in order to
ensure we don't overrun the end of the call chain.

MFC after:	1 week
2009-12-08 20:18:54 +00:00
Marius Strobl
22bf171313 Add <machine/pcb.h> missed in r199135. 2009-12-07 15:29:07 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Marius Strobl
8fe75fc879 Unroll copying of the registers in {g,s}et_mcontext() and limit it
to the set actually restored by tl0_ret() instead of using the whole
trapframe. Additionally skip %g7 as that register is used as the
userland TLS pointer.

PR:		140523
MFC after:	1 week
2009-11-17 21:08:10 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Marius Strobl
bed992dc26 Sync with the other archs and wrapper the prototype of in_cksum_skip(9)
in #ifdef _KERNEL.

Submitted by:	Ulrich Spoerlein
MFC after:	1 month
2009-10-26 22:00:26 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Marius Strobl
a430b967b5 Change the load base to below 2GB so PIE binaries work including when
compiled to use the Medium/Low code model, which we currently default
to for the userland. GNU/Linux has moved their default to Medium/Middle
some time ago, which probably explains why the current GNU ld(1) uses
a base in the range between 32 and 44 bits instead.

Submitted by:	kib
2009-10-18 13:08:15 +00:00
John Baldwin
55b6a401ef Move the USB wireless drivers down into their own section next to the USB
ethernet drivers.

Submitted by:	Glen Barber  glen.j.barber @ gmail
MFC after:	1 month
2009-10-13 19:02:03 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Bjoern A. Zeeb
52bf2041ac Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by:	kib
MFC after:	1 month
2009-10-03 11:57:21 +00:00
Marius Strobl
eaa817d661 Merge r194204 from amd64/i386:
Enable PRINTF_BUFR_SIZE by default.

PR:		139134
MFC after:	3 days
2009-09-25 17:08:51 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Marius Strobl
7b8253c3eb Factor out the duplicated macro for the device type used in the
OFW device tree for PCI bridges and add a new one for PCI Express.
While at it, take advantage of the former for the rman(9) work-
around in jbusppm(4).
2009-09-13 14:47:31 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Marius Strobl
fada2a867d Add a MD __PCI_BAR_ZERO_VALID which denotes that BARs containing 0
actually specify valid bases that should be treated just as normal.
The PCI specifications have no indication that 0 would be a magic value
indicating a disabled BAR as commonly used on at least amd64 and i386
but not sparc64. It's unclear what to do in pci_delete_resource()
instead of writing 0 to a BAR though as there's no (other) way do
disable individual BARs so its decoding is left enabled in case of
__PCI_BAR_ZERO_VALID for now.

Approved by:	re (kib), jhb
MFC after:	1 week
2009-07-21 19:06:39 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Rui Paulo
59aa14a91d Implementation of the upcoming Wireless Mesh standard, 802.11s, on the
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actually routing process on the mesh network.
HWMP is the mandatory routing protocol on by the mesh standard, but
others, such as RA-OLSR, can be implemented.

Authentication and encryption are not implemented.

There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to setup a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).

A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.

Drivers that support mesh networks right now are: ath, ral and mwl.

More information at: http://wiki.freebsd.org/WifiMesh

Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.

Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.

Reviewed by:	sam
Approved by:	re (kensmith)
Obtained from:	projects/mesh11s
2009-07-11 15:02:45 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
Ed Schouten
89fe4c0a2b Enable POSIX semaphores on all non-embedded architectures by default.
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by:	miwi
Patch by:	Florian Smeets <flo kasimir com>
Approved by:	re (kib)
2009-07-02 18:24:37 +00:00
Marius Strobl
49c8326a79 - Work around the broken loader behavior of not demapping no longer
used kernel TLB slots when unloading the kernel or modules, which
  results in havoc when loading a kernel and modules which take up
  less TLB slots afterwards as the unused but locked ones aren't
  accounted for in virtual_avail. Eventually this should be fixed
  in the loader which isn't straight forward though and the kernel
  should be robust against this anyway. [1]
- Ensure that the addresses allocated directly from phys_avail[] by
  pmap_bootstrap_alloc() are always colored properly. This implicit
  assumption was broken in r194784 as unlike the other consumers the
  DPCPU area allocated for the BSP isn't a multiple of PAGE_SIZE *
  DCACHE_COLORS. [2]
- Remove the no longer used global msgbuf_phys.
- Remove the redundant ekva parameter of pmap_bootstrap_alloc().
- Correct some outdated function names in ktr(9) invocations.

Requested by:	jhb [1]
Reported by:	gavin [2]
Approved by:	re (kib)
MFC after:	2 weeks
2009-06-28 22:42:51 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
Marius Strobl
16d58437ed o merge from amd64:
- r187144: Add a reference to the config(5) manpage and
    to the "env" kernel config option.
  - Add/enable the default USB drivers. Originally the USB
    controller and keyboard drivers were disabled as these
    interacted badly with the Open Firmware console driver,
    i.e. caused the keyboard to not work with ofw_console(4).
    Even when switch to uart(4) and the frame buffer drivers
    most of the USB drivers still were kept disabled as
    several of them, amongst others all of the drivers for
    USB Ethernet controllers, weren't endian clean. With the
    new USB stack these problem should be gone now so there's
    no longer a reason to not include the same set of USB
    drivers amd64 does.
o Remove the commented out device ofw_console; apart from it
  being currently broken by some TTY changes one really needs
  to know how to actually enable and make it work correctly.
2009-06-24 20:49:02 +00:00
Konstantin Belousov
186cff43e3 Unbreak sparc64 after the swap accounting changes: mark kernel_map
entries allocated for translations in pmap_init() as MAP_NOFAULT. This
prevents vm_map_insert from trying to account the entries for swap
usage, that is both wrong and too early to work.

While there, change FALSE to VMFS_NO_SPACE.

Reported and tested by:	Florian Smeets <flo at kasimir com>
Reviewed by:	marius
2009-06-24 16:52:30 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Marius Strobl
119051cbf9 Add cas(4), a driver for Sun Cassini/Cassini+ and National Semiconductor
DP83065 Saturn Gigabit Ethernet controllers. These are the successors
of the Sun GEM controllers and still have a similar but extended transmit
logic. As such this driver is based on gem(4).
Thanks to marcel@ for providing a Sun Quad GigaSwift Ethernet UTP (QGE)
card which was vital for getting this driver to work on architectures
not using Open Firmware.

Approved by:	re (kib)
MFC after:	2 weeks
2009-06-15 18:22:41 +00:00
Robert Watson
bd875f5f13 Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel.  No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2009-06-02 18:31:08 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Pyun YongHyeon
460da49973 Add nge(4), nge(4) should work on all architectures. 2009-05-21 02:19:01 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Marius Strobl
1eb3155347 Just like in cpu_halt(), use cpu_shutdown() rather than ofw_exit()
directly in cpu_reset() in order to idle the APs before exiting
the kernel and letting the BSP enter the firmware so that processes
like init(8) which still might be running on an AP at that point
don't cause a panic there when it crashes due to the fact it no
longer can be supported by the kernel.

MFC after:	3 days
2009-05-10 20:41:52 +00:00
Marius Strobl
c689752783 - Fix style.
- Use __FBSDID.
2009-05-10 20:22:41 +00:00
Jun Kuriyama
b3b17597ea - Use "device\t" and "options \t" for consistency. 2009-05-10 00:00:25 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Marius Strobl
8db04c5c8f Revert r190105 so that removing options KDB but DDB or GDB being
available will cause the kernel to not respect -d and boot_kdb=1
for consistency with the other platforms as pointed out by marcel@.
2009-03-20 17:10:50 +00:00
Marius Strobl
f449699d53 Hook up the generic OFW pnpinfo string method. 2009-03-19 21:14:45 +00:00
Marius Strobl
45091a0b8c Bring the implementation of the pnpinfo string function more in
line with the rest of this file.
2009-03-19 21:12:44 +00:00
Marius Strobl
f27d082cdf - As suggested by OpenSolaris use up-burst-sizes for determining the
supported burst sizes.
- Add support for 64-bit burst sizes (required for SBus GEM).
- Failing to register as interrupt controller during attach shouldn't
  be fatal so just inform about this instead of panicing.
- Take advantage of KOBJMETHOD_END.
- Remove some redundant variables.
- Add missing const.
2009-03-19 21:02:36 +00:00
Marius Strobl
ff5a50322a Add device found in B100. 2009-03-19 20:57:59 +00:00
Marius Strobl
81be15f086 Sort include. 2009-03-19 20:54:15 +00:00
Marius Strobl
2ba16c40b4 - Ensure we find no unexpected partner.
- Failing to register as interrupt controller during attach shouldn't
  be fatal so just inform about this instead of panicing.
- Disable rerun of the streaming cache as workaround for a silicon bug
  of certain Psycho versions.
- Remove the comment regarding lack of newbus'ified bus_dma(9) as being
  able to associate a DMA tag with a device would allow to implement
  CDMA flushing/syncing in bus_dmamap_sync(9) but that would totally
  kill performance. Given that for devices not behind a PCI-PCI bridge
  the host-to-PCI bridges also only do CDMA flushing/syncing based on
  interrupts there's no additional disadvantage for polling(4) callbacks
  in the case schizo(4) has to do the CDMA flushing/syncing but rather a
  general problem.
- Don't panic if the power failure, power management or over-temperature
  interrupts doesn't exist as these aren't mandatory and not available
  with all controllers (not even Psychos). [1]
- Take advantage of KOBJMETHOD_END.
- Remove some redundant variables.
- Add missing const.

PR:	131371 [1]
2009-03-19 20:52:46 +00:00
Marius Strobl
d7ae285095 - Take advantage of KOBJMETHOD_END.
- Hook up the streaming buffer (not used by iommu(4) by default, yet)
  if available and usable. [1]
- Move the message regarding belated registration as interrupt control
  under bootverbose as this isn't something the user should worry about.

Tested by:	Michael Moll [1]
2009-03-19 20:48:47 +00:00