Commit Graph

2040 Commits

Author SHA1 Message Date
Attilio Rao
9b571ec6b3 MFC 2011-06-22 19:42:32 +00:00
Marius Strobl
915d84ba38 Fix whitespace 2011-06-21 20:50:55 +00:00
Marius Strobl
0e3d1b3853 On machines where we don't need to lock the kernel TSB into the dTLB and
thus may basically use the entire 64-bit kernel address space reduce
VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory.
2011-06-21 20:48:14 +00:00
Marius Strobl
7cdfb4e8f2 On machines where we don't need to lock the kernel TSB into the dTLB and
thus may basically use the entire 64-bit kernel address space increase
the kernel virtual memory to not be limited by VM_KMEM_SIZE_MAX.
2011-06-21 20:47:03 +00:00
Attilio Rao
49ea5c076c MFC 2011-06-21 09:09:53 +00:00
Marius Strobl
6308e06cf1 As astopgap minimize the sched_lock coverage in pmap_activate() in order
to reduce lock contention.
2011-06-20 21:36:53 +00:00
Marius Strobl
207f858338 - Remove MD usage of pc_cpumask and pc_other_cpus. [1]
- Remove CTASSERTs which no longer need to hold since r222813.

Submitted by:	attilio [1]
2011-06-20 21:31:01 +00:00
Attilio Rao
5519971c21 MFC 2011-06-19 14:22:35 +00:00
Marius Strobl
a2f43b6155 - As with stray vector interrupts limit the reporting of stray level
interrupts. Bringup on additional machine models repeatedly reveals
  firmware that enables interrupts behind our back, causing the console
  to be flooded otherwise.
- As with the regular interrupt counters using uint16_t instead of
  u_long for counting the stray vector interrupts should be more than
  sufficient.
- Cache the interrupt vector in intr_stray_vector().
2011-06-18 11:27:44 +00:00
Attilio Rao
8a9ce51786 Remove entirely pc_other_cpus usage and pc_cpumask usage from sparc64.
Tested and reviewed by:	marius
2011-06-16 07:25:53 +00:00
Marius Strobl
82f131f39b Don't include curcpu in the mask which is used as the IPI cookie as we
have to ignore it when sending the IPI anyway. Actually I can't think of
a good reason why this ever was done that way in the first place as it's
not even usefull for debugging.
While at it replace the use of pc_other_cpus as it's slated for deorbit.
2011-06-15 22:41:55 +00:00
Marius Strobl
2ba56f4d23 - Merge r222980 from x86: add sound(4) and common device drivers.
- Fix whitespace.
2011-06-13 12:45:19 +00:00
Marius Strobl
ab267f9dbf - For the case when tl1_align(_trap) is used to call rsf_fatal via
RSF_FATAL we need to switch to alternate globals for KSTACK_CHECK just
  like tl1_data_excptn(_trap) does. This is more or less cosmetic because
  in case RSF_FATAL is called we're already heading south.
- Correct an END().
- Read the window state from the correct register for a CATR().
2011-06-07 23:15:21 +00:00
Marius Strobl
c40847145b Adapt CATR() to r222813. This is somewhat tricky as we can't afford using
more than three temporary register in several places CATR() is used so
this code trades instructions in for registers. Actually, this still isn't
sufficient and CATR() has the side-effect of clobbering %y. Luckily, with
the current uses of CATR() this either doesn't matter or we are able to
(save and) restore it.
Now that there's only one use of AND() and TEST() left inline these.
2011-06-07 17:33:39 +00:00
Marius Strobl
3bd5692b1f Fix a problem with r222813; given that we may only operate on interrupt
globals here but clobber %y save and restore the latter.
2011-06-07 17:19:14 +00:00
Attilio Rao
61b926921f MFC 2011-05-31 21:22:44 +00:00
Attilio Rao
e370959707 Fix KTR_CPUMASK in order to accept a string representing a cpuset_t.
This introduce all the underlying support for making this possible (via
the function cpusetobj_strscan() and keeps ktr_cpumask exported.  sparc64
implements its own assembly primitives for tracing events and needs to
properly check it.  Anyway the sparc64 logic is not implemented yet due
to lack of knowledge (by me) and time (by marius), but it is just a
matter of using ktr_cpumask when possible.

Tested and fixed by:	pluknet
Reviewed by:		marius
2011-05-31 20:48:58 +00:00
Attilio Rao
d0984adc98 Revert a change that crept in during MFC. 2011-05-31 20:23:33 +00:00
Nathan Whitehorn
d098f93019 On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
Attilio Rao
5b6ea0b538 MFC 2011-05-31 14:18:10 +00:00
Attilio Rao
217e1c0ebc Revert a patch that unvolountary sneaked in while I was MFCing. 2011-05-23 23:50:21 +00:00
Attilio Rao
a9ff18a210 MFC 2011-05-23 01:17:30 +00:00
Attilio Rao
447274a88b MFC 2011-05-15 15:47:16 +00:00
Marius Strobl
3bb1fd1bc4 Recognize the eeprom device found in Fujitsu PRIMEPOWER650 and 900. 2011-05-15 13:25:26 +00:00
Marius Strobl
93c57e311e Fix yet another inversion in the logic by applying the x86 version of this,
which avoids CPU_EMPTY() in the first place.
Do I get a beer or something for every inversion I find?
2011-05-14 23:20:14 +00:00
Attilio Rao
e0c109e8c1 MFC 2011-05-14 02:28:26 +00:00
Attilio Rao
4b547324c0 Disconnect sun4v architecture from the three.
Some files keep the SUN4V tags as a code reference, for the future,
if any rewamped sun4v support wants to be added again.

Reviewed by:	marius
Tested by:	sbruno
Approved by:	re
2011-05-14 01:53:38 +00:00
Attilio Rao
b2aa562e7b MFC 2011-05-13 20:58:48 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Marius Strobl
79898bbecd When setting up pc_other_cpus for APs based on pc_allcpu clear pc_cpuid
in the former rather than the latter.
This gets this branch working on at least Jalapeno-class CPUs.
2011-05-13 15:21:31 +00:00
Attilio Rao
ef607a6aa3 MFC 2011-05-12 14:01:40 +00:00
Marius Strobl
717b08036f Update for the fact that the first members of the IPI args structures and
pc_cpumask were changed to cpuset_t. This now calculates the cpumask based
on pc_cpuid itself as pc_cpumask is slated for being deorbited. Note that
this needs r221750 to be MFC'ed in order to compile.
This seems to work fine but after a few dozens of successful IPIs something
suddenly adds pc_cpuid to pc_other_cpus, causing the respective assertions
in mp_machdep.c to be triggered when the latter is used as the base for the
targets.
2011-05-12 09:29:24 +00:00
Marius Strobl
0fd4b3388e The ita_mask should include curcpu but the cpuset passed to cpu_ipi_selected()
must not, otherwise we tell the CPU to IPI itself, which the sun4u CPUs don't
support. For reasons unknown so far MD and MI IPI use actually still triggers
that assertion though.
2011-05-11 21:15:12 +00:00
Marius Strobl
ed36c82cbf Update for the fact that pm_active and pc_cpumask were changed to cpuset_t.
This now calculates pc_cpumask based on pc_cpuid itself as the former is
slated for being deorbited.
This branch now at least boots UP again. MP needs more things converted and
the existing conversion from cpumask_t to cpuset_t still has bugs.
2011-05-11 21:10:43 +00:00
Marius Strobl
707b4f4479 Add an ATOMIC_CLEAR_LONG. 2011-05-10 21:18:45 +00:00
Attilio Rao
3f748615ac Fix an inversion in logic.
Submitted by:	marius
2011-05-10 18:19:56 +00:00
Attilio Rao
c813ed5c36 - Fix a typo
- Fix an inversion in the logic
2011-05-08 14:23:21 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
Attilio Rao
0d9fa7bd31 Add sparc64 support.
Compiled (and helped) by:	pluknet
2011-05-06 21:53:29 +00:00
John Baldwin
f9a9473702 Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus
versions instead.  They were never needed as bus_generic_intr() and
bus_teardown_intr() had been changed to pass the original child device up
in 42734, but the ISA bus was not converted to new-bus until 45720.
2011-05-06 13:48:53 +00:00
John Baldwin
83c41143ca Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests.  Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices.  The
suballocations are managed by creating an rman for each I/O window.  The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus.  Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus.  If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free.  From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed.  The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows).  If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window.  This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.  This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now.  Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR:		kern/143874 kern/149306
Tested by:	mav
2011-05-03 17:37:24 +00:00
Rick Macklem
4309e17add This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
2011-04-27 17:51:51 +00:00
Alexander Motin
97b53e3634 Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
2011-04-24 08:58:58 +00:00
Marius Strobl
39272630aa Correct spelling in comments.
Submitted by:	brucec
2011-04-22 09:31:40 +00:00
Marius Strobl
eabaaab07c - Use the streaming cache unless BUS_DMA_COHERENT is specified. Since
r220375 all drivers enabled in the sparc64 GENERIC should be either
  correctly using bus_dmamap_sync(9) calls or supply BUS_DMA_COHERENT
  when appropriate or as a workaround for missing bus_dmamap_sync(9)
  calls (sound(4) drivers and partially sym(4)). In at least some
  configurations taking advantage of the streaming cache results in
  a modest performance improvement.
- Remove the memory barrier for BUS_DMASYNC_PREREAD which as the
  comment already suggested is bogus.
- Add my copyright for having implemented several things like support
  for the Fire and Oberon IOMMUs, taking over PROM IOMMU mappings etc.
2011-04-21 21:56:28 +00:00
Adrian Chadd
dba9c85977 Break out the ath PCI logic into a separate device/module.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
2011-03-31 08:07:13 +00:00
Marius Strobl
301afbce99 Allocate memory for a DMA method table only in case we need to override
the iommu(4) provided one, i.e. in case of Hummingbird and Sabre bridges,
otherwise just use the iommu(4) one. This also fixes a bug introduced in
r220039 which caused an empty DMA method table to be used for the second
of a pair of Psycho bridges.
2011-03-29 19:48:03 +00:00
Marius Strobl
22508b0153 - A closer inspection of the OpenSolaris code indicates that the DMA
syncing for Hummingbird and Sabre bridges should be applied with every
  BUS_DMASYNC_POSTREAD instead of in a wrapper around interrupt handlers
  for devices behind PCI-PCI bridges only as suggested by the documentation
  (code for the latter actually exists in OpenSolaris but is disabled by
  default), which also makes more sense.
- Take advantage of the ofw_pci_setup_device method introduced in r220038
  for disabling bus parking for certain EBus bridges in order to
- Mark some unused parameters as such.
2011-03-26 16:52:31 +00:00
Marius Strobl
b09c4bd47a - Merge the *_SET macros from fire(4) which generally print out the
register changes when compiled with SCHIZO_DEBUG and take advantage
  of them.
- Add support for the XMITS Fireplane/Safari to PCI-X bridges. I tought
  I'd need this for a Sun Fire 3800, which then turned out to not being
  equipped with such a bridge though. The support for these should be
  complete but given that it hasn't actually been tested probing is
  disabled for now.
  This required a way to alter the XMITS configuration in case a PCI-X
  device is found further down the device tree so the sparc64 specific
  ofw_pci kobj was revived with a ofw_pci_setup_device method, which is
  called by the ofw_pcibus code for every device added.
- A closer inspection of the OpenSolaris code indicates that consistent
  DMA flushing/syncing as well as the block store workaround should be
  applied with every BUS_DMASYNC_POSTREAD instead of in a wrapper around
  interrupt handlers for devices behind PCI-PCI bridges only as suggested
  by the documentation (code for the latter actually exists in OpenSolaris
  but is disabled by default), which also makes more sense.
- Add a workaround for Casinni/Skyhawk combinations. Chances are that
  this solves the crashes seen when using the the on-board Casinni NICs
  of Sun Fire V480 equipped with centerplanes other than 501-6780 or
  501-6790. This also takes advantage of the ofw_pci_setup_device method.
- Mark some unused parameters as such.
2011-03-26 16:49:12 +00:00
Marius Strobl
05bff80a71 - Make a panic message better reflect the actual problem.
- A closer inspection of the OpenSolaris code indicates the block store
  workaround is only necessary in case of BUS_DMASYNC_POSTREAD.
- Mark some unused parameters as such.
2011-03-19 20:36:05 +00:00
Marius Strobl
6d8b3c2f9f On Serengeti-class machines the OFW root isn't the parent of the CPU
nodes.
2011-03-19 19:39:05 +00:00
Marius Strobl
3273bf2d65 In case reading PCIR_MINGNT fails don't use it for calculating the
latency. This is more or less a theoretical problem though as it
typically indicates way bigger problems.
2011-03-19 19:30:49 +00:00
Marius Strobl
3a8a826af3 Remove the advertising clause from the UCB license according to the
July 22, 1999 addendum.
2011-03-13 13:42:43 +00:00
Marius Strobl
273fb3dc7c Sync licenses and the corresponding RCS IDs with NetBSD, mainly switching
the licenses of Matthew R. Green and the TNF to 2-clause.

Obtained from:	NetBSD
2011-03-12 14:33:32 +00:00
Marius Strobl
080ca1a51b - Add support for TLS relocations.
- Emitt an error when encountering an unsupported and in case of the
  kernel also for unaligned relocations.
- Fix R_SPARC_LOX10 relocations. Apparently these are hardly ever used.
2011-03-11 21:08:02 +00:00
Marius Strobl
cb32ba5229 - Remove clause 3 and 4 from TNF licenses. [1]
- Add the _RF_X committed in r212998 also to the tables in the sparc64
  reloc.c in order reduce differences between the kernel and the userland
  source. This results in no functional change though.
- Fix further inconsistencies in the abbreviations of the names of the
  relocations.
- Further whitespace fixes.

Obtained from:	NetBSD [1]
2011-03-11 20:30:58 +00:00
Marius Strobl
09b64d66b4 Revert the binutils workaround committed in r219340, the underlying
problem has been fixed in r219530.
2011-03-11 20:01:57 +00:00
Matthew D Fleming
c77715ef6c Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.
2011-03-11 18:56:55 +00:00
Matthew D Fleming
cd67ac41ae Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name.  Also consistenly use strlcpy().

Suggested by:	Warner Losh
2011-03-10 22:56:00 +00:00
Julian Elischer
a8066a9d3b Add a small change to the comment in the GENRIC config files that include udbp
Submitted by:	Chris Forgron, cforgeron at acsi dot ca
MFC after:	1 week
2011-03-09 17:15:11 +00:00
Dmitry Chagin
e5d81ef1b5 Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
Marius Strobl
25b31a9496 - With the addition of TLS support binutils started to make the addend
values for resolved symbols relative to relocbase instead of sections
  so detect this case and handle as appropriate, which allows using
  kernel modules linked with affected versions of binutils. Actually I
  think this is a bug in binutils but given that apparently nobody
  complained for nearly six years and powerpc has basically the same
  workaround I decided to put it in for the sparc64 kernel, too.
- Fix R_SPARC_HIX22 relocations. Apparently these are hardly ever used.
2011-03-06 15:20:11 +00:00
Marius Strobl
d374d11285 - Consistently abbreviate the names of the relocations.
- End sentences with dots.
- Fix whitespace.
2011-03-06 13:25:46 +00:00
Marius Strobl
a2c5ab4bcb Resurrect ofw_pci_if.m from r178578. 2011-02-21 21:13:18 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Marius Strobl
42b9a96080 Set td_kstack_pages for thread0. 2011-02-08 23:21:35 +00:00
Marius Strobl
a66410ec84 Take advantage of accessing the kernel TSB via ASI_ATOMIC_QUAD_LDD_PHYS
on SPARC64-V, too. Tested by: Michael Moll
2011-02-08 21:58:13 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Konstantin Belousov
55aabb7fd1 For architectures not using direct map , and requiring real KVA page for
sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about free buffer.

sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART despite the thread was selected for wakeup_one(). As
result, we loose a wakeup, and some other waiter will not be woken up.

Reported and tested by:	az
Reviewed by:	alc, jhb
MFC after:	1 week
2011-01-18 21:57:02 +00:00
Jung-uk Kim
bc35e60ec0 Remove empty dev_mem_md_init() stubs. 2011-01-17 23:06:47 +00:00
Jung-uk Kim
2fea643112 Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init().  Consistently define mem_range_softc from
mem.c for all platforms.  Add missing #include guards for machine/memdev.h
and sys/memrange.h.  Clean up some nearby style(9) nits.

MFC after:	1 month
2011-01-17 22:58:28 +00:00
Marius Strobl
db3a488fb0 In order to save instructions the MMU trap handlers assumed that the kernel
TSB is located within the 32-bit address space, which held true as long as
we were using virtual addresses magic-mapped before the location of the
kernel for addressing it. However, with r216803 in place when possible we
address it via its physical address instead, which on machines like Sun Fire
V880 have no physical memory in the 32-bit address space at all requires
to use 64-bit addressing. When using physical addressing it still should
be safe to assume that we can just ignore the lowest 10 bits of the address
as a minor optimization as we did before r216803.
2011-01-17 20:32:17 +00:00
John Baldwin
58ccf5b41c Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
David Schultz
633bd99821 Fix the value for DECIMAL_DIG on UltraSparcs. The previous value of
35 wasn't quite big enough to ensure correct rounding for very-close-
to-halfway cases.
2011-01-09 06:05:48 +00:00
Tijl Coosemans
a56e818f29 On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by:	bde [1]
Approved by:	kib (mentor)
2011-01-08 12:43:05 +00:00
Tijl Coosemans
9858863cd4 Fix types of some values in machine/_limits.h.
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by:	bde
Approved by:	kib (mentor)
2011-01-08 11:13:34 +00:00
Konstantin Belousov
39198f15ee Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.
2011-01-07 14:22:34 +00:00
Marius Strobl
c3ac38e2cb Remove an unused variable accidentally added in r216803. 2011-01-06 17:28:31 +00:00
Marius Strobl
e68f0c6425 Inherit the APB and the generic OFW PCI-PCI bridge driver from the generic
PCI-PCI bridge driver in order to safe some code.
2011-01-04 16:21:14 +00:00
Marius Strobl
f4ff513c4b Reserve INTR_MD[1-4] similarly to what BUS_DMA_BUS[1-4] are intended for
and switch sparc64 to use the first one for bus error filter handlers of
bridge drivers instead of (ab)using INTR_FAST for that so we eventually
can get rid of the latter.

Reviewed by:	jhb
MFC after:	1 month
2011-01-04 16:11:32 +00:00
Marius Strobl
f8d0071f71 Extend the section in which interrupts are disabled in the TLB demap
functions, otherwise if we get preempted after checking whether a certain
pmap is active on the current CPU but before disabling interrupts we might
operate on an outdated state as the pmap might have been deactivated in
the meantime. As the same issue may arises when the TLB demap function is
interrupted by a TLB demap IPI, just entering a critical section before
the check isn't sufficient so we have to fully disable interrupts instead.

MFC after:	3 days
2011-01-02 15:01:03 +00:00
Marius Strobl
4d05e7b184 On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS,
which takes an physical address instead of an virtual one, for loading TTEs
of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB,
which only has a very limited number of lockable dTLB slots. The net result
is that we now basically can handle a kernel TSB of any size and no longer
need to limit the kernel address space based on the number of dTLB slots
available for locked entries. Consequently, other parts of the trap handlers
now also only access the the kernel TSB via its physical address in order
to avoid nested traps, as does the PMAP bootstrap code as we haven't taken
over the trap table at that point, yet. Apart from that the kernel TSB now
is accessed via a direct mapping when we are otherwise taking advantage of
ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this
is implemented by extending the patching of the TSB addresses and mask as
well as the ASIs used to load it into the trap table so the runtime overhead
of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS
is not yet enabled on SPARC64 CPUs due to lack of testing and due to the
fact it might require minor adjustments there.
Theoretically it should be possible to use the same approach also for the
user TSB, which already is not locked into the dTLB, avoiding nested traps.
However, for reasons I don't understand yet OpenSolaris only does that with
SPARC64 CPUs. On the other hand I think that also addressing the user TSB
physically and thus avoiding nested traps would get us closer to sharing
this code with sun4v, which only supports trap level 0 and 1, so eventually
we could have a single kernel which runs on both sun4u and sun4v (as does
Linux and OpenBSD).

Developed at and committed from:	27C3
2010-12-29 16:59:33 +00:00
Marius Strobl
62cf53e2ea - Move the macros for generating load and store instructions to asmacros.h
so they can be shared by different source files and extend them by a
  variant for atomic compare and swap.
- Consistently use EMPTY.
2010-12-29 14:14:50 +00:00
Marius Strobl
b5b0068b4b Rename the "xor" parameter to "xorval" as the former is a reserved keyword
in C++.

Submitted by:	gahr
2010-12-29 14:11:46 +00:00
Marius Strobl
05bcfef170 Extend the hack of r182730 to trick GAS/GCC into compiling access to
STICK/STICK_COMPARE independently of the selected instruction set by
TICK_COMPARE so tick.c as of r214358 once again can be compiled with
gcc -mcpu=v9 for reference purposes.
2010-12-21 22:03:12 +00:00
Marius Strobl
3318c3ef45 Revert r216080 so kmem_map is capped at 3/5 of the currently rather modest
kernel address space in order to leave space for the buffer cache, pipes,
thread stacks, etc on machines with more physical memory until we take
advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs providing it so we don't need
to lock the kernel TSB pages into the dTLB, basically making the entire
64-bit kernel address space available on relevant machines.

Submitted by:	alc
2010-12-21 21:32:17 +00:00
Rebecca Cran
c90f7d9b44 Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.
2010-12-03 07:09:23 +00:00
Rebecca Cran
15b4888a24 Disallow passing in a count of zero bytes to the bus_space(9) functions.
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR:	kern/80980
Discussed with:	jhb
2010-12-02 22:19:30 +00:00
Max Khon
5bdabbdbf8 Change VM_KMEM_SIZE_MAX to be just (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS)
Suggested by:	marius
2010-11-30 16:49:06 +00:00
Max Khon
311e93395e Define VM_KMEM_SIZE_MAX on sparc64. Otherwise kernel built with
DEBUG_MEMGUARD panics early in kmeminit() with the message
"kmem_suballoc: bad status return of 1" because of zero "size" argument
passed to kmem_suballoc() due to "vm_kmem_size_max" being zero.

The problem also exists on ia64.
2010-11-28 19:26:20 +00:00
Marius Strobl
6948a04f2c Convert drivers somehow missed in r200874 to multipass probing. 2010-11-15 21:58:10 +00:00
Alan Cox
2cf36c8f67 Enable reservation-based physical memory allocation. Even without the
creation of large page mappings in the pmap, it can provide modest
performance benefits.  In particular, for a "buildworld" on a 2x 1GHz
Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system
time by 12.6%.

Tested by:	marius@
2010-11-10 17:57:34 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00
Marius Strobl
a1cc524045 Implement pmap_is_prefaultable().
Reviewed by:	alc (with bugfix)
2010-11-06 13:58:24 +00:00
John Baldwin
0108cce0a4 Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger.  Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock.  However, trap interrupts for single-stepping
can still occur even when interrupts are disabled.  Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased.  Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero.  To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with:	bde
MFC after:     1 month
2010-11-05 13:42:58 +00:00
Marius Strobl
0da4045955 - When resetting pm_active and pm_context of a pmap in pmap_pinit() we
need locking as otherwise we may race against the other parts of the
  MD code which expects a consistent state of these. While at it move
  the resetting of the pmap before entering it in the TSB.
- Spell a 0 as TLB_CTX_KERNEL.
2010-10-29 20:51:30 +00:00
Marius Strobl
36c7255a81 - Given that in one-shot mode tick_et_start() also is called frequently
introduce function pointers once set up to the respective implementation
  for reading the (S)TICK and writing the (S)STICK_COMPARE registers as a
  compromise between duplicating code and selecting between different
  implementations during execution over and over again, similar to what is
  done elsewhere in the MD in order to support different CPU models that
  won't ever change at runtime.
- In the remaining tick interrupt handler further push down disabling of
  interrupts to the periodic case as it isn't necessary here in one-shot
  mode at all.
2010-10-25 20:52:33 +00:00
Marius Strobl
10c2bb0a10 - Wrap exchanging td_intr_frame and calling the event timer callback in
a critical section as apparently required by both. I don't think either
  belongs in the event timer front-ends but the callback should handle
  this as necessary instead just like for example intr_event_handle()
  does but this is how the other architectures currently handle it, either
  explicitly or implicitly.
- Further rename and reword references to hardclock as this front-end no
  longer has a notion of actually calling it.
2010-10-19 19:44:05 +00:00
Marius Strobl
17f3c8f1e3 - In oneshot-mode it doesn't make sense to try to compensate the clock
drift in order to achieve a more stable clock as the tick intervals may
  vary in the first place. In fact I haven't seen this code kick in when
  in oneshot-mode so just skip it in that case.
- There's no need to explicitly stop the (S)TICK counter in oneshot-mode
  with every tick as it just won't trigger again with the (S)TICK compare
  register set to a value in the past (with a wrap-around once every ~195
  years of uptime at 1.5 GHz this isn't something we have to worry about
  in practice).
- Given that we'll disable interrupts completely anyway there's no
  need to enter critical sections.
2010-10-17 16:46:54 +00:00
Marius Strobl
15272ec763 Explicitly lower the PIL to 0 as part of enabling interrupts, similar to
what is done on other platforms. Unlike as with the sched_throw(NULL)
called on BSPs during their startup apparently there's nothing which will
reliably lower it on APs. I'm unsure why this only came up on V215 though,
breaking these with r207248. My best guess is that these are the only
supported ones so far fast enough to loose some race.

PR:		151404
MFC after:	3 days
2010-10-14 21:46:53 +00:00
Marius Strobl
45c347bed0 - In the spirit of r212559 add a comment describing what will eventually
lower the PIL.
- Just as with the AP ensure that the (S)TICK timer(s) are in a known
  state when starting BSPs.
2010-10-14 21:34:53 +00:00
Marius Strobl
1fe259cdd8 In the replacement text of the __bswapN_const() macros cast the argument
to the expected type so they work like the corresponding __bswapN_var()
functions and the compiler doesn't complain when arguments of different
width are passed.
2010-10-08 14:59:45 +00:00
Neel Natu
5c1a8dc028 Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.
The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by:	NetApp
Submitted by:	Will McGovern (will at netapp dot com)
Reviewed by:	mjacob, jhb
2010-09-29 21:53:11 +00:00
Marius Strobl
60dd2bcc05 minor simplifications and cosmetics 2010-09-24 15:12:18 +00:00
David Xu
295fbd498e Now userland POSIX semaphore is based on umtx. The kernel module
is only used to support binary compatible, if want to run old
binary, you need to kldload the module.
2010-09-24 09:04:16 +00:00
Konstantin Belousov
e8b4ca850e For sparc64 relocations that directly put bits of the symbol value into
the location, apply elf_relocaddr to the symbol value to have right
values for the symbols from dpcpu segment.

PR:	kern/147769
Discussed with:	avg
Tested by:	marius
MFC after:	2 weeks
2010-09-22 12:52:12 +00:00
Marius Strobl
3c141a7e1d Remove accidentally committed test code which effectively prevented the
use of the SPARC64 V VIS-based block copy function added in r212709.
Reported by:	Michael Moll
2010-09-16 12:05:46 +00:00
Marius Strobl
2c55431721 Add a VIS-based block copy function for SPARC64 V and later, which
additionally takes advantage of the prefetch cache of these CPUs.
Unlike the uncommitted US-III version, which provide no measurable
speedup or even resulted in a slight slowdown on certain CPUs models
compared to using the US-I version with these, the SPARC64 version
actually results in a slight improvement.
2010-09-15 21:44:31 +00:00
Marius Strobl
c1769fad32 Add macros for alternate entry points. 2010-09-15 21:11:29 +00:00
Marius Strobl
4539e94b61 Sync with other platforms:
- make dflt_lock() always panic,
- add kludge to use contigmalloc() when the alignment is larger than the size
  and print a diagnostic when we didn't satisfy the alignment.
2010-09-15 17:11:15 +00:00
Marius Strobl
198ec86bf9 - Update the comment in swi_vm() regarding busdma bounce buffers; it's
unlikely that support for these ever will be implemented on sparc64 as
  the IOMMUs are able to translate to up to the maximum physical address
  supported by the respective machine, bypassing the IOMMU is affected
  by hardware errata and being able to support DMA engines which cannot
  do at least 32-bit DMA does not justify the costs.
- The page zeroing in uma_small_alloc() may use the VIS-based block zero
  function so take advantage of it.
2010-09-15 15:18:41 +00:00
Marius Strobl
4c206df38f Remove a KASSERT which will also trigger for perfectly valid combinations
of small maxsize and "large" (including BUS_SPACE_UNRESTRICTED) nsegments
parameters. Generally using a presz of 0 (which indeed might indicate the
use of bogus parameters for DMA tag creation) is not fatal, it just means
that no additional DVMA space will be preallocated.
2010-09-14 20:31:09 +00:00
Marius Strobl
9ab2708c52 Remove redundant raising of the PIL to PIL_TICK as the respective locore
code already did that.
2010-09-14 19:35:43 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Alexander Motin
dc5b8c2ee7 Sparc64 uses dummy cpu_idle() method. It's CPUs never sleeping. Tell
scheduler that it doesn't need to use IPI to "wake up" CPU.
2010-09-11 07:24:10 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
John Baldwin
d1a02e0932 Catch up to rename of the constant for the Master Data Parity Error bit in
the PCI status register.

Pointed out by:	mdf
Pointy hat to:	jhb
2010-09-09 20:26:30 +00:00
Pyun YongHyeon
57ec924c77 Enable sis(4). sis(4) should work on all architectures. 2010-09-02 18:12:54 +00:00
Marius Strobl
760d65b894 Skip a KASSERT which isn't appropriate when not employing page coloring.
Reported by: Michael Moll
2010-08-21 14:28:48 +00:00
John Baldwin
8c7a92bd4a Remove unused KTRACE includes. 2010-08-19 16:41:27 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
John Baldwin
60c7b36b7a Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int.  Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.
2010-08-11 23:22:53 +00:00
Marius Strobl
15ec942da8 Wrap some sun4u-only symbols. 2010-08-08 14:37:16 +00:00
Marius Strobl
553cf1a13c - As it is not possible for sched_bind(9) to context switch with
td_critnest > 1 when not already running on the desired CPU read the
  TICK counter of the BSP via a direct cross trap request in that case
  instead.
- Treat the STICK based timecounter the same way as the TICK based one
  regarding its quality and obtaining the counter value from the BSP.
  Like the TICK timers the STICK ones also are only synchronized during
  their startup (which might not result in good synchronicity in the
  first place) but not afterwards and might drift over time, causing
  problems when the time is read from different CPUs (see r135972).
2010-08-08 14:00:21 +00:00
Marius Strobl
dfab2088e8 - Introduce a cpu_ipi_single() function pointer in order to send IPIs
to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs.
  Besides being used to implement the ipi_cpu() introduced in r210939,
  cpu_ipi_single() will also be used internally by the sparc64 MD code.
- Factor out the Jalapeno support from the Cheetah IPI send functions
  in order to be able to more easily and efficiently implement support
  for more than 32 target CPUs as well as a workaround for Cheetah+
  erratum 25 for the latter.
2010-08-08 00:09:22 +00:00
Marius Strobl
820a9ea5cb For CPUs which ignore TD_CV and support hardware unaliasing don't
bother doing page coloring. This results in a small but measurable
performance improvement in buildworld times.
2010-08-08 00:01:08 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
Alexander Motin
6c8dd81fa9 Adapt sparc64 and sun4v timer code for the new event timers infrastructure.
Reviewed by:	marius@
2010-07-29 12:08:46 +00:00
Matthew D Fleming
d7854da193 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfulling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Attilio Rao
651aa2d896 KTR_CTx are long time aliased by existing classes so they can't serve
their purpose anymore. Axe them out.

Sponsored by:	Sandvine Incorporated
Discussed with:	jhb, emaste
Possible MFC:	TBD
2010-07-21 10:05:07 +00:00
Alexander Motin
a448e0d827 Allocate proper ammount of memory for interrupt names on sparc64 and
sun4v, same as done on other architectures. This removes garbage from
`vmstat -ia` output.

Reviewed by:	marius@
2010-07-16 22:09:29 +00:00
Marius Strobl
9f7666cebe - Pin the IPI cache and TLB demap functions in order to prevent migration
between determining the other CPUs and calling cpu_ipi_selected(), which
  apart from generally doing the wrong thing can lead to a panic when a
  CPU is told to IPI itself (which sun4u doesn't support).
  Reported and tested by: Nathaniel W Filardo
- Add __unused where appropriate.

MFC after:	3 days
2010-07-04 12:43:12 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00
Nathan Whitehorn
eaef5f0af8 Provide for multiple, cascaded PICs on PowerPC systems, and extend the
OFW interrupt map interface to also return the device's interrupt parent.

MFC after:	8.1-RELEASE
2010-06-18 14:06:27 +00:00
Marius Strobl
c10f5c0eb1 Update a branch missed in r207537.
MFC after:	3 days
2010-06-13 20:29:55 +00:00
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
Alan Cox
85a71b2578 Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.
Correct a typo in a nearby comment on sparc64.
2010-06-05 18:20:09 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Marius Strobl
4461491b3e Change ad_firmware_geom_adjust() to operate on a struct disk * only and
hook it up to ada(4) also. While at it, rename *ad_firmware_geom_adjust()
to *ata_disk_firmware_geom_adjust() etc now that these are no longer
limited to ad(4).

Reviewed by:	mav
MFC after:	3 days
2010-05-20 12:46:19 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Marius Strobl
3eb0900b32 - Enable DMA write parity error interrupts on Schizo with a working
implementation.
- Revert the Sun Fire V890 WAR of r205254. Instead let schizo_pci_bus()
  only panic in case of fatal errors as the interrupt triggered by the
  error the firmware of these and also Sun Fire 280R with version 7
  Schizo caused may happen as late as using the HBA and not only prior
  to touching the PCI bus (in the former case the actual error still is
  fatal but we clear it before touching the PCI bus).
  While at it count and export non-fatal error interrupts via sysctl(9).
- Remove unnecessary locking from schizo_ue().
2010-05-14 20:00:21 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
7024db1d40 Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free().  Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.
2010-05-06 16:39:43 +00:00
Alan Cox
d43708c049 Use an OBJT_PHYS object and thus PG_UNMANAGED pages to implement the TSB.
The TSB is not a pageable structure, so there is no point in using managed
pages.

Reviewed by:	kib
2010-05-05 07:47:40 +00:00
Marius Strobl
5a8336816e Add support for SPARC64 V (and where it already makes sense for other
HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the
MMU and cache handling, it doesn't add pmap optimizations possible with
these CPU, yet, though.
With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250
and likely also other models based on SPARC64 V like 450, 650 and 850.
Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.
2010-05-02 19:38:17 +00:00
Marius Strobl
7e9aef235a Add a hack for SPARC64 V CPUs, which set some undocumented bits in the
first data word.
2010-05-02 12:08:15 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
1332aaf9ed MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
  PG_REFERENCED flag in pmap_protect() can't really be justified, so
  don't do it.  Moreover, on ia64, don't set the page's dirty field
  unless pmap_protect() is removing write access.
2010-04-29 15:47:31 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Marius Strobl
b641222476 Don't bother enabling interrupts before we're ready to handle them. This
prevents the firmware of Fujitsu Siemens PRIMEPOWER250, which both causes
stray interrupts and erroneously enables interrupts at least when calling
SUNW,set-trap-table, in the foot.
2010-04-26 20:19:49 +00:00
Marius Strobl
e2f198273c Add OF_getscsinitid(), a helper similar to OF_getetheraddr() but for
obtaining the initiator ID to be used for SPI controllers from the
Open Firmware device tree.
2010-04-26 19:13:10 +00:00
Marius Strobl
cbafc5aff6 - Add a missing const.
- Map the NS16550 found in Fujitsu Siemens PRIMEPOWER250 to PNP0501 as well.
2010-04-26 18:49:06 +00:00
Marius Strobl
239bcd0237 Skip the pseudo-devices found in Fujitsu Siemens PRIMEPOWER250. 2010-04-26 18:44:30 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Andrew Thompson
b850ecc180 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after:	1 week
2010-04-22 21:31:34 +00:00
Marius Strobl
3c7ae7bf67 Update for UltraSPARC-IV{,+} and SPARC64 V, VI, VII and VIIIfx CPUs. 2010-04-11 15:35:17 +00:00
Marius Strobl
89fcb8cfa7 Add missing copyright shebang. 2010-04-10 12:10:11 +00:00
Marius Strobl
d5dba21cf6 Add sbbc(4), a driver for the BootBus controller found in Serengeti and
StarCat systems which provides time-of-day services for both as well as
console service for Serengeti, i.e. Sun Fire V1280. While the latter is
described with a device type of serial in the OFW device tree, it isn't
actually an UART. Nevertheless the console service is handled by uart(4)
as this allowed to re-use quite a bit of MD and MI code. Actually, this
idea is stolen from Linux which interfaces the sun4v hypervisor console
with the Linux counterpart of uart(4).
2010-04-10 11:52:12 +00:00
Marius Strobl
5679850859 Correct the DCR_IPE macro to refer to the right bit. Also improve the
associated comment as besides US-IV+ these bits are only available with
US-III++, i.e. the 1.2GHz version of the US-III+.
2010-04-10 11:13:51 +00:00
Marius Strobl
368dedb6b6 Unlike the sun4v variant, the sun4u version of SUNW,set-trap-table
actually only takes one argument.
2010-04-10 10:56:59 +00:00
Marius Strobl
15edea2038 Do as the comment suggests and determine the bus space based on the last
bus we actually mapped at rather than always based on the last bus we
encountered while moving upward in the tree. Otherwise we might use the
wrong bus space in case the bridge directly underneath the nexus doesn't
require mapping, i.e. was skipped as it's the case for ssm(4) nodes.
2010-04-10 10:44:41 +00:00
Marius Strobl
f9b669080e - Try do deal gracefully with correctable ECC errors.
- Improve the reporting of unhandled kernel and user traps.
2010-04-02 10:36:40 +00:00
Marius Strobl
d7a0fc4dca Use device_get_nameunit(9) rather than device_get_name(9) so one can
identify the reporting bridge in machines with multiple PCI domains.
2010-03-31 22:32:56 +00:00
Marius Strobl
789a702ad3 Don't re-implement device_get_nameunit(9). 2010-03-31 22:27:33 +00:00
Marius Strobl
4a717bcedc - Take advantage of the INTCLR_* macros.
- Right-justify the backslashes as per style(9).
2010-03-31 22:19:00 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Marius Strobl
07e6e81e2f - The firmware of Sun Fire V1280 has a misfeature of setting %wstate to
7 which corresponds to WSTATE_KMIX in OpenSolaris whenever calling into
  it which totally screws us even when restoring %wstate afterwards as
  spill/fill traps can happen while in OFW. The rather hackish OpenBSD
  approach of just setting the equivalent of WSTATE_KERNEL to 7 also is
  no option as we treat %wstate as a bit field. So in order to deal with
  this problem actually implement spill/fill handlers for %wstate 7 which
  just act as the WSTATE_KERNEL ones except of theoretically also handling
  32-bit, turn off interrupts completely so we don't even take IPIs while
  in OFW which should ensure we only take spill/fill traps at most and
  restore %wstate after calling into OFW once we have taken over the trap
  table. While at it, actually set WSTATE_{,PROM}_KMIX before calling into
  OFW just like OpenSolaris does, which should at least help testing this
  change on non-V1280.
- Remove comments referring to the %wstate usage in BSD/OS.
- Remove the no longer used RSF_ALIGN_RETRY macro.
- Correct some trap table addresses in comments.
- Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table.
- Ensure PSTATE_AM is off when entering or exiting to OFW as well as that
  interrupts are also completely off when exiting to OFW as the firmware
  trap table shouldn't be used to handle our interrupts.
2010-03-21 13:09:54 +00:00
Marius Strobl
c613313c10 Improve the KVA space sizing of 186682; on machines with large dTLBs we
can actually use all of the available lockable entries of the tiny dTLB
for the kernel TSB. With this change the KVA space sizing happens to be
more in line with the MI one so up to at least 24GB machines KVA doesn't
need to be limited manually. This is just another stopgap though, the
real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs
providing it so we don't need to lock the kernel TSB pages into the dTLB
in the first place.
2010-03-20 23:00:43 +00:00
Marius Strobl
ddcc3ff59e o Add support for UltraSparc-IV+:
- Swap the configuration of the first and second large dTLB as with
    US-IV+ these can only hold entries of certain page sizes each, which
    we happened to chose the non-working way around.
  - Additionally ensure that the large iTLB is set up to hold 8k pages
    (currently this happens to be a NOP though).
  - Add a workaround for US-IV+ erratum #2.
  - Turn off dTLB parity error reporting as otherwise we get seemingly
    false positives when copying in the user window by simulating a
    fill trap on return to usermode. Given that these parity errors can
    be avoided by disabling multi issue mode and the problem could be
    reproduced with a second machine this appears to be a silicon bug of
    some sort.
  - Add a membar #Sync also before the stores to ASI_DCACHE_TAG. While
    at it, turn of interrupts across the whole cheetah_cache_flush() for
    simplicity instead of around every flush. This should have next to no
    impact as for cheetah-class machines we typically only need to flush
    the caches a few times during boot when recovering from peeking/poking
    non-existent PCI devices, if at all.
  - Just use KERNBASE for FLUSH as we also do elsewhere as the US-IV+
    documentation doesn't seem to mention that these CPUs also ignore the
    address like previous cheetah-class CPUs do. Again the code changing
    LSU_IC is executed seldom enough that the negligible optimization of
    using %g0 instead should have no real impact.

  With these changes FreeBSD runs stable on V890 equipped with US-IV+
  and -j128 buildworlds in a loop for days are no problem. Unfortunately,
  the performance isn't were it should be as a buildworld on a 4x1.5GHz
  US-IV+ V890 takes nearly 3h while on a V440 with (theoretically) less
  powerfull 4x1.5GHz US-IIIi it takes just over 1h. It's unclear whether
  this is related to the supposed silicon bug mentioned above or due to
  another issue. The documentation (which contains a sever bug in the
  description of the bits added to the context registers though) at least
  doesn't mention any requirements for changes in the CPU handling besides
  those implemented and the cache as well as the TLB configurations and
  handling look fine.
o Re-arrange cheetah_init() so it's easier to add support for SPARC64
  V up to VIIIfx CPUs, which only require parts of this initialization.
2010-03-17 22:45:09 +00:00
Marius Strobl
bc11f2d90f Add macros for the VER.impl of SPARC64 II to VIIIfx. 2010-03-17 21:00:39 +00:00
Marius Strobl
319efdb1cc - Add TTE and context register bits for the additional page sizes supported
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
  which just is the complement of TLB_CXR_CTX_MASK instead of trying to
  assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
  least debugging purposes.
2010-03-17 20:23:14 +00:00
Marius Strobl
cb2f0c8ce1 - Add quirk handling for Sun Fire V1280. The firmware of these machines
provides no ino-bitmap properties so forge them using the default set
  of controller interrupts and let schizo_setup_intr() take care of the
  children, hoping for non-fancy routing.
- Add quirk handling for Sun Fire V890. When booting these machines from
  disk a Schizo comes up with PCI error residing which triggers as soon
  as we register schizo_pci_bus() even when clearing it from all involved
  registers (it's no longer indicated once we're in schizo_pci_bus()
  though). Thus make PCI bus errors non-fatal until we actually touch the
  bus. With this change schizo_pci_bus() typically triggers once during
  attach in this case. Obviously this approach isn't exactly race free
  but it's about the best we can do about this problem as we're not
  guaranteed that the interrupt will actually trigger on V890 either, as
  it certainly doesn't when for example netbooting them.
2010-03-17 20:01:01 +00:00
Ed Schouten
338f1debcd Remove COMPAT_43TTY from stock kernel configuration files.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
2010-03-13 09:21:00 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Marius Strobl
ba96c16ae8 Some machines can not only consist of CPUs running at different speeds
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.

This file was missed in r204152.
2010-02-21 09:25:53 +00:00
Marius Strobl
c273f8a878 Starting with UltraSPARC IV CPUs the CPU caches are described with different
OFW properties.
2010-02-20 23:42:24 +00:00
Marius Strobl
9b824f84d5 Some machines can not only consist of CPUs running at different speeds
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
2010-02-20 23:24:19 +00:00
Marius Strobl
a675da796f Predict KASSERTs to be true. 2010-02-13 19:17:06 +00:00
Marius Strobl
f00d6a0cc2 Add ssm(4), which serves as a glue device allowing devices beneath the
scalable shared memory node, which is used in large UltraSPARC III based
machines to group snooping-coherency domains together, like schizo(4) to
be treated like nexus(4) children.
2010-02-13 19:05:34 +00:00
Marius Strobl
527eebfeeb - Add the 'cmp' and 'core' pseudo-busses which are used to group CPU cores
to the exclusion lists as the CPU nodes aren't handled as regular devices
  either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
  derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
  lists.
2010-02-13 18:51:49 +00:00
Marius Strobl
ac144cadbc Resurrect nexusvar.h from r167307. 2010-02-13 18:18:45 +00:00
Marius Strobl
1c05d13e3f Style fixes 2010-02-13 17:05:57 +00:00
Marius Strobl
c61b6da840 - Search the whole OFW device tree instead of only the children of the
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
  nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
  or combinations thereof. Also in large UltraSPARC III based machines
  the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
  group snooping-coherency domains together instead of directly from the
  nexus.
  It would be great if we could use newbus to deal with the different ways
  the 'cpu' devices can hang off of pseudo ones but unfortunately both
  cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
  probing.
- Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these
  are multi-core each CPU has two Fireplane config registers and thus the
  module/target ID has to be determined differently so the one specific
  to a certain core is used. Similarly, starting with UltraSPARC IV the
  individual cores use a different property in the OFW device tree to
  indicate the CPU/core ID as it no longer is in coincidence with the
  shared slot/socket ID.
  This involves changing the MD KTR code to not directly read the UPA
  module ID either. We use the MID stored in the per-CPU data instead of
  calling cpu_get_mid() as a replacement in order prevent clobbering any
  registers as side-effect in the assembler version. This requires CATR()
  invocations from mp_startup() prior to mapping the per-CPU pages to be
  removed though.
  While at it additionally distinguish between CPUs with Fireplane and
  JBus interconnects as these also use slightly different sizes for the
  JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
  machdep.c.
2010-02-13 16:52:33 +00:00
Marius Strobl
dfade4c0a4 - At least the trap table of the Sun Fire V1280 firmware apparently has
no cleanwindows handler so just remove trying to trigger it from _start
  and the AP trampoline code as that leads to a crash there. This should
  be okay as leaking data from the OFW via the CPU registers on start of
  the kernel should be no real concern.
- Make the comments of _start and the AP trampoline code regarding the
  initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
  refer to the right macro.
2010-02-13 15:36:33 +00:00
Marius Strobl
4f607e8ef2 - Assert that HEAPSZ is a multiple of PAGE_SIZE as at least the firmware
of Sun Fire V1280 doesn't round up the size itself but instead lets
  claiming of non page-sized amounts of memory fail.
- Change parameters and variables related to the TLB slots to unsigned
  which is more appropriate.
- Search the whole OFW device tree instead of only the children of the
  root nexus device for the BSP as starting with UltraSPARC IV the 'cpu'
  nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
  or combinations thereof. Also in large UltraSPARC III based machines
  the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
  group snooping-coherency domains together instead of directly from the
  nexus.
- Add support for UltraSPARC IV and IV+ BSPs. Due to the fact that these
  are multi-core each CPU has two Fireplane config registers and thus the
  module/target ID has to be determined differently so the one specific
  to a certain core is used. Similarly, starting with UltraSPARC IV the
  individual cores use a different property in the OFW device tree to
  indicate the CPU/core ID as it no longer is in coincidence with the
  shared slot/socket ID.
  While at it additionally distinguish between CPUs with Fireplane and
  JBus interconnects as these also use slightly different sizes for the
  JBus/agent/module/target IDs.
- Check the return value of init_heap(). This requires moving it after
  cons_probe() so we can panic when appropriate. This should be fine as
  the PowerPC OFW loader uses that order for quite some time now.
2010-02-13 14:13:39 +00:00
Attilio Rao
88cbfa852e Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.

Sponsored by:	Sandvine Incorporated
Requested by:	emaste
Discussed with:	kib
2010-02-10 16:30:04 +00:00
Marius Strobl
6d709b9a44 Implement handling of the third argument of cpu_switch(). This unbreaks
sparc64 after r202889.

PR:		143215
MFC after:	1 week
2010-01-30 14:04:21 +00:00
Marius Strobl
376a67cf64 - Zero the MSI/MSI-X queue argument, otherwise mtx_init(9) can panic
indicating an already initialized lock.
- Check for an empty MSI/MSI-X queue entry before asserting that we have
  received a MSI/MSI-X message in order to not panic in case of stray MSI/
  MSI-X queue interrupts which may happen in case of using an interrupt
  handler rather than a filter.

MFC after:	3 days
2010-01-27 20:30:14 +00:00
Marius Strobl
29b6820f87 Merge r202882 from amd64/i386:
For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

In syscall(), reread syscall number and arguments after ptracestop(),
if debugger modified anything in the process environment. Since procfs
stopevent requires number of syscall arguments in p_xstat, this cannot
be solved by moving stop/trace point before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reviewed by:	kib
2010-01-23 22:11:18 +00:00
John Baldwin
13c18821fa Move the examples for the 'hints' and 'env' keywords from various GENERIC
kernel configs into NOTES.

Reviewed by:	imp
2010-01-19 17:20:34 +00:00
Marius Strobl
0424a37bea Add epic(4) also here.
MFC after:	3 days
2010-01-18 20:25:29 +00:00
Marius Strobl
b10fd7e9cc When setting up MSIs with a filter ensure that the event queue interrupt
is cleared as it might have triggered before and given we supply NULL
as ic_clear, inthand_add() won't do this for us in this case.
2010-01-10 18:39:29 +00:00
Warner Losh
87948dfdf2 Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.
# This is the resolution of removing it from DEFAULTS...

MFC after:	5 days
2010-01-10 17:44:22 +00:00
Marius Strobl
319570f9c4 Add epic(4), a driver for the front panel LEDs in Sun Fire V215/V245.
It's named after the driver doing the same job in OpenSolaris.
2010-01-10 15:44:48 +00:00
Marius Strobl
0e34fcfba8 - According to OpenSolaris it's sufficient to align the MSIs of a
device in the table based on the count rather than the maxcount.
  Also the previous code didn't work properly as it would have been
  necessary to reserve the entire maxcount range in order keep later
  requests from filling the spare MSIs between count and maxcount,
  which would be complicated to unreserve in fire_release_msi().
- For MSIs with filters rather than handlers only don't clear the
  event queue interrupt via fire_intr_clear() since given that these
  are executed directly would clear it while we're still processing
  the event queue, which in turn would lead to lost MSIs.
- Save one level of indentation in fire_setup_intr().
- Correct a bug in fire_teardown_intr() which prevented it from
  correctly restoring the MSI in the resource, causing allocation of
  a resource representing an MSI to fail after the first pass when
  repeatedly loading and unloading a driver module.
2010-01-10 15:12:15 +00:00
Bjoern A. Zeeb
193171b7f5 In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT.  This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.

This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with:	arch, jhb (as part of the LINT-VIMAGE thread)
MFC after:		1 month
2010-01-08 18:57:31 +00:00
Pyun YongHyeon
fa82d01504 Enable ste(4). ste(4) should work on all architectures. 2010-01-08 02:46:34 +00:00
Warner Losh
56eff2143f Revert 200594. This file isn't intended for these sorts of things. 2010-01-04 21:30:04 +00:00
Brooks Davis
9efde58392 Add vlan(4) to all GENERIC kernels.
MFC after:	1 week
2010-01-03 20:40:54 +00:00
Marius Strobl
9a41d9ef93 Exclude options COMPAT_FREEBSD4 now that the MD freebsd4_sigreturn()
is gone since r201396 and which is also in line with the fact that
FreeBSD 4 didn't supported sparc64.
2010-01-03 02:58:43 +00:00
Marius Strobl
be70411f82 - Demapping unused kernel TLB slots has proven to work reliably so move
the associated debugging under bootverbose.
- Remove freebsd4_sigreturn(); given that FreeBSD 4 didn't supported
  sparc64 this only ever served as a transition aid prior to FreeBSD
  5.0 and is unused by default since COMPAT_FREEBSD4 was removed from
  GENERIC in r143072 nearly 5 years ago.
2010-01-02 15:44:16 +00:00
Marius Strobl
c936816f6d - Preserve the PROM IOMMU in order to allow OFW drivers to continue to
work.
- Sanity check the parameters passed to the implementations of the
  pcib_{read,write}_config() methods. Using illegal values can cause
  no real harm but it doesn't hurt to avoid unnecessary data error
  traps requiring to flush and re-enable the level 1 caches.
2010-01-02 15:19:33 +00:00
Marius Strobl
209817b163 Fix botches in r201005:
- Actually use the newly introduced sc_res in the front-end.
- Remove a whitespace glitch in mk48txx_gettime().
2010-01-01 22:47:53 +00:00
Marius Strobl
c64b2eba8a - Remove a redundant variable and an unnecessary cast.
- Fix whitespace.
2009-12-29 14:06:36 +00:00
Marius Strobl
7d027c4ff2 - Prefer i and j over i and n for temporary integer variables.
- Wrap/shorten too long lines.
- Remove a redundant variable and an unnecessary cast in schizo(4).
2009-12-29 14:03:38 +00:00
Marius Strobl
4c6587a6cf Account for firmware versions which include the CDMA interrupts in
the OFW device tree.

MFC after:	3 days
2009-12-28 14:16:40 +00:00
Marius Strobl
5cb5104246 Add a driver for the `Fire' JBus to PCIe bridges found in at least
the Sun Fire V215/V245 and Sun Ultra 25/45 machines. This driver also
already includes all the code to support the `Oberon' Uranus to PCIe
bridges found in the Fujitsu-Siemens based Mx000 machines but due to
lack of access to such a system for testing, probing of these bridges
is currently disabled.
Unfortunately, the event queue mechanism of these bridges for MSIs/
MSI-Xs matches our current MD and MI interrupt frameworks like square
pegs fit into round holes so for now we are generous and use one event
queue per MSI, which limits us to 35 MSIs/MSI-Xs per Host-PCIe-bridge
(we use one event queue for the PCIe error messages). This seems
tolerable as long as most devices just use one MSI/MSI-X anyway.
Adding knowledge about MSIs/MSI-Xs to the MD interrupt code should
allow us to decouple the 1:1 mapping at the cost of no longer being
able to bind MSIs/MSI-Xs to specific CPUs as we currently have no
reliable way to quiesce a device during the transition of its MSIs/
MSI-Xs to another event queue. This would still require the problem
of interrupt storms generated by devices which have no one-shot
behavior or can't/don't mask interrupts while the filter/handler is
executed (like the older PCIe NICs supported by bge(4)) to be solved
though.

Committed from:	26C3
2009-12-27 16:55:44 +00:00
Marius Strobl
de6a190429 Style changes
Obtained from:	NetBSD (mc146818reg.h)
2009-12-25 22:53:46 +00:00
Marius Strobl
033a5c8ce1 - Take advantage of bus_{read,write}_*(9).
- Set dow = -1 in mk48txx_gettime() because some drivers (for example
  the NetBSD and OpenBSD mk48txx(4)) don't set it correctly.
2009-12-25 21:53:20 +00:00
Marius Strobl
10a290af39 - Hook up the default implementations of the MSI/MSI-X pcib_if methods
so requests may bubble up to a host-PCI bridge driver.
- Distinguish between PCI and PCIe bridges in the device description
  so it's a bit easier to follow what hangs off of what in the dmesg.
  Unfortunately we can't also tell PCI and PCI-X apart based on the
  information provided in the OFW device tree.
- Add quirk handling for the ALi M5249 found in Fire-based machines
  which are used as a PCIe-PCIe bridge there. These are obviously
  subtractive decoding as as they have a PCI-ISA bridge on their
  secondary side (and likewise don't include the ISA I/O range in
  their bridge decode) but don't indicate this via the class code.
  Given that this quirk isn't likely to apply to all ALi M5249 and
  I have no datasheet for these chips so I could implement a check
  using the chip specific bits enabling subtractive decoding this
  quirk handling is added to the MD code rather than the MI one.
2009-12-25 15:03:05 +00:00
Marius Strobl
3438bdc53e Merge from amd64/i386:
Implement support for interrupt descriptions.
2009-12-24 15:43:37 +00:00
Marius Strobl
b1c13fc739 Add missing locking in intr_bind(). 2009-12-24 15:40:08 +00:00
Marius Strobl
d92734d25a - Don't check for a valid interrupt controller on every interrupt
in intr_execute_handlers(). If we managed to get here without an
  associated interrupt controller we have way bigger problems.
  While at it predict stray vector interrupts as false as they are
  rather unlikely.
- Don't blindly call the clear function of an interrupt controller
  when adding a handler in inthand_add() as interrupt controllers
  like the one driven by upa(4) are auto-clearing and thus provide
  NULL instead.
2009-12-24 12:27:22 +00:00
Marius Strobl
c39f5f5af4 - By re-arranging the code in OF_decode_addr() somewhat and accepting
a bit of a detour we can just iterate through the banks array instead
  of having to calculate every offset. This change is inspired by the
  powerpc version of this function.
- Add support for the JBus to EBus bridges which hang off of nexus(4).
2009-12-23 22:25:23 +00:00
Marius Strobl
ee57d839bb Style changes. 2009-12-23 22:11:33 +00:00
Marius Strobl
6ed76228c1 - Add support for the IOMMUs of Fire JBus to PCIe and Oberon Uranus
to PCIe bridges.
- Add support for talking the PROM mappings over to the kernel IOTSB
  just like we do with the kernel TSB in order to allow OFW drivers
  to continue to work.
- Change some members, parameters and variables to unsigned where
  more appropriate.
2009-12-23 22:02:34 +00:00
Marius Strobl
46c9b5d9bd Fix whitespace according to style(9). 2009-12-23 21:51:41 +00:00
Marius Strobl
926f4e43c6 - Add quirk handling for ALi M5229, mainly setting the magic "force
enable IDE I/O" bit which prevents data access traps with revision
  0xc8 in Fire-based machines when pci(4) enables PCIM_CMD_PORTEN.
- Like for sun4v also don't add the PCI side of host-PCIe bridges to
  the bus on sun4u as they don't have configuration space implement
  there either.
2009-12-23 21:38:59 +00:00
Marius Strobl
9cc21da577 - Sort the prototypes.
- Add macros to ease the access of device configuration space in
  ofw_pcibus_setup_device().
2009-12-23 21:25:16 +00:00
Marius Strobl
36fd650f09 Add structures for OFW MSI/MSI-X support. These are identical for
both sun4u and sun4v.
2009-12-23 21:07:49 +00:00
Marius Strobl
7bb518f115 Don't probe the bq4802 variant found in Ultra 25 and 45 for now as
this chip isn't MC146818 compatible and requires different handlers
(but which I can't test due to lack of such hardware).
2009-12-23 20:42:14 +00:00
Marius Strobl
4d116858df Don't use an out register to hold the vector number across the call
of the interrupt handler in intr_fast() as the handler might clobber
it (no in-tree handler currently does but an upcoming one will).
While at it, tidy the register usage in the interrupt counting code.
2009-12-23 20:23:04 +00:00
Marcel Moolenaar
d9c849eceb Calculate the average CPU clock frequency and export that through
the hw.freq.cpu sysctl variable. This can be used by ports that
need to know "the" CPU frequency.
2009-12-23 06:52:12 +00:00
Marius Strobl
b748710c34 - Correct an off-by-one error when calculating the end of a child
range.
- Spell the PCI TLA in uppercase.
2009-12-22 21:53:19 +00:00
Marius Strobl
63cd5b6d38 - Add support for the JBus to EBus bridges which hang off of nexus(4)
and are found in sun4u and sun4v machines based on the Fire ASIC.
- Initialize the configuration space of the PCI to EBus variant the
  same way as OpenSolaris does.
2009-12-22 21:49:53 +00:00
Marius Strobl
8bf72e61a7 - Add macros for the states of the interrupt clear registers.
- Change INTMAP_VEC() to take an INO as its second argument rather
  than an INR. The former is what I actually intended with this
  macro and how it's currently used.
2009-12-22 21:48:18 +00:00
Marius Strobl
ea77e7bb3f Make these constants unsigned which is more appropriate. 2009-12-22 21:42:54 +00:00
Marius Strobl
1bba41a506 Enroll these drivers in multipass probing. The motivation behind this
is that the JBus to EBus bridges share the interrupt controller of a
sibling JBus to PCIe bridge (at least as far as the OFW device tree
is concerned, in reality they are part of the same chip) so we have to
probe and attach the latter first. That happens to be also the case
due to the fact that the JBus to PCIe bridges appear first in the OFW
device tree but it doesn't hurt to ensure the right order.
2009-12-22 21:02:46 +00:00
Marius Strobl
6a963456fd Add missing module dependency information. 2009-12-21 21:41:33 +00:00
Marius Strobl
75b02cac5a Provide and consume missing module dependency information. 2009-12-21 21:29:16 +00:00
Doug Barton
f1bdf073c1 Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS
2009-12-16 02:17:43 +00:00
Marius Strobl
259100de20 Add additional checks of the kernel stack addresses in order to
ensure we don't overrun the end of the call chain.

MFC after:	1 week
2009-12-08 20:18:54 +00:00
Marius Strobl
22bf171313 Add <machine/pcb.h> missed in r199135. 2009-12-07 15:29:07 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Marius Strobl
8fe75fc879 Unroll copying of the registers in {g,s}et_mcontext() and limit it
to the set actually restored by tl0_ret() instead of using the whole
trapframe. Additionally skip %g7 as that register is used as the
userland TLS pointer.

PR:		140523
MFC after:	1 week
2009-11-17 21:08:10 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Marius Strobl
bed992dc26 Sync with the other archs and wrapper the prototype of in_cksum_skip(9)
in #ifdef _KERNEL.

Submitted by:	Ulrich Spoerlein
MFC after:	1 month
2009-10-26 22:00:26 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Marius Strobl
a430b967b5 Change the load base to below 2GB so PIE binaries work including when
compiled to use the Medium/Low code model, which we currently default
to for the userland. GNU/Linux has moved their default to Medium/Middle
some time ago, which probably explains why the current GNU ld(1) uses
a base in the range between 32 and 44 bits instead.

Submitted by:	kib
2009-10-18 13:08:15 +00:00
John Baldwin
55b6a401ef Move the USB wireless drivers down into their own section next to the USB
ethernet drivers.

Submitted by:	Glen Barber  glen.j.barber @ gmail
MFC after:	1 month
2009-10-13 19:02:03 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00