Commit Graph

2153 Commits

Author SHA1 Message Date
Alan Cox
e4b8a2fc5a Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
Eitan Adler
96240c89f0 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
Attilio Rao
324e57150d userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
Gavin Atkinson
915ae29a17 Prevent indent(1) from reformatting this comment, as it contains
a formatting-sensitive table.
2012-09-07 08:18:06 +00:00
Marius Strobl
9e8100e77c Add a global MD macro for the VIS block size instead of duplicating
it and using magic values all over the place.

MFC after:	1 week
2012-08-31 11:15:01 +00:00
Marius Strobl
bf38cf8ab3 - Unlike cache invalidation and TLB demapping IPIs, reading registers from
other CPUs doesn't require locking so get rid of it. As the latter is used
  for the timecounter on certain machine models, using a spin lock in this
  case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
  Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
  for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
  demapping IPIs.
- Mark some unused function arguments as such.

MFC after:	1 week
2012-08-29 16:56:50 +00:00
Glen Barber
67944c4572 Grammar fix: s/NIC's/NICs/
MFC after:	3 days
2012-08-26 01:21:02 +00:00
Marius Strobl
55afc4edf1 Merge r236494 from x86:
Isolate the global TTE list lock from data and other locks to prevent false
sharing within the cache.

MFC after:	3 days
2012-08-05 22:03:13 +00:00
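The false-sharing fix above can be illustrated in userland C11. The idea is to give the lock a cache line of its own, so that writes to neighboring data never invalidate the line the lock lives on. This is a minimal sketch, not the pmap code. The 64-byte line size, `struct padded_lock` and the variable names are assumptions; the kernel has its own padded lock types for this purpose.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; real code uses an MD constant. */
#define	SKETCH_CACHE_LINE	64

/*
 * Hypothetical padded lock: the alignment starts the lock word on its own
 * cache line and the padding fills the rest of that line, so no unrelated
 * variable can share it (which would cause false sharing).
 */
struct padded_lock {
	alignas(SKETCH_CACHE_LINE) volatile uint32_t lock_word;
	char pad[SKETCH_CACHE_LINE - sizeof(uint32_t)];
};

static_assert(sizeof(struct padded_lock) == SKETCH_CACHE_LINE,
    "padded_lock must occupy exactly one cache line");

static struct padded_lock tte_list_lock_sketch;	/* hypothetical name */
static uint64_t unrelated_hot_counter;		/* necessarily on another line */
```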
Marius Strobl
5f42fa1793 Switch back to the 4BSD scheduler for now. There is some more or less
recent regression with ULE that causes processes to get stuck in getblk
and interrupt handler execution delays to rise above the command
timeout of mpt(4).

MFC after:	3 days
2012-06-30 14:55:36 +00:00
Kenneth D. Merry
e36042794f Now that the mps(4) driver is endian-safe, add it to the powerpc and
sparc64 GENERIC config files.

MFC after:	3 days
2012-06-28 20:48:24 +00:00
Alan Cox
e30df26e7b Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
Andrew Turner
74dc547e24 Make the wchar_t type machine dependent.
This is required for ARM EABI. Section 7.1.1 of the Procedure Call Standard
for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or
an unsigned short, with the former preferred.

Because of this requirement we need to move the definition of __wchar_t to
a machine dependent header. It also cleans up the macros defining the limits
of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine
dependent header and then using them to define WCHAR_MIN and WCHAR_MAX
respectively.

Discussed with:	bde
2012-06-24 04:15:58 +00:00
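The following is a minimal sketch of the split described above. It assumes an ARM EABI-style machine header that picks `unsigned int`. The `md_`/`MD_` and `SKETCH_` names are hypothetical stand-ins for `__wchar_t`, `__WCHAR_MIN`/`__WCHAR_MAX` and the public `WCHAR_MIN`/`WCHAR_MAX`. They are renamed here so the fragment compiles alongside the system headers.

```c
#include <assert.h>
#include <limits.h>

/* --- machine-dependent header (hypothetical ARM EABI flavor) --- */
typedef unsigned int md_wchar_t;	/* AAPCS: unsigned int preferred */
#define	MD_WCHAR_MIN	0U
#define	MD_WCHAR_MAX	0xffffffffU	/* assumes a 32-bit unsigned int */

/* --- machine-independent header: derive public limits from the MD ones --- */
#define	SKETCH_WCHAR_MIN	MD_WCHAR_MIN
#define	SKETCH_WCHAR_MAX	MD_WCHAR_MAX

static_assert(MD_WCHAR_MAX == UINT_MAX, "limit matches the MD type");
static_assert((md_wchar_t)-1 == SKETCH_WCHAR_MAX, "unsigned type wraps to MAX");
```

A machine header for another platform could instead choose a signed `int` with matching limits. The machine-independent part would not change.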
Konstantin Belousov
aea810386d Implement a mechanism to export some kernel timekeeping data to
usermode, using a shared page.  The structures and functions have a vdso
prefix to indicate the intended future location of the code.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
A compatibility export for 32-bit processes on 64-bit hosts is also
provided. The kernel also provides usermode with an indication of the
currently used timecounter, so that libc can fall back to the syscall if
the configured timecounter is unknown to usermode code.

The shared data updates are initiated both from tc_windup(), where
a fast task is queued to do the update, and from the sysctl handlers that
change the timecounter. A manual override switch,
kern.timecounter.fast_gettime, allows turning off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for the TSC timecounter. The HPET counter page could be exported as well,
but I prefer not to further glue the kernel and libc ABI there until a
proper vdso-based solution is developed.

Minimal stubs necessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
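Lockless usermode reading of a structure like this usually relies on a generation counter. The sketch below shows that pattern with C11 atomics. `struct shared_th`, its fields and the retry bound are hypothetical simplifications; they are not the real `struct vdso_timehands` layout or libc's reader. The time computation is also reduced to `offset + ticks * scale`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-page record (not the real struct vdso_timehands). */
struct shared_th {
	_Atomic uint32_t gen;		/* 0 means "update in progress" */
	_Atomic uint64_t offset_ns;	/* time at tick 0 (simplified) */
	_Atomic uint64_t scale_ns;	/* nanoseconds per tick (simplified) */
};

/* Writer side (kernel): invalidate, update the data, publish a new generation. */
static void
shared_th_update(struct shared_th *th, uint64_t offset_ns, uint64_t scale_ns)
{
	uint32_t next = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;

	if (next == 0)
		next = 1;		/* skip the reserved value on wraparound */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset_ns, offset_ns, memory_order_relaxed);
	atomic_store_explicit(&th->scale_ns, scale_ns, memory_order_relaxed);
	atomic_store_explicit(&th->gen, next, memory_order_release);
}

/*
 * Reader side (usermode): retry while an update is in progress or raced
 * with the read. Returns -1 after too many retries so that the caller can
 * fall back to the system call.
 */
static int
shared_th_read(struct shared_th *th, uint64_t ticks, uint64_t *out_ns)
{
	assert(th != NULL && out_ns != NULL);
	for (int tries = 0; tries < 1000; tries++) {
		uint32_t gen = atomic_load_explicit(&th->gen, memory_order_acquire);

		if (gen == 0)
			continue;
		uint64_t off = atomic_load_explicit(&th->offset_ns, memory_order_relaxed);
		uint64_t sc = atomic_load_explicit(&th->scale_ns, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&th->gen, memory_order_relaxed) == gen) {
			*out_ns = off + ticks * sc;
			return (0);
		}
	}
	return (-1);
}
```

The bounded retry loop is one way to express the fallback idea from the commit. When the fast path cannot obtain a consistent snapshot, the caller uses the system call instead.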
Konstantin Belousov
232aa31fb9 Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after:  1 week
2012-06-22 06:38:31 +00:00
Alan Cox
6031c68de4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00
Alan Cox
b10ed4a911 Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c.  This new r/w lock is used primarily to synchronize access
to the TTE lists.  However, it will be used in a somewhat unconventional
way.  As finer-grained TTE list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

Reviewed by:	attilio
Tested by:	flo
MFC after:	10 days
2012-05-29 01:52:38 +00:00
Marius Strobl
57e42723d8 Merge from x86: r232521
Exclude USB drivers (except umass and ukbd) from main kernel image.
2012-05-25 14:52:05 +00:00
Bjoern A. Zeeb
920b965865 MFp4 bz_ipv6_fast:
  in_cksum.h required ip.h to be included for struct ip.  To be
  able to use some general checksum functions like in_addword()
  in a non-IPv4 context, limit the (also exported to user space)
  IPv4-specific functions to cases where the ip.h header is
  present and IPVERSION is defined (to 4).

  We should consider more general checksum (updating) functions
  to also allow easier incremental checksum updates in the L3/4
  stack and firewalls, as well as ponder further requirements by
  certain NIC drivers needing slightly different pseudo values
  in offloading cases.  Thinking in terms of a better "library".

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 22:00:48 +00:00
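Some background for the checksum discussion above: Internet checksums use 16-bit ones' complement addition. The incremental updates mentioned for the L3/4 stack and firewalls can be done without re-summing the whole packet (RFC 1624). The sketch below is illustrative only. `ones_add16()`, `cksum16()` and `cksum_update16()` are hypothetical helpers, not FreeBSD's machine-dependent `in_addword()` or `in_cksum` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement addition: fold the carry back into the low bits. */
static uint16_t
ones_add16(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	sum = (sum & 0xffff) + (sum >> 16);	/* one fold suffices for two words */
	return ((uint16_t)sum);
}

/* Full checksum: ones' complement of the ones' complement sum of all words. */
static uint16_t
cksum16(const uint16_t *w, size_t n)
{
	uint16_t s = 0;

	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		s = ones_add16(s, w[i]);
	return ((uint16_t)~s);
}

/* Incremental update after one word changes (RFC 1624 eqn. 3): HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint16_t s = ones_add16((uint16_t)~hc, (uint16_t)~old_word);

	s = ones_add16(s, new_word);
	return ((uint16_t)~s);
}
```

For example, the words `{0x4500, 0x0030, 0x1234}` have the full checksum `0xa89b`. After changing `0x0030` to `0x0040`, both recomputation and `cksum_update16()` give `0xa88b`.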
Alexander Motin
dc0aa406db MFprojects/zfsd:
Generalize and unify ses device description.
2012-05-24 11:20:51 +00:00
Marius Strobl
d0cea659af Fix mismerge in r235231.
2012-05-10 15:23:20 +00:00
Marius Strobl
a6125979d3 Merge r234989 from x86:
Revert part of r234723 by re-enabling the SMP protection for intr_bind().
2012-05-10 15:17:21 +00:00
Dimitry Andric
460378bf13 Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after:	2 weeks
2012-04-29 11:04:31 +00:00
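Here is a sketch of what such a convenience macro and its use might look like. `RETURNS_TWICE` and `hyp_save_context()` are hypothetical stand-ins for the macro and prototypes the commit touches. The attribute itself is the GCC/Clang `returns_twice` function attribute, which tells the compiler that a call may return more than once, as `setjmp()` does. `count_returns()` shows the two returns. It also shows why locals that must survive a `longjmp()` are declared `volatile`.

```c
#include <setjmp.h>

/* Hypothetical stand-in for the convenience macro added by the commit. */
#if defined(__GNUC__) || defined(__clang__)
#define	RETURNS_TWICE	__attribute__((__returns_twice__))
#else
#define	RETURNS_TWICE
#endif

/* Hypothetical prototype of a context-saving routine that can return twice. */
int hyp_save_context(void *ctx) RETURNS_TWICE;

static jmp_buf env;

/* setjmp() returns once directly (0) and once more after longjmp() (1). */
static int
count_returns(void)
{
	volatile int returns = 0;	/* volatile so the value survives longjmp() */

	if (setjmp(env) == 0) {
		returns++;
		longjmp(env, 1);
	}
	returns++;
	return (returns);
}
```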
Attilio Rao
70dbd1604c Remove the SMP dependency from the intr* MD KPI, eliminating a cause of
discrepancy between modules and the kernel, and deal with SMP differences
within the functions themselves instead.

As an added bonus this also helps in terms of code readability.

Requested by:	gibbs
Reviewed by:	jhb, marius
MFC after:	1 week
2012-04-26 20:24:25 +00:00
Marius Strobl
05374275e5 Turn on PREEMPTION by default. After several bugs were fixed over time, the
last show-stopper keeping PREEMPTION from being usable on sparc64 should
have been dealt with in r230662.
At least on 2-way systems, PREEMPTION causes a slight degradation
in worldstone performance. However, FreeBSD seems to have started building
up regressions in !PREEMPTION cases, so sparc64 had better not be an
oddball in this regard.

MFC after:	1 week
2012-04-16 18:29:07 +00:00
Marius Strobl
7b88f42369 Merge from x86:
r233961:

Fix interrupt load balancing regression, introduced in revision
222813, that left all un-pinned interrupts assigned to CPU 0.
In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized
the "intr_cpus" cpuset to only contain CPU0.

This initialization is too late and nullifies the results of calls
to intr_add_cpu() that occur much earlier in the boot process.

r234074 (partial):

The BSP is not added to the mask of valid target CPUs for interrupts.
Fix this by adding the BSP as an interrupt target directly in

r234105:

Fix !SMP build after r234074.

MFC after: 3 days
2012-04-13 22:58:23 +00:00
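The following userland sketch shows why the late re-initialization hurt. In this model, interrupts are handed out in turn from a CPU set that is filled in early during boot. Overwriting that set with only CPU 0 just before shuffling therefore sends every interrupt to CPU 0. The 64-bit mask, `intr_cpus_sketch` and both functions are hypothetical simplifications of `cpuset_t`, `intr_add_cpu()` and `intr_shuffle_irqs()`.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t intr_cpus_sketch;	/* hypothetical stand-in for a cpuset_t */

/* Early boot: each CPU that may take interrupts is added to the set. */
static void
add_intr_cpu(unsigned int cpu)
{
	assert(cpu < 64);
	intr_cpus_sketch |= UINT64_C(1) << cpu;
}

/* Round-robin: return the next CPU in 'set' after 'prev' (-1 to start). */
static int
next_intr_cpu(uint64_t set, int prev)
{
	assert(prev >= -1 && prev < 64);
	for (int i = 1; i <= 64; i++) {
		int cpu = (prev + i) % 64;

		if (set & (UINT64_C(1) << cpu))
			return (cpu);
	}
	return (-1);	/* empty set */
}
```

With the set reset to just bit 0, every call returns CPU 0. That is the behavior the merged fix removes.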
Marius Strobl
3e1745a769 Remove checks that are redundant due to tf_type being unsigned.
MFC after:	3 days
2012-03-31 14:03:16 +00:00
Marius Strobl
2cc3538245 Fix panic on kernel traps having a mapping in trap_sig b0rked in r206086.
Reported by:	David E. Cross

MFC after:	3 days
2012-03-31 13:56:24 +00:00
Marius Strobl
4cb0ce8a60 - Remove erroneous trailing semicolon. [1]
- Correctly determine the maximum payload size for setting the TX link
  frequent NACK latency and replay timer thresholds.

Submitted by:	stefanf [1]
MFC after:	3 days
2012-03-30 15:08:09 +00:00
Marius Strobl
1376f3e1d1 Given that this is a host-PCI-Express bridge driver, create the parent
DMA tag with a 4 GB boundary as required by PCI-Express. With r232403 in
place this actually is redundant. However, the host-PCI-Express bridge
driver is the more appropriate place for implementing this restriction.

MFC after:	3 days
2012-03-24 13:11:58 +00:00
Ed Schouten
92396a3174 Remove pty(4) from our kernel configurations.
As of FreeBSD 8, this driver should not be used. Applications that use
posix_openpt(2) and openpty(3) use pts(4), which is built into the
kernel unconditionally. If it turns out that high-profile applications
depend on the pty(4) module anyway, I'd rather get those fixed, so please
report any issues to me.

The pty(4) module is still available as a kernel module of course, so a
simple `kldload pty' can be used to run old-style pseudo-terminals.
2012-03-21 08:38:42 +00:00
Nathan Whitehorn
0d8d9edaaa Make ofw_bus_get_node() consistently return -1 when there is no associated
OF node, instead of a random mixture of 0 and -1. Update all checks for 0
to check for -1 instead.

MFC after:	4 weeks
2012-03-15 22:53:39 +00:00
Dimitry Andric
63d094a7e2 Add casts to __uint16_t to the __bswap16() macros on all arches which
didn't already have them.  This is because the ternary expression will
return int, due to the Usual Arithmetic Conversions.  Such casts are not
needed for the 32 and 64 bit variants.

While here, add additional parentheses around the x86 variant, to
protect against unintended consequences.

MFC after:	2 weeks
2012-03-09 20:34:31 +00:00
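This small demonstration shows the promotion problem. It is modeled loosely on a `__bswap16()` that uses a conditional expression to choose between a constant-folding form and an inline function. The macro and function names are hypothetical, and `__builtin_constant_p` is a GCC/Clang builtin. The usual arithmetic conversions apply to the second and third operands of `?:`, so the whole expression has type `int` (even a `uint16_t` branch is promoted). Casting the result restores `uint16_t`.

```c
#include <assert.h>
#include <stdint.h>

static inline uint16_t
bswap16_var(uint16_t x)
{
	return ((uint16_t)((x >> 8) | (x << 8)));
}

/* Constant-foldable form; after integer promotion this has type int. */
#define	BSWAP16_CONST(x)	((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

/* Without a cast the conditional expression has type int. */
#define	BSWAP16_NOCAST(x) \
	(__builtin_constant_p(x) ? BSWAP16_CONST(x) : bswap16_var(x))

/* With the cast the macro yields uint16_t, like a function returning it. */
#define	BSWAP16(x)	((uint16_t)BSWAP16_NOCAST(x))

static_assert(sizeof(BSWAP16(0x1234)) == sizeof(uint16_t),
    "the cast restores the 16-bit type");
```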
Attilio Rao
9c170fd168 Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported
platforms.
This forbids every attempt to mount a non-MPSAFE filesystem unless the
kernel is explicitly compiled with the VFS_ALLOW_NONMPSAFE option.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.
2012-03-06 20:01:25 +00:00
John Baldwin
1b1596a3b3 - Add a bus_dma tag to each PCI bus that is a child of a Host-PCI bridge.
  The tag enforces a single restriction that all DMA transactions must not
  cross a 4GB boundary.  Note that while this restriction technically only
  applies to PCI-express, this change applies it to all PCI devices as it
  is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
  a new pci_attach_common() routine that performs actions common to the
  attach phase of all PCI bus drivers.  Right now this only consists of
  a bootverbose printf and the allocation of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
  pci_attach_common() from their attach routines.

MFC after:	2 weeks
2012-03-02 20:38:04 +00:00
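The restriction comes down to a simple check. A transfer crosses a power-of-two boundary when its first and last bytes fall in different boundary-sized windows. This is a minimal sketch with a hypothetical `crosses_boundary()` helper; bus_dma enforces the constraint itself when it builds segments. As with `bus_dma_tag_create(9)`, a boundary of 0 is treated as "no restriction".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [addr, addr + len) crosses a multiple of 'boundary'.
 * 'boundary' must be 0 (no restriction) or a power of two.
 */
static bool
crosses_boundary(uint64_t addr, uint64_t len, uint64_t boundary)
{
	assert(len > 0);
	assert(boundary == 0 || (boundary & (boundary - 1)) == 0);
	if (boundary == 0)
		return (false);
	uint64_t mask = ~(boundary - 1);	/* selects the boundary window */

	return ((addr & mask) != ((addr + len - 1) & mask));
}
```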
John Baldwin
831ce4cb3d - Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
  long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
  constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t).  Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by:	scottl
2012-03-01 19:58:34 +00:00
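Here is why the type matters. When `bus_addr_t` is wider than `bus_size_t`, as in the PAE case above, the value 4GB (2^32) does not fit in a 32-bit `bus_size_t`. The sketch below mimics those widths with hypothetical `pae_bus_addr_t`/`pae_bus_size_t` typedefs. The truncated value is 0, which bus_dma interprets as "no boundary", so the constraint would silently disappear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical typedefs mimicking a PAE kernel: wide addresses, narrow sizes. */
typedef uint64_t pae_bus_addr_t;
typedef uint32_t pae_bus_size_t;

#define	BOUNDARY_4GB	((pae_bus_addr_t)1 << 32)

static_assert(sizeof(pae_bus_addr_t) != sizeof(pae_bus_size_t),
    "the interesting case: the two types differ in width");

/* What a driver effectively passes when the tag field is a bus_size_t. */
static pae_bus_size_t
boundary_as_size(pae_bus_addr_t boundary)
{
	return ((pae_bus_size_t)boundary);	/* keeps only the low 32 bits */
}
```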
Marius Strobl
96459db58f As it turns out r227960 may still be insufficient with PREEMPTION
so try harder to get the CDMA sync interrupt delivered and also in
a more efficient way:
- wrap the whole process of sending and receiving the CDMA sync
  interrupt in a critical section so we don't get preempted,
- send the CDMA sync interrupt to the CPU that is actually waiting
  for it to happen so we don't take a detour via another CPU,
- instead of waiting for up to 15 seconds for the interrupt to
  trigger try the whole process for up to 15 times using a one
  second timeout (the code was also changed to just ignore belated
  interrupts of previous tries, should they appear).

According to testing done by Peter Jeremy with the debugging also
added as part of this commit, the first two changes apparently are
sufficient to now properly get the CDMA sync interrupts delivered
at the first try, though.
2012-01-28 22:42:33 +00:00
Marius Strobl
3c0f8828f4 Fully disable interrupts while we fiddle with the FP context in the
VIS-based block copy/zero implementations. While with 4BSD it's
sufficient to just disable the tick interrupts, with ULE+PREEMPTION
it's otherwise also possible that these are preempted via IPIs.
2012-01-28 22:22:05 +00:00
Marius Strobl
ad21bcec21 Commit file missed in r230633.
2012-01-27 23:25:24 +00:00
Marius Strobl
546e977edb Now that we have a working OF_printf() since r230631 and an OF_panic()
helper since r230632, use these for output and panicking during the
early cycles and move cninit() until after the static per-CPU data
has been set up. This solves a couple of issues regarding the
non-availability of the static per-CPU data:
- panic() not working and only making things worse when called,
- having to supply a special DELAY() implementation to the low-level
  console drivers,
- curthread accesses of mutex(9) usage in low-level console drivers
  that aren't conditional due to compiler optimizations (basically,
  this is the problem described in r227537 but in this case for
  keyboards attached via uart(4)). [1]

PR:	164123 [1]
2012-01-27 23:21:54 +00:00
Marius Strobl
a500e985ad - Now that we have a working OF_printf() since r230631, use it for
  implementing a simple OF_panic() that may be used during the early
  cycles when panic() isn't available yet.
- Mark cpu_{exit,shutdown}() as __dead2 as appropriate.
2012-01-27 22:35:53 +00:00
Marius Strobl
21bbe2ac13 For machines where the kernel address space is unrestricted increase
VM_KMEM_SIZE_SCALE to 2, awaiting more insight from alc@. As it turns
out, the VM apparently has problems with machines that have large holes
in the physical address space, causing the kmem_suballoc() call in
kmeminit() to fail with a VM_KMEM_SIZE_SCALE of 1. Using a value of 2
allows these, namely Blade 1500 with 2GB of RAM, to boot.

PR:	164227
2012-01-27 22:25:46 +00:00
Marius Strobl
733ce5d25e Mark cpu_{halt,reset}() as __dead2 as appropriate.
2012-01-27 22:04:43 +00:00
David Schultz
2ee7b1d4ae Add C11 macros describing subnormal numbers to float.h.
Reviewed by:	bde
2012-01-23 06:36:41 +00:00
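For reference, C11 adds `FLT_HAS_SUBNORM`, `DBL_HAS_SUBNORM` and `LDBL_HAS_SUBNORM` (-1 indeterminable, 0 absent, 1 present). It also adds `FLT_TRUE_MIN`, `DBL_TRUE_MIN` and `LDBL_TRUE_MIN`, the smallest positive values of each type. The sketch below shows portable use, guarded so that it also compiles against pre-C11 headers. The helper names are hypothetical.

```c
#include <float.h>

/*
 * Smallest positive double: DBL_TRUE_MIN (a subnormal when the type has
 * them) if <float.h> provides it, otherwise the smallest normal DBL_MIN.
 */
static double
smallest_positive_double(void)
{
#ifdef DBL_TRUE_MIN
	return (DBL_TRUE_MIN);
#else
	return (DBL_MIN);
#endif
}

/* 1 if doubles are known to have subnormals, 0 if not, -1 if unknown. */
static int
double_subnormal_status(void)
{
#ifdef DBL_HAS_SUBNORM
	return (DBL_HAS_SUBNORM);
#else
	return (-1);
#endif
}
```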
Kenneth D. Merry
130f4520cb Add the CAM Target Layer (CTL).
CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003.  It has been shipping in
Copan (now SGI) products since 2005.

It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license.  The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.

Some CTL features:

 - Disk and processor device emulation.
 - Tagged queueing
 - SCSI task attribute support (ordered, head of queue, simple tags)
 - SCSI implicit command ordering support.  (e.g. if a read follows a mode
   select, the read will be blocked until the mode select completes.)
 - Full task management support (abort, LUN reset, target reset, etc.)
 - Support for multiple ports
 - Support for multiple simultaneous initiators
 - Support for multiple simultaneous backing stores
 - Persistent reservation support
 - Mode sense/select support
 - Error injection support
 - High Availability support (1)
 - All I/O handled in-kernel, no userland context switch overhead.

(1) HA Support is just an API stub, and needs much more to be fully
    functional.

ctl.c:			The core of CTL.  Command handlers and processing,
			character driver, and HA support are here.

ctl.h:			Basic function declarations and data structures.

ctl_backend.c,
ctl_backend.h:		The basic CTL backend API.

ctl_backend_block.c,
ctl_backend_block.h:	The block and file backend.  This allows for using
			a disk or a file as the backing store for a LUN.
			Multiple threads are started to do I/O to the
			backing device, primarily because the VFS API
			requires that to get any concurrency.

ctl_backend_ramdisk.c:	A "fake" ramdisk backend.  It only allocates a
			small amount of memory to act as a source and sink
			for reads and writes from an initiator.  Therefore
			it cannot be used for any real data, but it can be
			used to test for throughput.  It can also be used
			to test initiators' support for extremely large LUNs.

ctl_cmd_table.c:	This is a table with all 256 possible SCSI opcodes,
			and command handler functions defined for supported
			opcodes.

ctl_debug.h:		Debugging support.

ctl_error.c,
ctl_error.h:		CTL-specific wrappers around the CAM sense building
			functions.

ctl_frontend.c,
ctl_frontend.h:		These files define the basic CTL frontend port API.

ctl_frontend_cam_sim.c:	This is a CTL frontend port that is also a CAM SIM.
			This frontend allows for using CTL without any
			target-capable hardware.  So any LUNs you create in
			CTL are visible in CAM via this port.

ctl_frontend_internal.c,
ctl_frontend_internal.h:
			This is a frontend port written for Copan to do
			some system-specific tasks that required sending
			commands into CTL from inside the kernel.  This
			isn't entirely relevant to FreeBSD in general,
			but can perhaps be repurposed.

ctl_ha.h:		This is a stubbed-out High Availability API.  Much
			more is needed for full HA support.  See the
			comments in the header and the description of what
			is needed in the README.ctl.txt file for more
			details.

ctl_io.h:		This defines most of the core CTL I/O structures.
			union ctl_io is conceptually very similar to CAM's
			union ccb.

ctl_ioctl.h:		This defines all ioctls available through the CTL
			character device, and the data structures needed
			for those ioctls.

ctl_mem_pool.c,
ctl_mem_pool.h:		Generic memory pool implementation used by the
			internal frontend.

ctl_private.h:		Private data structures (e.g. CTL softc) and
			function prototypes.  This also includes the SCSI
			vendor and product names used by CTL.

ctl_scsi_all.c,
ctl_scsi_all.h:		CTL wrappers around CAM sense printing functions.

ctl_ser_table.c:	Command serialization table.  This defines what
			happens when one type of command is followed by
			another type of command.

ctl_util.c,
ctl_util.h:		CTL utility functions, primarily designed to be
			used from userland.  See ctladm for the primary
			consumer of these functions.  These include CDB
			building functions.

scsi_ctl.c:		CAM target peripheral driver and CTL frontend port.
			This is the path into CTL for commands from
			target-capable hardware/SIMs.

README.ctl.txt:		CTL code features, roadmap, to-do list.

usr.sbin/Makefile:	Add ctladm.

ctladm/Makefile,
ctladm/ctladm.8,
ctladm/ctladm.c,
ctladm/ctladm.h,
ctladm/util.c:		ctladm(8) is the CTL management utility.
			It fills a role similar to camcontrol(8).
			It allows configuring LUNs, issuing commands,
			injecting errors and various other control
			functions.

usr.bin/Makefile:	Add ctlstat.

ctlstat/Makefile,
ctlstat/ctlstat.8,
ctlstat/ctlstat.c:	ctlstat(8) fills a role similar to iostat(8).
			It reports I/O statistics for CTL.

sys/conf/files:		Add CTL files.

sys/conf/NOTES:		Add device ctl.

sys/cam/scsi_all.h:	To conform to more recent specs, the inquiry CDB
			length field is now 2 bytes long.

			Add several mode page definitions for CTL.

sys/cam/scsi_all.c:	Handle the new 2 byte inquiry length.

sys/dev/ciss/ciss.c,
sys/dev/ata/atapi-cam.c,
sys/cam/scsi/scsi_targ_bh.c,
scsi_target/scsi_cmds.c,
mlxcontrol/interface.c:	Update for 2 byte inquiry length field.

scsi_da.h:		Add versions of the format and rigid disk pages
			that are in a more reasonable format for CTL.

amd64/conf/GENERIC,
i386/conf/GENERIC,
ia64/conf/GENERIC,
sparc64/conf/GENERIC:	Add device ctl.

i386/conf/PAE:		The CTL frontend SIM at least does not compile
			cleanly on PAE.

Sponsored by:	Copan Systems, SGI and Spectra Logic
MFC after:	1 month
2012-01-12 00:34:33 +00:00
Robert Watson
009d2032af Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel
configurations for various architectures in FreeBSD 10.x.  This allows
basic Capsicum functionality to be used in the default FreeBSD
configuration on non-embedded architectures; process descriptors are not
yet enabled by default.

MFC after:	3 months
Sponsored by:	Google, Inc
2011-12-29 22:48:36 +00:00
Alan Cox
3b03ca3bbe Eliminate vestiges of page coloring.
2011-12-15 05:07:16 +00:00
Ed Schouten
53627e400f Replace __signed by signed.
The signed keyword is an integral part of the C syntax. There's no need
to use __signed.
2011-12-13 13:38:03 +00:00
Marius Strobl
66b82ed179 Revert r225889 a bit. While it's correct that in total store order there's
no need to additionally add CPU memory barriers to the acquire variants of
atomic(9), these are documented to also include compiler memory barriers.
So add the latter, which were previously included by using membar(), back.
2011-12-03 13:51:57 +00:00
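This sketch illustrates the distinction being drawn. Under total store order, the hardware already keeps a load ordered before later loads and stores, so an acquire load needs no CPU barrier. The compiler must still be prevented from moving later memory accesses above it. An empty `asm` with a `"memory"` clobber is the usual GCC/Clang compiler-only barrier. The function and macro names are hypothetical, and this is not the sparc64 `atomic(9)` code.

```c
#include <stdint.h>

/*
 * Compiler-only barrier (GCC/Clang): emits no instruction, but the compiler
 * may not cache values in registers or move memory accesses across it.
 */
#define	compiler_membar()	__asm__ __volatile__("" : : : "memory")

/* Acquire load on a TSO machine: no CPU barrier, only a compiler barrier. */
static inline uint32_t
load_acq_32_tso(volatile const uint32_t *p)
{
	uint32_t v = *p;

	compiler_membar();	/* later accesses stay after this load */
	return (v);
}

/* Release store on a TSO machine: earlier accesses stay before the store. */
static inline void
store_rel_32_tso(volatile uint32_t *p, uint32_t v)
{
	compiler_membar();
	*p = v;
}
```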
Jayachandran C.
07042bef45 Fix OF_finddevice error return value in case of FDT.
According to the open firmware standard, finddevice call has to return
a phandle with value of -1 in case of error.

This commit is to:
- Fix the FDT implementation of this interface (ofw_fdt_finddevice) to
  return (phandle_t)-1 in case of error, instead of 0 as it does now.
- Fix up the callers of OF_finddevice() to compare the return value with
  -1 instead of 0 to check for errors.
- Since phandle_t is unsigned, the return value of OF_finddevice should
  be checked with '== -1' rather than '<= 0' or '> 0', fix up these cases
  as well.

Reported by:	nwhitehorn

Reviewed by:	raj
Approved by:	raj, nwhitehorn
2011-12-02 15:24:39 +00:00
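The signedness point takes only a few lines to show. This sketch assumes a 32-bit unsigned `phandle_t` and uses hypothetical helper and macro names. Because `(phandle_t)-1` is the largest unsigned value, a `> 0` test accepts the error value, and a `<= 0` test only catches 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t phandle_t;		/* assumed width; unsigned as noted */

#define	NODE_ERROR	((phandle_t)-1)	/* hypothetical name for the -1 return */

static_assert(NODE_ERROR > 0, "-1 converted to an unsigned type is positive");

/* Buggy pattern: treats the error value as a valid node. */
static bool
node_ok_wrong(phandle_t node)
{
	return (node > 0);
}

/* Correct pattern: compare against -1 explicitly. */
static bool
node_ok(phandle_t node)
{
	return (node != NODE_ERROR);
}
```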
Marius Strobl
c7bba74954 Update comment.
2011-11-27 15:49:46 +00:00
Marius Strobl
abef0e6700 For sparc64 also adjust the geometry of da(4) driven disks to not overflow
the 16-bit cylinders field of the VTOC8 disk label (at around 502GB). The
geometry chosen for disks above that limit allows using disks of up to 2TB,
which is the limit of the extended VTOC8 format. The geometry used for
disks smaller than the 16-bit cylinders limit stays the same as used by
cam_calc_geometry(9) for extended translation.
Thanks to Hans-Joerg Sirtl for providing hardware for testing this change.

MFC after:	3 days
2011-11-27 15:43:40 +00:00
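The 502GB figure follows from the 16-bit cylinders field. The sketch below assumes the usual extended-translation geometry of 255 heads and 63 sectors per track with 512-byte sectors. The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	SECTOR_SIZE	512u
#define	EXT_HEADS	255u	/* extended-translation heads (assumed) */
#define	EXT_SECTORS	63u	/* sectors per track (assumed) */
#define	VTOC8_MAX_CYL	0xffffu	/* largest value of a 16-bit cylinders field */

/* Capacity in bytes addressed by a cylinders/heads/sectors geometry. */
static uint64_t
geom_bytes(uint32_t cyl, uint32_t heads, uint32_t secs)
{
	assert(cyl <= VTOC8_MAX_CYL);
	return ((uint64_t)cyl * heads * secs * SECTOR_SIZE);
}
```

That is 539,043,724,800 bytes, or just over 502 GiB. A 2 TiB disk at 255/63 would need more than 267,000 cylinders, which is why larger disks get a different geometry.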
Marius Strobl
1fa2664f97 Move to SCHED_ULE by default. Since r226057 SCHED_ULE and sparc64 are
compatible with each other and since r227539 the last issue seen when
using SCHED_ULE is fixed. At least on UP and 2-way machines SCHED_4BSD
still performs better than SCHED_ULE; however, the optimizations done
in r225889 pretty much compensate for that, so there's at least no net
regression.
Thanks go to Peter Jeremy for extensive testing.
2011-11-25 17:40:01 +00:00
Marius Strobl
7266520baf Increase the CDMA sync timeout for Schizo bridges to 15 seconds as used by
OpenSolaris. One second turned out not to be enough for certain loads, while
10 seconds were sufficient.
Reported by: Peter Jeremy

MFC after:	3 days
2011-11-24 23:48:22 +00:00
Marius Strobl
848e30ff51 s,KOBJMETHOD_END,DEVMETHOD_END,g in order to fully hide the explicit mention
of kobj(9) from device drivers.
2011-11-22 21:55:40 +00:00
Marius Strobl
4b7ec27007 - There's no need to overwrite the default device method with the default
  one. Interestingly, these have actually been the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
Pawel Jakub Dawidek
1859c4740e Fix make universe.
2011-11-16 18:42:43 +00:00
Marius Strobl
1249ba5fc6 Define curthread as an inline function that loads the thread pointer
directly from g7, the pcpu pointer. This guarantees correct behavior
when the thread migrates to a different CPU.
Commit message stolen from r205431. Additional testing by Peter Jeremy.

MFC after:	3 days
2011-11-15 20:17:18 +00:00
Attilio Rao
ed1f6dc235 Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows mounting non-MPSAFE filesystems. Without it, the
kernel will refuse to mount a non-MPSAFE filesystem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by:	gianni
Reviewed by:	kib
2011-11-08 10:18:07 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Marius Strobl
a9ab459b31 Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).

PR:		124667
Obtained from:	NetBSD (based on)
MFC after:	3 days
2011-11-01 21:26:57 +00:00
Marius Strobl
fb1b34b313 Actually, limit to 32-bit DMA for the transfer buffers as the address is
written into a 32-bit register.
2011-10-30 21:42:35 +00:00
Marius Strobl
f1ffe65cc5 Correct the DMA constraints; the LSI64854 isn't limited to 32-bit DMA.
2011-10-30 21:19:13 +00:00
Marius Strobl
5e141ae05f - Use device_t rather than the NetBSDish struct device.
- Move esp_devclass to ncr53c9x.c in order to allow different bus front-ends
  to use it.
- Use KOBJMETHOD_END.
- Remove the gl_clear_latched_intr hook as it's not needed for any of the
  chips nor the front-ends supported in FreeBSD and likely never will be.
- Correct the DMA constraints used in the SBus front-end, the LSI64854 isn't
  limited to 32-bit DMA.
- The ESP200 also only supports up to 64k transfers.
- Don't let the DMA and SBus front-end supply a maximum transfer size larger
  than MAXPHYS as that's the maximum the upper layers use and we otherwise
  just waste resources unnecessarily.
- Initialize the ECB callout and don't zero the handle when returning ECBs
  to the free list so that ncr53c9x_callout() actually is called with the
  driver lock held.
- On detach the driver lock should be held across cam_sim_free() according
  to isp(4) and a panic received.
- Check the return value of NCRDMA_SETUP(), i.e. bus_dmamap_load(9), and try
  to handle failures gracefully.
- In ncr53c9x_action() replace N calls to xpt_done() in a switch with just
  one at the end.
- On XPT_PATH_INQ report "NCR" rather than "Sun" as the vendor as the former
  is somewhat more correct as well as the maximum supported transfer size via
  maxio in order to take advantage of controllers that can handle more
  than DFLTPHYS.
- Print the number of MESSAGE (EXTENDED) rejected.
- Fix the path encoded in the multiple inclusion protection of ncr53c9xvar.h.
- Correct the DMA constraints used in the LSI64854 core to not exceed the
  maximum supported transfer size and include the boundary so we don't need
  to check on every setup of a DMA transfer.
- Let the bus DMA map callbacks do nothing in case of an error.
- Correctly handle > 64k transfers for FAS366 in the LSI64854. A new feature
  flag NCR_F_LARGEXFER was introduced so we just need to check for this one
  and not for individual controllers supporting large transfers in several
  places.
- Let the LSI64854 core load transfer buffers using BUS_DMA_NOWAIT as the
  NCR53C9x core can't handle EINPROGRESS. Due to the lack of bounce buffer
  support, sparc64 doesn't actually use EINPROGRESS and likely never will;
  as an example for writing additional front-ends for the NCR53C9x core, it
  makes sense to set BUS_DMA_NOWAIT anyway, though.
- Some minor cleanup.
2011-10-30 21:17:42 +00:00
Ken Smith
6168545a11 Adjust the debugger options slightly. This should help me do the right
thing when changing the debugging options as part of head becoming a new
stable branch.  It may also help people who for one reason or another want
to run head but don't want it slowed down by the debugging support.

Reviewed by:	kib
2011-10-27 13:07:49 +00:00
David Schultz
a50079b7ff People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one.  To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware.  Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.
2011-10-21 06:41:46 +00:00
Ken Smith
7042aba738 Add a warning about why sbp(4) is commented out so that curious folks
are forewarned they might wind up with a hole in their foot if they
decide to give it a try.

Suggested by:	dougb
2011-10-19 21:55:20 +00:00
Ken Smith
4c0ba9b742 Comment out the sbp(4) driver for architectures that support it.
As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112)
but was left alone in head so people could work on fixing an issue that
caused boot failure on some motherboards.  Apparently nobody has worked
on it and we are getting reports of boot failure with the 9.0 test builds.
So this time I'll comment out the driver in head (still hoping someone
will work on it) and MFC to stable/9.

Submitted by:	Alberto Villa <avilla at FreeBSD dot org>
2011-10-18 13:45:16 +00:00
Dag-Erling Smørgrav
a417d4a46b Trace attempts to call restricted MD syscalls.
2011-10-18 07:39:27 +00:00
Marius Strobl
2b6ae84b63 Merge from NetBSD:
- Remove clause 3 and 4 from TNF licenses.
- Fix memset usage.
- Various cleanup.
- Kill caddr_t.
2011-10-15 09:29:43 +00:00
Konstantin Belousov
6bfe4c78c8 Remove unused define.
MFC after:	1 month
2011-10-07 16:09:44 +00:00
Marius Strobl
c95589317d - Use atomic operations rather than sched_lock for safely assigning pm_active
  and pc_pmap for SMP. This is key to adding support for SCHED_ULE.
  Thanks go to Peter Jeremy for additional testing.
- Add support for SCHED_ULE to cpu_switch().

Committed from:	201110DevSummit
2011-10-06 11:01:31 +00:00
Marius Strobl
2ab9ab1ffe Actually enable NEW_PCIB by default, missed in r225931. 2011-10-02 23:31:14 +00:00
Marius Strobl
bda8e754a1 Make sparc64 compatible with NEW_PCIB and enable it:
- Implement bus_adjust_resource() methods as far as necessary and in non-PCI
  bridge drivers as far as feasible without rototilling them.
- As NEW_PCIB does a layering violation by activating resources at layers
  above pci(4) without previously bubbling up their allocation there, move
  the assignment of bus tags and handles from the bus_alloc_resource() to
  the bus_activate_resource() methods like at least the other NEW_PCIB
  enabled architectures do. This is somewhat unfortunate as previously
  sparc64 (ab)used resource activation to indicate whether SYS_RES_MEMORY
  resources should be mapped into KVA, which is only necessary if they're
  going to be accessed via the pointer returned from rman_get_virtual() but
  not for bus_space(9), as the latter always uses physical access on sparc64.
  Besides wasting KVA if we always map in SYS_RES_MEMORY resources, a driver
  also may deliberately not map them in if the firmware already has done so,
  possibly in a special way. So in order to still allow a driver to decide
  whether a SYS_RES_MEMORY resource should be mapped into KVA we let it
  indicate that by calling bus_space_map(9) with BUS_SPACE_MAP_LINEAR as
  actually documented in the bus_space(9) page. This is implemented by
  allocating a separate bus tag per SYS_RES_MEMORY resource and passing the
  resource via the previously unused bus tag cookie so we later on can call
  rman_set_virtual() in sparc64_bus_mem_map(). As a side effect this now
  also allows a driver to actually indicate that a SYS_RES_MEMORY resource should be
  mapped in as cacheable and/or read-only via BUS_SPACE_MAP_CACHEABLE and
  BUS_SPACE_MAP_READONLY respectively.
- Do some minor cleanup like taking advantage of rman_init_from_resource(),
  factor out the common part of bus tag allocation into a newly added
  sparc64_alloc_bus_tag(), hook up some missing newbus methods and replace
  some homegrown versions with the generic counterparts etc.
- While at it, let apb_attach() (which can't use the generic NEW_PCIB code
  as APB bridges just don't have the base and limit registers implemented)
  handle the config space registers cached in pcib_softc and the setup of
  the SYSCTL reporting nodes.
2011-10-02 23:22:38 +00:00
Marius Strobl
1866626239 Remove obsolete macros. 2011-10-01 13:33:14 +00:00
Marius Strobl
6319597402 Nuke SUN4U #ifdefs, which with the demise of sun4v no longer serve any
purpose.
2011-10-01 13:16:01 +00:00
Marius Strobl
1ef3f048e0 Also allocate space for the PIL counters. Given that no machine actually
uses IV_MAX interrupt vectors this wasn't a problem in practice though.
2011-10-01 13:11:29 +00:00
Marius Strobl
64020d8477 Re-reading the Schizo errata suggests that it's actually tolerable to
also use the streaming buffer of pre-version 5/revision 2.3 hardware as
long as we stay away from context flushes (which iommu(4) so far doesn't
take advantage of). OpenSolaris does the same.
2011-10-01 00:31:30 +00:00
Marius Strobl
0224e43d7c - Add protective parentheses to macros as far as possible.
- Move {r,w,}mb() to the top of this file where they live on most of the
  other architectures.
2011-10-01 00:22:24 +00:00
Marius Strobl
fafda37b15 In total store order, which we use for running the kernel and all of
userland, atomic operations behave as if they were followed by a memory barrier,
so there's no need to include them in the acquire variants of atomic(9).
Removing these results in a small performance improvement; specifically, this
is sufficient to compensate for the performance loss in the worldstone
benchmark seen when using SCHED_ULE instead of SCHED_4BSD.
This change is inspired by Linux even more radically doing the equivalent
thing some time ago.
Thanks go to Peter Jeremy for additional testing.
2011-10-01 00:11:03 +00:00
Marius Strobl
9a91e2aa2e Add a comment about why, contrary to what one would think, running all of
userland with total store order actually is appropriate.
2011-09-30 20:23:18 +00:00
Marius Strobl
ade68e910d Use the extended integer condition code when comparing 64-bit values. Given
that ATOMIC_INC_LONG currently is unused this happened to not be fatal.
2011-09-30 20:13:51 +00:00
Marius Strobl
6fd7e2b7c6 - Right-justify backslashes as suggested by style(9).
- Rename ATOMIC_INC_ULONG to ATOMIC_INC_LONG in order to be consistent with
  the names of the other macros in this file and adjust accordingly.
2011-09-30 20:06:23 +00:00
Konstantin Belousov
578113aaa3 Remove locking of the vm page queues from several pmaps, which only
protected the dirty mask updates. The dirty mask updates are handled
by atomics after r225840.

Submitted by:	alc
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 15:01:20 +00:00
Attilio Rao
e984e612a7 It is safe to initialize locks even during early boot (which is the same
thing all the other architectures already do), so just initialize
kernel_pmap in pmap_bootstrap().

Reported by:	alc
Reviewed by:	alc, marius
Tested by:	flo, marius
Approved by:	re (kib)
MFC after:	1 week
2011-09-19 18:29:15 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now, we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Christian Brueffer
b48f7c4c8d Fix a zyd(4) comment typo that was copy+pasted into most kernel config files.
PR:		160276
Submitted by:	MATSUMIYA Ryo <matsumiya@mma.club.uec.ac.jp>
Approved by:	re (kib)
MFC after:	1 week
2011-09-11 17:39:51 +00:00
Konstantin Belousov
26ccf4f10f Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by:	jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-11 16:05:09 +00:00
Konstantin Belousov
3407fefef6 Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.

Document the changes to the flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
Marius Strobl
c4a2a39004 Since r221218, rman_manage_region(9) actually honors rm_start and rm_end,
which may cause problems when these contain garbage, so zero the range
descriptors embedding the rmans when allocating them.

Approved by:	re (kib)
MFC after:	3 days
2011-08-28 11:49:53 +00:00
Konstantin Belousov
d98d0ce27a - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
  to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of the vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code now can use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
Rick Macklem
88c037e26a Change all the sample kernel configurations to use
NFSCL, NFSD instead of NFSCLIENT, NFSSERVER since
NFSCL and NFSD are now the defaults. The client change is
needed for diskless configurations, so that the root
mount works for fstype nfs.
Reported by seanbru at yahoo-inc.com for i386/XEN.

Approved by:	re (hrs)
2011-08-07 20:16:46 +00:00
Marius Strobl
1fc2c55253 - Merge from r147740:
  When the last, possibly partially filled buffer is flushed, we didn't
  reset fragsz to 0, so it would stop reflecting reality.
- Use __FBSDID.
- Wrap a too long line.

Approved by:	re (kib)
MFC after:	1 week
2011-08-06 17:45:52 +00:00
Marius Strobl
d16ea2be4e Remove a shortcut which is invalid with MAXCPU > IDR_CHEETAH_MAX_BN_PAIRS.
Approved by:	re (kib)
2011-08-06 17:45:11 +00:00
Marius Strobl
1b57ae60a7 Merge from r224217:
Bump MAXCPU to 64.

Approved by:	re (kib)
2011-07-20 18:51:18 +00:00
Attilio Rao
732772c701 On 64-bit architectures size_t is 8 bytes, thus it should use 8 bytes of
storage.
Fix the sintrcnt/sintrnames specification.

No MFC is planned for this patch.

Reported, reviewed and tested by:	marcel
Approved by:	re (kib)
2011-07-19 12:41:57 +00:00
Attilio Rao
68b739cd6f Add the possibility to specify the MAXCPU value from kernel configs.
This patch is going to help in cases like mips flavours where you
want more granular control over MAXCPU.

No MFC is planned for this patch.

Tested by:	pluknet
Approved by:	re (kib)
2011-07-19 00:37:24 +00:00
Attilio Rao
521ea19d1c - Remove the eintrcnt/eintrnames usage and introduce the concept of
  sintrcnt/sintrnames which are symbols containing the size of the 2
  tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
  This area can be widely improved by applying the same to other
  architectures and likely finding a unified approach among them and
  moving the whole code to be MI. More work in this area is expected to
  happen fairly soon.

No MFC is planned for this patch.

Tested by:	pluknet
Reviewed by:	jhb
Approved by:	re (kib)
2011-07-18 15:19:40 +00:00
Marius Strobl
da0fad6a08 Remove NULL assignments which are redundant for static timecounters.
Submitted by:	jkim
2011-07-12 18:10:56 +00:00
Marius Strobl
4a1fe7fa09 - Remove redundant timecounter masking from counter_get_timecount().
- Zero the timecounter when allocating it so we don't need to initialize unused
  members and remove a now redundant NULL assignment.

Submitted by:	jkim
2011-07-12 18:02:37 +00:00
Marius Strobl
fed20d2081 - Current testing shows that (ab)using the JBC performance counter in bus
  cycle mode as timecounter just works fine. My best guess is that a firmware
  update has fixed this; check at run-time whether it advances and use a
  positive quality if it does. The latter will cause this timecounter to be
  used instead of the tick counter based one, which just sucks for SMP.
- Remove a redundant NULL assignment from the timecounter initialization.
2011-07-12 17:56:42 +00:00
Marius Strobl
bd1c8dd51b - Add a missing shift in schizo_get_timecount(). This happened to be non-fatal
  as STX_CTRL_PERF_CNT_CNT0_SHIFT actually is zero; if we were using the
  second counter in the upper 32 bits, this would be required though, as the MI
  timecounter code doesn't support 64-bit counters/counter registers.
- Remove a redundant NULL assignment from the timecounter initialization.
2011-07-12 17:55:34 +00:00
Marius Strobl
410cde006a Remove the IDR_CHEETAH_MAX_BN_PAIRS limit from cheetah_ipi_selected().
This is just a simple approach. For reasons unknown OpenSolaris uses a
more sophisticated one involving IPIing the remaining CPUs in reverse
order after the first batch of 32.
2011-07-05 20:05:06 +00:00
Marius Strobl
6df19902a5 It can be useful to know which page still has mappings. 2011-07-05 18:55:56 +00:00
Marius Strobl
0e5b645f76 - pmap_cache_remove() and pmap_protect_tte() are only used within pmap.c
  so static'ize them.
- Correct a typo.
2011-07-05 18:50:40 +00:00
Marius Strobl
63db5ba435 In pmap_remove_all() assert that the page is neither fictitious nor
unmanaged as also done on other architectures.

Reviewed by:	alc
2011-07-05 18:46:19 +00:00
Marius Strobl
2e569926f8 Call pmap_qremove() before freeing or unwiring the pages, otherwise
there's a window during which a page can be re-used before its previous
mapping is removed.

Reviewed by:	alc
MFC after:	1 week
2011-07-05 18:40:37 +00:00
Attilio Rao
470107b2f1 MFC 2011-07-04 11:13:00 +00:00
Marius Strobl
df41287464 UltraSPARC-IV CPUs seem to be affected by a not publicly documented
erratum causing them to trigger stray vector interrupts accompanied by a
state in which they even fault on locked TLB entries. Just retrying the
instruction in that case gets the CPU back on track though. OpenSolaris
also just ignores a certain number of stray vector interrupts.
While at it, implement the stray vector interrupt handling for SPARC64-VI
which use these for indicating uncorrectable errors in interrupt packets.
2011-07-02 12:56:03 +00:00
Marius Strobl
c70f826b25 Don't waste a delay slot. 2011-07-02 11:46:23 +00:00
Marius Strobl
4a35efc720 - For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from
  the TLBs in order to get rid of the user mappings but instead traverse
  them and flush only the latter like we also do for the Spitfire-class.
  Also flushing the unlocked kernel entries can cause instant faults which,
  when called from within cpu_switch(), are handled with the scheduler lock
  held, which in turn can cause timeouts on the acquisition of the lock by
  other CPUs. This was easily seen with a 16-core V890 but occasionally
  also happened with 2-way machines.
  While at it, move the SPARC64-V support code entirely to zeus.c. This
  causes a little bit of duplication but is less confusing than partially
  using Cheetah-class bits for these.
- For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024-
  entry, 2-way set associative TLB.
- In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure
  that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back.

Tested by:      Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)
2011-07-02 11:14:54 +00:00
Marius Strobl
80006832f6 Using .comm to declare intrnames and eintrnames causes binutils 2.17.50 to
merge the two.
2011-07-02 10:17:26 +00:00
Jonathan Anderson
12bc222e57 Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)
2011-06-30 10:56:02 +00:00
Attilio Rao
9b571ec6b3 MFC 2011-06-22 19:42:32 +00:00
Marius Strobl
915d84ba38 Fix whitespace 2011-06-21 20:50:55 +00:00
Marius Strobl
0e3d1b3853 On machines where we don't need to lock the kernel TSB into the dTLB and
thus may basically use the entire 64-bit kernel address space, reduce
VM_KMEM_SIZE_SCALE to 1, allowing the kernel to use more memory.
2011-06-21 20:48:14 +00:00
Marius Strobl
7cdfb4e8f2 On machines where we don't need to lock the kernel TSB into the dTLB and
thus may basically use the entire 64-bit kernel address space, increase
the kernel virtual memory so that it is not limited by VM_KMEM_SIZE_MAX.
2011-06-21 20:47:03 +00:00
Attilio Rao
49ea5c076c MFC 2011-06-21 09:09:53 +00:00
Marius Strobl
6308e06cf1 As a stopgap, minimize the sched_lock coverage in pmap_activate() in order
to reduce lock contention.
2011-06-20 21:36:53 +00:00
Marius Strobl
207f858338 - Remove MD usage of pc_cpumask and pc_other_cpus. [1]
- Remove CTASSERTs which no longer need to hold since r222813.

Submitted by:	attilio [1]
2011-06-20 21:31:01 +00:00
Attilio Rao
5519971c21 MFC 2011-06-19 14:22:35 +00:00
Marius Strobl
a2f43b6155 - As with stray vector interrupts, limit the reporting of stray level
  interrupts. Bringup on additional machine models repeatedly reveals
  firmware that enables interrupts behind our back, causing the console
  to be flooded otherwise.
- As with the regular interrupt counters, using uint16_t instead of
  u_long for counting the stray vector interrupts should be more than
  sufficient.
- Cache the interrupt vector in intr_stray_vector().
2011-06-18 11:27:44 +00:00
Attilio Rao
8a9ce51786 Entirely remove pc_other_cpus and pc_cpumask usage from sparc64.
Tested and reviewed by:	marius
2011-06-16 07:25:53 +00:00
Marius Strobl
82f131f39b Don't include curcpu in the mask which is used as the IPI cookie as we
have to ignore it when sending the IPI anyway. Actually I can't think of
a good reason why this ever was done that way in the first place as it's
not even useful for debugging.
While at it replace the use of pc_other_cpus as it's slated for deorbit.
2011-06-15 22:41:55 +00:00
Marius Strobl
2ba56f4d23 - Merge r222980 from x86: add sound(4) and common device drivers.
- Fix whitespace.
2011-06-13 12:45:19 +00:00
Marius Strobl
ab267f9dbf - For the case when tl1_align(_trap) is used to call rsf_fatal via
  RSF_FATAL, we need to switch to alternate globals for KSTACK_CHECK just
  like tl1_data_excptn(_trap) does. This is more or less cosmetic because
  in case RSF_FATAL is called we're already heading south.
- Correct an END().
- Read the window state from the correct register for a CATR().
2011-06-07 23:15:21 +00:00
Marius Strobl
c40847145b Adapt CATR() to r222813. This is somewhat tricky as we can't afford using
more than three temporary registers in several places where CATR() is used, so
this code trades instructions in for registers. Actually, this still isn't
sufficient and CATR() has the side-effect of clobbering %y. Luckily, with
the current uses of CATR() this either doesn't matter or we are able to
(save and) restore it.
Now that there's only one use of AND() and TEST() left inline these.
2011-06-07 17:33:39 +00:00
Marius Strobl
3bd5692b1f Fix a problem with r222813; given that we may only operate on interrupt
globals here but clobber %y, save and restore the latter.
2011-06-07 17:19:14 +00:00
Attilio Rao
61b926921f MFC 2011-05-31 21:22:44 +00:00
Attilio Rao
e370959707 Fix KTR_CPUMASK in order to accept a string representing a cpuset_t.
This introduces all the underlying support for making this possible (via
the function cpusetobj_strscan()) and keeps ktr_cpumask exported.  sparc64
implements its own assembly primitives for tracing events and needs to
properly check it.  Anyway the sparc64 logic is not implemented yet due
to lack of knowledge (by me) and time (by marius), but it is just a
matter of using ktr_cpumask when possible.

Tested and fixed by:	pluknet
Reviewed by:		marius
2011-05-31 20:48:58 +00:00
Attilio Rao
d0984adc98 Revert a change that crept in during MFC. 2011-05-31 20:23:33 +00:00
Nathan Whitehorn
d098f93019 On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
Attilio Rao
5b6ea0b538 MFC 2011-05-31 14:18:10 +00:00
Attilio Rao
217e1c0ebc Revert a patch that involuntarily sneaked in while I was MFCing. 2011-05-23 23:50:21 +00:00
Attilio Rao
a9ff18a210 MFC 2011-05-23 01:17:30 +00:00
Attilio Rao
447274a88b MFC 2011-05-15 15:47:16 +00:00
Marius Strobl
3bb1fd1bc4 Recognize the eeprom device found in Fujitsu PRIMEPOWER650 and 900. 2011-05-15 13:25:26 +00:00
Marius Strobl
93c57e311e Fix yet another inversion in the logic by applying the x86 version of this,
which avoids CPU_EMPTY() in the first place.
Do I get a beer or something for every inversion I find?
2011-05-14 23:20:14 +00:00
Attilio Rao
e0c109e8c1 MFC 2011-05-14 02:28:26 +00:00
Attilio Rao
4b547324c0 Disconnect the sun4v architecture from the tree.
Some files keep the SUN4V tags as a code reference for the future,
in case revamped sun4v support is ever added again.

Reviewed by:	marius
Tested by:	sbruno
Approved by:	re
2011-05-14 01:53:38 +00:00
Attilio Rao
b2aa562e7b MFC 2011-05-13 20:58:48 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Marius Strobl
79898bbecd When setting up pc_other_cpus for APs based on pc_allcpu clear pc_cpuid
in the former rather than the latter.
This gets this branch working on at least Jalapeno-class CPUs.
2011-05-13 15:21:31 +00:00
Attilio Rao
ef607a6aa3 MFC 2011-05-12 14:01:40 +00:00
Marius Strobl
717b08036f Update for the fact that the first members of the IPI args structures and
pc_cpumask were changed to cpuset_t. This now calculates the cpumask based
on pc_cpuid itself as pc_cpumask is slated for being deorbited. Note that
this needs r221750 to be MFC'ed in order to compile.
This seems to work fine but after a few dozens of successful IPIs something
suddenly adds pc_cpuid to pc_other_cpus, causing the respective assertions
in mp_machdep.c to be triggered when the latter is used as the base for the
targets.
2011-05-12 09:29:24 +00:00
Marius Strobl
0fd4b3388e The ita_mask should include curcpu but the cpuset passed to cpu_ipi_selected()
must not, otherwise we tell the CPU to IPI itself, which the sun4u CPUs don't
support. For reasons unknown so far MD and MI IPI use actually still triggers
that assertion though.
2011-05-11 21:15:12 +00:00
Marius Strobl
ed36c82cbf Update for the fact that pm_active and pc_cpumask were changed to cpuset_t.
This now calculates pc_cpumask based on pc_cpuid itself as the former is
slated for being deorbited.
This branch now at least boots UP again. MP needs more things converted and
the existing conversion from cpumask_t to cpuset_t still has bugs.
2011-05-11 21:10:43 +00:00
Marius Strobl
707b4f4479 Add an ATOMIC_CLEAR_LONG. 2011-05-10 21:18:45 +00:00
Attilio Rao
3f748615ac Fix an inversion in logic.
Submitted by:	marius
2011-05-10 18:19:56 +00:00
Attilio Rao
c813ed5c36 - Fix a typo
- Fix an inversion in the logic
2011-05-08 14:23:21 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
Attilio Rao
0d9fa7bd31 Add sparc64 support.
Compiled (and helped) by:	pluknet
2011-05-06 21:53:29 +00:00
John Baldwin
f9a9473702 Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus
versions instead.  They were never needed as bus_generic_intr() and
bus_teardown_intr() had been changed to pass the original child device up
in 42734, but the ISA bus was not converted to new-bus until 45720.
2011-05-06 13:48:53 +00:00
John Baldwin
83c41143ca Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests.  Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices.  The
suballocations are managed by creating an rman for each I/O window.  The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus.  Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus.  If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free.  Using that, the smallest
ranges that need to be added to either the front or back of the window
are computed.  The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows).  If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window.  This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.  This is a transition aid to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now.  Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR:		kern/143874 kern/149306
Tested by:	mav
2011-05-03 17:37:24 +00:00
Rick Macklem
4309e17add This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
2011-04-27 17:51:51 +00:00
Alexander Motin
97b53e3634 Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
2011-04-24 08:58:58 +00:00
Marius Strobl
39272630aa Correct spelling in comments.
Submitted by:	brucec
2011-04-22 09:31:40 +00:00
Marius Strobl
eabaaab07c - Use the streaming cache unless BUS_DMA_COHERENT is specified. Since
  r220375 all drivers enabled in the sparc64 GENERIC should be either
  correctly using bus_dmamap_sync(9) calls or supply BUS_DMA_COHERENT
  when appropriate or as a workaround for missing bus_dmamap_sync(9)
  calls (sound(4) drivers and partially sym(4)). In at least some
  configurations taking advantage of the streaming cache results in
  a modest performance improvement.
- Remove the memory barrier for BUS_DMASYNC_PREREAD, which, as the
  comment already suggested, is bogus.
- Add my copyright for having implemented several things like support
  for the Fire and Oberon IOMMUs, taking over PROM IOMMU mappings etc.
2011-04-21 21:56:28 +00:00
Adrian Chadd
dba9c85977 Break out the ath PCI logic into a separate device/module.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
2011-03-31 08:07:13 +00:00
Marius Strobl
301afbce99 Allocate memory for a DMA method table only in case we need to override
the iommu(4) provided one, i.e. in case of Hummingbird and Sabre bridges,
otherwise just use the iommu(4) one. This also fixes a bug introduced in
r220039 which caused an empty DMA method table to be used for the second
of a pair of Psycho bridges.
2011-03-29 19:48:03 +00:00
Marius Strobl
22508b0153 - A closer inspection of the OpenSolaris code indicates that the DMA
  syncing for Hummingbird and Sabre bridges should be applied with every
  BUS_DMASYNC_POSTREAD instead of in a wrapper around interrupt handlers
  for devices behind PCI-PCI bridges only as suggested by the documentation
  (code for the latter actually exists in OpenSolaris but is disabled by
  default), which also makes more sense.
- Take advantage of the ofw_pci_setup_device method introduced in r220038
  for disabling bus parking for certain EBus bridges in order to
- Mark some unused parameters as such.
2011-03-26 16:52:31 +00:00
Marius Strobl
b09c4bd47a - Merge the *_SET macros from fire(4) which generally print out the
  register changes when compiled with SCHIZO_DEBUG and take advantage
  of them.
- Add support for the XMITS Fireplane/Safari to PCI-X bridges. I thought
  I'd need this for a Sun Fire 3800, which then turned out not to be
  equipped with such a bridge though. The support for these should be
  complete but given that it hasn't actually been tested probing is
  disabled for now.
  This required a way to alter the XMITS configuration in case a PCI-X
  device is found further down the device tree so the sparc64 specific
  ofw_pci kobj was revived with a ofw_pci_setup_device method, which is
  called by the ofw_pcibus code for every device added.
- A closer inspection of the OpenSolaris code indicates that consistent
  DMA flushing/syncing as well as the block store workaround should be
  applied with every BUS_DMASYNC_POSTREAD instead of in a wrapper around
  interrupt handlers for devices behind PCI-PCI bridges only as suggested
  by the documentation (code for the latter actually exists in OpenSolaris
  but is disabled by default), which also makes more sense.
- Add a workaround for Cassini/Skyhawk combinations. Chances are that
  this solves the crashes seen when using the on-board Cassini NICs
  of Sun Fire V480 equipped with centerplanes other than 501-6780 or
  501-6790. This also takes advantage of the ofw_pci_setup_device method.
- Mark some unused parameters as such.
2011-03-26 16:49:12 +00:00
Marius Strobl
05bff80a71 - Make a panic message better reflect the actual problem.
- A closer inspection of the OpenSolaris code indicates the block store
  workaround is only necessary in case of BUS_DMASYNC_POSTREAD.
- Mark some unused parameters as such.
2011-03-19 20:36:05 +00:00
Marius Strobl
6d8b3c2f9f On Serengeti-class machines the OFW root isn't the parent of the CPU
nodes.
2011-03-19 19:39:05 +00:00
Marius Strobl
3273bf2d65 In case reading PCIR_MINGNT fails, don't use it for calculating the
latency. This is more or less a theoretical problem, though, as it
typically indicates much bigger problems.
2011-03-19 19:30:49 +00:00
Marius Strobl
3a8a826af3 Remove the advertising clause from the UCB license according to the
July 22, 1999 addendum.
2011-03-13 13:42:43 +00:00
Marius Strobl
273fb3dc7c Sync licenses and the corresponding RCS IDs with NetBSD, mainly switching
the licenses of Matthew R. Green and the TNF to 2-clause.

Obtained from:	NetBSD
2011-03-12 14:33:32 +00:00
Marius Strobl
080ca1a51b - Add support for TLS relocations.
- Emit an error when encountering an unsupported relocation and, in the
  case of the kernel, also for unaligned relocations.
- Fix R_SPARC_LOX10 relocations. Apparently these are hardly ever used.
2011-03-11 21:08:02 +00:00
Marius Strobl
cb32ba5229 - Remove clauses 3 and 4 from TNF licenses. [1]
- Add the _RF_X committed in r212998 also to the tables in the sparc64
  reloc.c in order to reduce differences between the kernel and the userland
  source. This results in no functional change, though.
- Fix further inconsistencies in the abbreviations of the names of the
  relocations.
- Further whitespace fixes.

Obtained from:	NetBSD [1]
2011-03-11 20:30:58 +00:00
Marius Strobl
09b64d66b4 Revert the binutils workaround committed in r219340, the underlying
problem has been fixed in r219530.
2011-03-11 20:01:57 +00:00
Matthew D Fleming
c77715ef6c Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.
2011-03-11 18:56:55 +00:00
Matthew D Fleming
cd67ac41ae Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name.  Also consistently use strlcpy().

Suggested by:	Warner Losh
2011-03-10 22:56:00 +00:00
Julian Elischer
a8066a9d3b Add a small change to the comment in the GENERIC config files that include udbp
Submitted by:	Chris Forgron, cforgeron at acsi dot ca
MFC after:	1 week
2011-03-09 17:15:11 +00:00
Dmitry Chagin
e5d81ef1b5 Extend struct sysvec with a new method, sv_schedtail, which is called
explicitly on the fork trampoline path instead of invoking
eventhandler(schedtail) for each child process.

Remove the eventhandler(schedtail) code and change the linux ABI to use the
newly added sysvec method.

While here, replace the explicit comparison of the module sysentvec structure
with the newly created process's sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 weeks
2011-03-08 19:01:45 +00:00
Marius Strobl
25b31a9496 - With the addition of TLS support, binutils started to make the addend
  values for resolved symbols relative to relocbase instead of sections,
  so detect this case and handle it as appropriate, which allows using
  kernel modules linked with affected versions of binutils. Actually I
  think this is a bug in binutils but given that apparently nobody
  complained for nearly six years and powerpc has basically the same
  workaround I decided to put it in for the sparc64 kernel, too.
- Fix R_SPARC_HIX22 relocations. Apparently these are hardly ever used.
2011-03-06 15:20:11 +00:00
Marius Strobl
d374d11285 - Consistently abbreviate the names of the relocations.
- End sentences with dots.
- Fix whitespace.
2011-03-06 13:25:46 +00:00
Marius Strobl
a2c5ab4bcb Resurrect ofw_pci_if.m from r178578. 2011-02-21 21:13:18 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Marius Strobl
42b9a96080 Set td_kstack_pages for thread0. 2011-02-08 23:21:35 +00:00
Marius Strobl
a66410ec84 Take advantage of accessing the kernel TSB via ASI_ATOMIC_QUAD_LDD_PHYS
on SPARC64-V, too. Tested by: Michael Moll
2011-02-08 21:58:13 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Konstantin Belousov
55aabb7fd1 For architectures not using a direct map and requiring a real KVA page
for sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about a free buffer.

sf_buf_alloc() calls msleep(PCATCH) when the SFB_CATCH flag is given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART even though the thread was selected by wakeup_one(). As a
result, we lose a wakeup, and some other waiter will not be woken up.

Reported and tested by:	az
Reviewed by:	alc, jhb
MFC after:	1 week
2011-01-18 21:57:02 +00:00
Jung-uk Kim
bc35e60ec0 Remove empty dev_mem_md_init() stubs. 2011-01-17 23:06:47 +00:00
Jung-uk Kim
2fea643112 Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init().  Consistently define mem_range_softc from
mem.c for all platforms.  Add missing #include guards for machine/memdev.h
and sys/memrange.h.  Clean up some nearby style(9) nits.

MFC after:	1 month
2011-01-17 22:58:28 +00:00
Marius Strobl
db3a488fb0 In order to save instructions, the MMU trap handlers assumed that the kernel
TSB is located within the 32-bit address space, which held true as long as
we were using virtual addresses magic-mapped before the location of the
kernel for addressing it. However, with r216803 in place, we address it via
its physical address instead when possible, which requires 64-bit addressing
on machines like the Sun Fire V880 that have no physical memory in the
32-bit address space at all. When using physical addressing, it still should
be safe to assume that we can just ignore the lowest 10 bits of the address
as a minor optimization, as we did before r216803.
2011-01-17 20:32:17 +00:00
John Baldwin
58ccf5b41c Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
David Schultz
633bd99821 Fix the value for DECIMAL_DIG on UltraSparcs. The previous value of
35 wasn't quite big enough to ensure correct rounding for very-close-
to-halfway cases.
2011-01-09 06:05:48 +00:00
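For reference, C99 (5.2.4.2.2) defines DECIMAL_DIG as ceil(1 + p_max * log10(2)), where p_max is the significand width in bits of the widest floating type. A minimal sketch of that formula in integer arithmetic follows; it assumes the 113-bit significand of the IEEE 754 quad-precision long double used on sparc64, and the helper decimal_dig() is hypothetical. Under that assumption the formula gives 36, one more than the previous value of 35.

```c
#include <assert.h>
#include <stdint.h>

/*
 * C99 5.2.4.2.2: DECIMAL_DIG = ceil(1 + p * log10(2)), p being the
 * significand width in bits of the widest floating type.  log10(2) is
 * kept in fixed point (scaled by 1e9, rounded up) so no libm is needed;
 * this is precise enough as long as p * log10(2) is not near an integer.
 */
#define LOG10_2_X1E9	301029996LL	/* ~0.30102999566 * 1e9, rounded up */

static int
decimal_dig(int p)
{
	int64_t scaled = (int64_t)p * LOG10_2_X1E9;

	/* ceil(1 + x) == 1 + ceil(x); ceiling division by 1e9. */
	return ((int)(1 + (scaled + 999999999LL) / 1000000000LL));
}
```

For comparison, the same helper yields 17 for a 53-bit double and 21 for a 64-bit x87 extended long double.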
Tijl Coosemans
a56e818f29 On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by:	bde [1]
Approved by:	kib (mentor)
2011-01-08 12:43:05 +00:00
Tijl Coosemans
9858863cd4 Fix types of some values in machine/_limits.h.
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by:	bde
Approved by:	kib (mentor)
2011-01-08 11:13:34 +00:00
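The promotion rule behind this change can be checked with C11 _Generic: an unsigned char operand becomes int in an expression, so a matching UCHAR_MAX should be a plain int constant. The macro names below are hypothetical and only contrast the two spellings; they are not the contents of machine/_limits.h.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical spellings: 0xffU has type unsigned int, 0xff has type int. */
#define MY_UCHAR_MAX_OLD	0xffU
#define MY_UCHAR_MAX_NEW	0xff

/* Evaluates to 1 if the expression has type int (C11 _Generic). */
#define IS_INT(e)	_Generic((e), int: 1, default: 0)

/* An unsigned char operand is promoted to int before arithmetic. */
static int
promoted_is_int(void)
{
	unsigned char c = UCHAR_MAX;

	return (IS_INT(c + 0));
}

/* With an unsigned constant, -1 converts to UINT_MAX: yields 0. */
static int
old_compares_less(void)
{
	return (-1 < MY_UCHAR_MAX_OLD);
}

/* With an int constant, this is a plain signed comparison: yields 1. */
static int
new_compares_less(void)
{
	return (-1 < MY_UCHAR_MAX_NEW);
}
```

The comparison helpers show why the type matters in practice: the same value can compare differently depending on whether the constant is signed.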
Konstantin Belousov
39198f15ee Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.
2011-01-07 14:22:34 +00:00
Marius Strobl
c3ac38e2cb Remove an unused variable accidentally added in r216803. 2011-01-06 17:28:31 +00:00
Marius Strobl
e68f0c6425 Inherit the APB and the generic OFW PCI-PCI bridge driver from the generic
PCI-PCI bridge driver in order to save some code.
2011-01-04 16:21:14 +00:00
Marius Strobl
f4ff513c4b Reserve INTR_MD[1-4] similarly to what BUS_DMA_BUS[1-4] are intended for
and switch sparc64 to use the first one for bus error filter handlers of
bridge drivers instead of (ab)using INTR_FAST for that so we eventually
can get rid of the latter.

Reviewed by:	jhb
MFC after:	1 month
2011-01-04 16:11:32 +00:00
Marius Strobl
f8d0071f71 Extend the section in which interrupts are disabled in the TLB demap
functions, otherwise if we get preempted after checking whether a certain
pmap is active on the current CPU but before disabling interrupts we might
operate on an outdated state as the pmap might have been deactivated in
the meantime. As the same issue may arise when the TLB demap function is
interrupted by a TLB demap IPI, just entering a critical section before
the check isn't sufficient so we have to fully disable interrupts instead.

MFC after:	3 days
2011-01-02 15:01:03 +00:00
Marius Strobl
4d05e7b184 On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS,
which takes a physical address instead of a virtual one, for loading TTEs
of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB,
which only has a very limited number of lockable dTLB slots. The net result
is that we now basically can handle a kernel TSB of any size and no longer
need to limit the kernel address space based on the number of dTLB slots
available for locked entries. Consequently, other parts of the trap handlers
now also only access the kernel TSB via its physical address in order
to avoid nested traps, as does the PMAP bootstrap code as we haven't taken
over the trap table at that point, yet. Apart from that the kernel TSB now
is accessed via a direct mapping when we are otherwise taking advantage of
ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this
is implemented by extending the patching of the TSB addresses and mask as
well as the ASIs used to load it into the trap table so the runtime overhead
of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS
is not yet enabled on SPARC64 CPUs due to lack of testing and due to the
fact it might require minor adjustments there.
Theoretically it should be possible to use the same approach also for the
user TSB, which already is not locked into the dTLB, avoiding nested traps.
However, for reasons I don't understand yet, OpenSolaris only does that with
SPARC64 CPUs. On the other hand, I think that also addressing the user TSB
physically and thus avoiding nested traps would get us closer to sharing
this code with sun4v, which only supports trap levels 0 and 1, so eventually
we could have a single kernel which runs on both sun4u and sun4v (as do
Linux and OpenBSD).

Developed at and committed from:	27C3
2010-12-29 16:59:33 +00:00
Marius Strobl
62cf53e2ea - Move the macros for generating load and store instructions to asmacros.h
  so they can be shared by different source files and extend them by a
  variant for atomic compare and swap.
- Consistently use EMPTY.
2010-12-29 14:14:50 +00:00
Marius Strobl
b5b0068b4b Rename the "xor" parameter to "xorval" as the former is a reserved keyword
in C++.

Submitted by:	gahr
2010-12-29 14:11:46 +00:00
Marius Strobl
05bcfef170 Extend the hack of r182730 to trick GAS/GCC into compiling access to
STICK/STICK_COMPARE independently of the selected instruction set by
TICK_COMPARE so tick.c as of r214358 once again can be compiled with
gcc -mcpu=v9 for reference purposes.
2010-12-21 22:03:12 +00:00
Marius Strobl
3318c3ef45 Revert r216080 so kmem_map is capped at 3/5 of the currently rather modest
kernel address space in order to leave space for the buffer cache, pipes,
thread stacks, etc on machines with more physical memory until we take
advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs providing it so we don't need
to lock the kernel TSB pages into the dTLB, basically making the entire
64-bit kernel address space available on relevant machines.

Submitted by:	alc
2010-12-21 21:32:17 +00:00
Rebecca Cran
c90f7d9b44 Revert r216134. This checkin broke platforms where the bus_space functions are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.
2010-12-03 07:09:23 +00:00
Rebecca Cran
15b4888a24 Disallow passing in a count of zero bytes to the bus_space(9) functions.
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR:	kern/80980
Discussed with:	jhb
2010-12-02 22:19:30 +00:00
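A small C model of the failure mode described above: a counter that is decremented before it is tested for zero runs through its whole range when started at zero. This is a sketch, not the bus_space(9) code; it uses a 16-bit counter in place of ECX/RCX so the wrap-around stays cheap to execute, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a LOOP-style instruction: the body runs, the counter is
 * decremented, and only then is it tested for zero.
 */
static uint32_t
loop_like_iterations(uint16_t count)
{
	uint32_t iterations = 0;

	do {
		iterations++;		/* body runs before any check */
		count--;		/* decrement first (0 wraps to 65535) */
	} while (count != 0);		/* ... then test for zero */
	return (iterations);
}

/* Guard in the spirit of the commit: reject a zero count up front. */
static uint32_t
guarded_iterations(uint16_t count)
{
	if (count == 0)
		return (0);
	return (loop_like_iterations(count));
}
```

With a count of 0 the unguarded model runs 65536 times; on real hardware the wrap covers the full register width, which is what turns a zero-length request into a crash or hang.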
Max Khon
5bdabbdbf8 Change VM_KMEM_SIZE_MAX to be just (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS)
Suggested by:	marius
2010-11-30 16:49:06 +00:00
Max Khon
311e93395e Define VM_KMEM_SIZE_MAX on sparc64. Otherwise kernel built with
DEBUG_MEMGUARD panics early in kmeminit() with the message
"kmem_suballoc: bad status return of 1" because of zero "size" argument
passed to kmem_suballoc() due to "vm_kmem_size_max" being zero.

The problem also exists on ia64.
2010-11-28 19:26:20 +00:00
Marius Strobl
6948a04f2c Convert drivers somehow missed in r200874 to multipass probing. 2010-11-15 21:58:10 +00:00
Alan Cox
2cf36c8f67 Enable reservation-based physical memory allocation. Even without the
creation of large page mappings in the pmap, it can provide modest
performance benefits.  In particular, for a "buildworld" on a 2x 1GHz
Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system
time by 12.6%.

Tested by:	marius@
2010-11-10 17:57:34 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
  contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00
Marius Strobl
a1cc524045 Implement pmap_is_prefaultable().
Reviewed by:	alc (with bugfix)
2010-11-06 13:58:24 +00:00
John Baldwin
0108cce0a4 Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger.  Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock.  However, trap interrupts for single-stepping
can still occur even when interrupts are disabled.  Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased.  Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero.  To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with:	bde
MFC after:     1 month
2010-11-05 13:42:58 +00:00
Marius Strobl
0da4045955 - When resetting pm_active and pm_context of a pmap in pmap_pinit(), we
  need locking, as otherwise we may race against the other parts of the
  MD code, which expect a consistent state of these. While at it, move
  the resetting of the pmap before entering it in the TSB.
- Spell a 0 as TLB_CTX_KERNEL.
2010-10-29 20:51:30 +00:00
Marius Strobl
36c7255a81 - Given that in one-shot mode tick_et_start() also is called frequently,
  introduce function pointers, set up once, to the respective implementations
  for reading the (S)TICK and writing the (S)TICK_COMPARE registers as a
  compromise between duplicating code and selecting between different
  implementations during execution over and over again, similar to what is
  done elsewhere in the MD code in order to support different CPU models that
  won't ever change at runtime.
- In the remaining tick interrupt handler, further push down disabling of
  interrupts to the periodic case, as it isn't necessary in one-shot
  mode at all.
2010-10-25 20:52:33 +00:00
Marius Strobl
10c2bb0a10 - Wrap exchanging td_intr_frame and calling the event timer callback in
  a critical section, as apparently required by both. I don't think either
  belongs in the event timer front-ends; instead, the callback should handle
  this as necessary, just like for example intr_event_handle()
  does. However, this is how the other architectures currently handle it,
  either explicitly or implicitly.
- Further rename and reword references to hardclock as this front-end no
  longer has a notion of actually calling it.
2010-10-19 19:44:05 +00:00
Marius Strobl
17f3c8f1e3 - In oneshot-mode it doesn't make sense to try to compensate for the clock
  drift in order to achieve a more stable clock, as the tick intervals may
  vary in the first place. In fact I haven't seen this code kick in when
  in oneshot-mode so just skip it in that case.
- There's no need to explicitly stop the (S)TICK counter in oneshot-mode
  with every tick as it just won't trigger again with the (S)TICK compare
  register set to a value in the past (with a wrap-around once every ~195
  years of uptime at 1.5 GHz this isn't something we have to worry about
  in practice).
- Given that we'll disable interrupts completely anyway there's no
  need to enter critical sections.
2010-10-17 16:46:54 +00:00
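The "~195 years" figure can be reproduced with a quick calculation, assuming a 63-bit counter (on UltraSPARC, bit 63 of TICK is the NPT bit rather than part of the count) and a 365.25-day year: 2^63 / 1.5e9 Hz is about 6.15e9 seconds, or roughly 195 years. wrap_years() is a hypothetical helper for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Years until a free-running counter `bits` wide wraps when incremented
 * at `hz`, rounded to the nearest year.  Assumes bits < 64 and a
 * 365.25-day year; integer arithmetic only.
 */
static uint64_t
wrap_years(unsigned bits, uint64_t hz)
{
	const uint64_t secs_per_year = 36525ULL * 864ULL; /* 365.25 * 86400 */
	uint64_t wrap_secs = (UINT64_C(1) << bits) / hz;

	return ((wrap_secs + secs_per_year / 2) / secs_per_year);
}
```

Slower clocks only push the wrap further out (about 292 years at 1 GHz), so a compare value left in the past is effectively never reached again.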
Marius Strobl
15272ec763 Explicitly lower the PIL to 0 as part of enabling interrupts, similar to
what is done on other platforms. Unlike with the sched_throw(NULL)
called on BSPs during their startup, apparently there's nothing which will
reliably lower it on APs. I'm unsure why this only came up on V215 though,
breaking these with r207248. My best guess is that these are the only
supported ones so far fast enough to lose some race.

PR:		151404
MFC after:	3 days
2010-10-14 21:46:53 +00:00
Marius Strobl
45c347bed0 - In the spirit of r212559, add a comment describing what will eventually
  lower the PIL.
- Just as with the AP ensure that the (S)TICK timer(s) are in a known
  state when starting BSPs.
2010-10-14 21:34:53 +00:00
Marius Strobl
1fe259cdd8 In the replacement text of the __bswapN_const() macros cast the argument
to the expected type so they work like the corresponding __bswapN_var()
functions and the compiler doesn't complain when arguments of different
width are passed.
2010-10-08 14:59:45 +00:00
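A 16-bit swap shows why the cast matters. The macros below are hypothetical simplifications, not the real __bswap16_const(). Without a cast on the argument, bits 16-23 of a wider value are shifted into the result. With the cast, the macro gives the same answer as a function whose parameter is a uint16_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: only the result is cast, so wide arguments leak bits. */
#define MY_BSWAP16_NOCAST(x)	((uint16_t)((x) << 8 | (x) >> 8))

/* Hypothetical: the argument is cast first, like a uint16_t parameter. */
#define MY_BSWAP16_CONST(x) \
	((uint16_t)((uint16_t)(x) << 8 | (uint16_t)(x) >> 8))

/* Function counterpart; the parameter type truncates the argument. */
static inline uint16_t
my_bswap16_var(uint16_t x)
{
	return ((uint16_t)(x << 8 | x >> 8));
}
```

For 0x12345678U the uncast macro yields 0x7C56 because 0x34 from the upper half is ORed in. The cast version and the function both yield 0x7856.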
Neel Natu
5c1a8dc028 Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.
The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by:	NetApp
Submitted by:	Will McGovern (will at netapp dot com)
Reviewed by:	mjacob, jhb
2010-09-29 21:53:11 +00:00
Marius Strobl
60dd2bcc05 minor simplifications and cosmetics 2010-09-24 15:12:18 +00:00
David Xu
295fbd498e Now userland POSIX semaphores are based on umtx. The kernel module
is only used for binary compatibility; if you want to run old
binaries, you need to kldload the module.
2010-09-24 09:04:16 +00:00
Konstantin Belousov
e8b4ca850e For sparc64 relocations that directly put bits of the symbol value into
the location, apply elf_relocaddr to the symbol value to have right
values for the symbols from dpcpu segment.

PR:	kern/147769
Discussed with:	avg
Tested by:	marius
MFC after:	2 weeks
2010-09-22 12:52:12 +00:00
Marius Strobl
3c141a7e1d Remove accidentally committed test code which effectively prevented the
use of the SPARC64 V VIS-based block copy function added in r212709.
Reported by:	Michael Moll
2010-09-16 12:05:46 +00:00
Marius Strobl
2c55431721 Add a VIS-based block copy function for SPARC64 V and later, which
additionally takes advantage of the prefetch cache of these CPUs.
Unlike the uncommitted US-III version, which provided no measurable
speedup or even resulted in a slight slowdown on certain CPU models
compared to using the US-I version with these, the SPARC64 version
actually results in a slight improvement.
2010-09-15 21:44:31 +00:00
Marius Strobl
c1769fad32 Add macros for alternate entry points. 2010-09-15 21:11:29 +00:00
Marius Strobl
4539e94b61 Sync with other platforms:
- make dflt_lock() always panic,
- add kludge to use contigmalloc() when the alignment is larger than the size
  and print a diagnostic when we didn't satisfy the alignment.
2010-09-15 17:11:15 +00:00
Marius Strobl
198ec86bf9 - Update the comment in swi_vm() regarding busdma bounce buffers; it's
  unlikely that support for these ever will be implemented on sparc64 as
  the IOMMUs are able to translate to up to the maximum physical address
  supported by the respective machine, bypassing the IOMMU is affected
  by hardware errata and being able to support DMA engines which cannot
  do at least 32-bit DMA does not justify the costs.
- The page zeroing in uma_small_alloc() may use the VIS-based block zero
  function so take advantage of it.
2010-09-15 15:18:41 +00:00
Marius Strobl
4c206df38f Remove a KASSERT which will also trigger for perfectly valid combinations
of small maxsize and "large" (including BUS_SPACE_UNRESTRICTED) nsegments
parameters. Generally using a presz of 0 (which indeed might indicate the
use of bogus parameters for DMA tag creation) is not fatal, it just means
that no additional DVMA space will be preallocated.
2010-09-14 20:31:09 +00:00
Marius Strobl
9ab2708c52 Remove redundant raising of the PIL to PIL_TICK as the respective locore
code already did that.
2010-09-14 19:35:43 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When the CPU is busy, interrupts are generated at the full rate
of hz + stathz to fulfill scheduler and timekeeping requirements. But
when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts per
second per CPU now) needed to handle scheduled callouts is executed.
This significantly increases idle CPU sleep time, increasing the effect
of static power-saving technologies. It should also reduce host CPU load
on virtualized systems when the guest system is idle.

There is a set of tunables, also available as writable sysctls, to
control the desired event timer subsystem behavior:
  kern.eventtimer.timer - allows choosing the event timer hardware to use.
On x86 there are up to 4 different kinds of timers. Depending on whether the
chosen timer is per-CPU, the behavior of other options differs slightly.
  kern.eventtimer.periodic - allows choosing between periodic and one-shot
operation mode. In periodic mode, the current timer hardware is taken as the
only source of time for time events. This mode is quite similar to the previous
kernel behavior. One-shot mode instead uses the currently selected time counter
hardware to schedule all needed events one by one and programs the timer to
generate an interrupt exactly at the specified time. The default value depends
on the chosen timer's capabilities, but one-shot mode is preferred unless the
other is forced by the user or hardware.
  kern.eventtimer.singlemul - in periodic mode, specifies how many times
higher the timer frequency should be, so as not to strictly alias hardclock()
and statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU receive every timer interrupt
independently of whether it is busy or not. By default this option is
disabled. If the chosen timer is per-CPU and runs in periodic mode, this option
has no effect; all interrupts are generated.

As this patch modifies cpu_idle() on some platforms, I have also
refactored the one on x86. Now it makes use of MONITOR/MWAIT instructions
(if supported) under a high sleep/wakeup rate, as a fast alternative to other
methods. It allows the SMP scheduler to wake up sleeping CPUs much faster
without using IPIs, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerpc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Alexander Motin
dc5b8c2ee7 Sparc64 uses a dummy cpu_idle() method. Its CPUs never sleep. Tell
the scheduler that it doesn't need to use an IPI to "wake up" a CPU.
2010-09-11 07:24:10 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
John Baldwin
d1a02e0932 Catch up to rename of the constant for the Master Data Parity Error bit in
the PCI status register.

Pointed out by:	mdf
Pointy hat to:	jhb
2010-09-09 20:26:30 +00:00
Pyun YongHyeon
57ec924c77 Enable sis(4). sis(4) should work on all architectures. 2010-09-02 18:12:54 +00:00
Marius Strobl
760d65b894 Skip a KASSERT which isn't appropriate when not employing page coloring.
Reported by: Michael Moll
2010-08-21 14:28:48 +00:00
John Baldwin
8c7a92bd4a Remove unused KTRACE includes. 2010-08-19 16:41:27 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
John Baldwin
60c7b36b7a Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int.  Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.
2010-08-11 23:22:53 +00:00
Marius Strobl
15ec942da8 Wrap some sun4u-only symbols. 2010-08-08 14:37:16 +00:00
Marius Strobl
553cf1a13c - As it is not possible for sched_bind(9) to context switch with
  td_critnest > 1 when not already running on the desired CPU, read the
  TICK counter of the BSP via a direct cross trap request in that case
  instead.
- Treat the STICK based timecounter the same way as the TICK based one
  regarding its quality and obtaining the counter value from the BSP.
  Like the TICK timers the STICK ones also are only synchronized during
  their startup (which might not result in good synchronicity in the
  first place) but not afterwards and might drift over time, causing
  problems when the time is read from different CPUs (see r135972).
2010-08-08 14:00:21 +00:00
Marius Strobl
dfab2088e8 - Introduce a cpu_ipi_single() function pointer in order to send IPIs
  to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs.
  Besides being used to implement the ipi_cpu() introduced in r210939,
  cpu_ipi_single() will also be used internally by the sparc64 MD code.
- Factor out the Jalapeno support from the Cheetah IPI send functions
  in order to be able to more easily and efficiently implement support
  for more than 32 target CPUs as well as a workaround for Cheetah+
  erratum 25 for the latter.
2010-08-08 00:09:22 +00:00
Marius Strobl
820a9ea5cb For CPUs which ignore TD_CV and support hardware unaliasing, don't
bother doing page coloring. This results in a small but measurable
performance improvement in buildworld times.
2010-08-08 00:01:08 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
Alexander Motin
6c8dd81fa9 Adapt sparc64 and sun4v timer code for the new event timers infrastructure.
Reviewed by:	marius@
2010-07-29 12:08:46 +00:00
Matthew D Fleming
d7854da193 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfilling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
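Below is a minimal sketch of walking a table shaped like the mem_affinity array described above. Each entry is a range tagged with a domain. A domain may have several entries, and an all-zero entry ends the table. The struct name, the example addresses, and the choice of an exclusive end address are assumptions for illustration only, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an MD mem_affinity entry. */
struct mem_affinity_sketch {
	uint64_t start;		/* first physical address of the range */
	uint64_t end;		/* end of the range (assumed exclusive) */
	int domain;		/* memory domain the range belongs to */
};

/* Hypothetical example: domain 0 owns two ranges, domain 1 owns one. */
static const struct mem_affinity_sketch example_table[] = {
	{ 0x000000000ULL, 0x040000000ULL, 0 },
	{ 0x040000000ULL, 0x080000000ULL, 1 },
	{ 0x100000000ULL, 0x140000000ULL, 0 },
	{ 0, 0, 0 }		/* terminator: all fields zero */
};

/* Return the domain covering physical address pa, or -1 if none does. */
static int
domain_of(const struct mem_affinity_sketch *ma, uint64_t pa)
{
	for (; ma->start != 0 || ma->end != 0 || ma->domain != 0; ma++)
		if (pa >= ma->start && pa < ma->end)
			return (ma->domain);
	return (-1);
}
```

The first entry starts at address 0 but does not end the walk. The terminator test needs every field to be zero, not just start.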
Attilio Rao
651aa2d896 KTR_CTx have long been aliased by existing classes, so they can't serve
their purpose anymore. Axe them out.

Sponsored by:	Sandvine Incorporated
Discussed with:	jhb, emaste
Possible MFC:	TBD
2010-07-21 10:05:07 +00:00
Alexander Motin
a448e0d827 Allocate the proper amount of memory for interrupt names on sparc64 and
sun4v, same as done on other architectures. This removes garbage from
`vmstat -ia` output.

Reviewed by:	marius@
2010-07-16 22:09:29 +00:00
Marius Strobl
9f7666cebe - Pin the IPI cache and TLB demap functions in order to prevent migration
  between determining the other CPUs and calling cpu_ipi_selected(), which
  apart from generally doing the wrong thing can lead to a panic when a
  CPU is told to IPI itself (which sun4u doesn't support).
  Reported and tested by: Nathaniel W Filardo
- Add __unused where appropriate.

MFC after:	3 days
2010-07-04 12:43:12 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00