Commit Graph

2196 Commits

Author SHA1 Message Date
dim
7b2ab078c6 Remove more superfluous const specifiers. 2014-02-23 18:36:45 +00:00
jhb
6e6e271c34 Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
  PCI domain/segment.  Host-PCI bridge drivers use this API to allocate
  bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
  their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
  full range of bus numbers from secbus to subbus from their parent bridge.
  The drivers also always program their primary bus register.  The bridge
  drivers also support growing their bus range by extending the bus resource
  and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
  used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.

Reviewed by:	imp
MFC after:	1 month
2014-02-12 04:30:37 +00:00
nwhitehorn
5841c2df96 Simplify the ofw_bus_lookup_imap() API slightly: make it allocate maskbuf
internally instead of requiring the caller to allocate it.
2013-12-17 15:11:24 +00:00
marius
694f538513 Restore a vital comment nuked in r259016. 2013-12-08 15:25:19 +00:00
ray
1af064917e MFC @r258947.
Sponsored by:	The FreeBSD Foundation
2013-12-05 00:57:53 +00:00
pjd
4ac2e7d8d9 Make process descriptors standard part of the kernel. rwhod(8) already
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.

MFC after:	3 days
2013-11-30 15:08:35 +00:00
ray
7317ada90c MFC @r258091. 2013-11-13 13:41:36 +00:00
alc
6c60c88452 As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other
words, every architecture is now auto-sizing the kmem arena.  This revision
changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes
mandatory and the definition of VM_KMEM_SIZE becomes optional.

Replace or eliminate all existing definitions of VM_KMEM_SIZE.  With
auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling
for VM_KMEM_SIZE_MIN on most architectures.  Use VM_KMEM_SIZE_MIN for
clarity.

Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to
that of setting the tunable vm.kmem_size.  Whereas the macros
VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables
vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size
have been distinct.  In particular, whereas VM_KMEM_SIZE was overridden by
VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size
was not.  Remedy this inconsistency.  Now, VM_KMEM_SIZE can be used to set
the size of the kmem arena at compile-time without that value being
overridden by auto-sizing.

Update the nearby comments to reflect the kmem submap being replaced by the
kmem arena.  Stop duplicating the auto-sizing formula in every machine-
dependent vmparam.h and place it in kmeminit() where auto-sizing takes
place.

Reviewed by:	kib (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-11-08 16:25:00 +00:00
ray
c28e9771d6 MFC @r257698. 2013-11-05 12:55:28 +00:00
kib
79afbd5fdd Add bus_dmamap_load_ma() function to load map with the array of
vm_pages.  Provide trivial implementation which forwards the load to
_bus_dmamap_load_phys() page by page.  Right now all architectures use
bus_dmamap_load_ma_triv().

Tested by:	pho (as part of the functional patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-27 21:39:16 +00:00
ray
f690470e36 MFC @r257107. 2013-10-25 08:41:36 +00:00
marius
20009df359 Move the implementation of bus_space_barrier(9) to the inline function in
the header. Actually, there's only one version for all types of busses, so
it doesn't make sense to walk up the hierarchy.
2013-10-24 17:06:41 +00:00
ray
da824c4a2b MFC @r256148. 2013-10-08 14:02:35 +00:00
marius
fe39ca00e6 Implement GET_STACK_USAGE.
Discussed with:	mav

Approved by:	re (kib)
MFC after:	1 week
2013-09-29 13:09:25 +00:00
glebius
12fb397079 - Create kern.ipc.sendfile namespace, and put the new "readhead" OID
there as "kern.ipc.sendfile.readahead".
- Push all nsfbuf related tunables into MD code. Don't move them
  to new namespace in favor of POLA.

Reviewed by:	scottl
Approved by:	re (gjb)
2013-09-22 13:36:52 +00:00
alc
88a4d0f31a The pmap function pmap_clear_reference() is no longer used. Remove it.
pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8.  Now, that call no
longer exists.

Approved by:	re (kib)
Sponsored by:	EMC / Isilon Storage Division
2013-09-20 04:30:18 +00:00
pjd
667d7255be Fix panic in ktrcapfail() when no capability rights are passed.
While here, correct all consumers to pass NULL instead of 0 as we pass
capability rights as pointers now, not uint64_t.

Reported by:	Daniel Peyrolon
Tested by:	Daniel Peyrolon
Approved by:	re (marius)
2013-09-18 19:26:08 +00:00
jhb
04bb6e10cd Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
glebius
cf37135185 Fix build with gcc. Move sf_buf_alloc()/sf_buf_free() declarations
to MD headers.
2013-09-06 17:44:13 +00:00
ray
c555be1512 MFC @r255128. 2013-09-01 23:06:28 +00:00
alc
aa9a7bb9e6 Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE).  Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE.  Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference().  Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2).  A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact.  For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures.  I will commit implementations for the
remaining architectures after further testing.  For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by:      pho (i386), marcel (ia64)
Sponsored by:   EMC / Isilon Storage Division
2013-08-29 15:49:05 +00:00
kib
05a9dff802 Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone.  Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by:	alc (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-22 18:12:24 +00:00
kib
ba12eedccd Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).
The flag was mandatory since r209792, where vm_page_grab(9) was
changed to only support the alloc retry semantic.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-22 07:39:53 +00:00
pjd
ae1f4cb9ce Add process descriptors support to the GENERIC kernel. It is already being
used by the tools in base systems and with sandboxing more and more tools
the usage should only increase.

Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by:	Google Summer of Code 2013
MFC after:	1 month
2013-08-18 10:21:29 +00:00
ray
3f79da49d7 MFC @r254374. 2013-08-15 19:32:08 +00:00
attilio
16c7563cf4 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
avg
6f05d223ad follow up to r254051
- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
  jhb

X-MFC after:	never (change specific to head)
2013-08-09 08:11:09 +00:00
kib
8de1718b60 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
avg
0255b98224 enable KDB_TRACE in GENERICs
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after:	never (change specific to head)
2013-08-07 08:03:50 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
marius
95f847fd26 Add MD (for now) atomic_store_acq_<type>() and use it in pmap_activate()
to get the semantics when setting the PMAP right. Prior to r251782, the
latter already used implicit acquire semantics, which - currently - means
to not employ additional explicit memory barriers under the hood (see also
r225889).
2013-08-06 15:34:11 +00:00
attilio
598e23d2ea Remove unused member.
Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-08-04 21:17:05 +00:00
obrien
7999076e3e Back out r253779 & r253786. 2013-07-31 17:21:18 +00:00
obrien
721ce839c7 Decouple yarrow from random(4) device.
* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
  The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*.  Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
  random_adaptor is basically an adapter that plugs in to random(4).
  random_adaptor can only be plugged in to random(4) very early in bootup.
  Unplugging random_adaptor from random(4) is not supported, and is probably a
  bad idea anyway, due to potential loss of entropy pools.
  We currently have 3 random_adaptors:
  + yarrow
  + rdrand (ivy.c)
  + nehemeiah

* Remove platform dependent logic from probe.c, and move it into
  corresponding registration routines of each random_adaptor provider.
  probe.c doesn't do anything other than picking a specific random_adaptor
  from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
  creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
  system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien
2013-07-29 20:26:27 +00:00
avg
4e6c4b2a36 Revert r253748,253749
This WIP should not have been committed yet.

Pointyhat to:	avg
2013-07-28 18:44:17 +00:00
avg
ea04b46439 put contents of cpu.h under _KERNEL
no userland-serviceable parts inside

MFC after:	20 days
2013-07-28 18:32:27 +00:00
ray
0ac02980ce MFC @r219886. 2013-07-25 12:43:22 +00:00
ae
998db3fe4a Include sys/systm.h after sys/param.h.
Suggested by:	pluknet
2013-07-15 15:40:57 +00:00
ae
6f8e41d6cb Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with:	glebius
2013-07-15 06:16:57 +00:00
marius
98abe96b02 Prefix the alias macros for members of struct __mcontext with an underscore
in order to avoid a clash in the net80211 code.
2013-07-12 14:24:52 +00:00
kib
8a0279994d Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32.  If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot.  The effect would be a counter going off by
arbitrary value after zeroing.  Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive.  On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching.  It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm).  If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by:  bde
Sponsored by:	The FreeBSD Foundation
2013-07-01 02:48:27 +00:00
ed
1fdf0e5b3c Remove conflicting macros from SPARC64's atomic(9) header.
The atomic_load() and atomic_store() macros conflict with the equally
named macros from <stdatomic.h>. Remove them, as they are only used to
implement functions that are not present on any of the other
architectures.
2013-06-15 08:23:53 +00:00
ed
13f575ad68 Stick to using the documented atomic(9) API.
The atomic_store_ptr() function is not part of the atomic(9) API. We
only provide a version with a release barrier.
2013-06-15 08:21:54 +00:00
jeff
7ee88fb112 - Add a BIT_FFS() macro and use it to replace cpusetffs_obj()
Discussed with:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-06-13 20:46:03 +00:00
attilio
fdf82ef9cf o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
  to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
  operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
  mostl of the times only with readlocks.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-05-21 20:38:19 +00:00
alc
b55ba81a92 Relax the object locking assertion in pmap_enter_locked().
Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-05-17 18:59:00 +00:00
peter
cec511ab65 Tidy up some CVS workarounds. 2013-05-12 01:53:47 +00:00
attilio
b24a52ec9e Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
trasz
80b8b2f779 Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by:	ken
Sponsored by:	FreeBSD Foundation
2013-04-12 16:25:03 +00:00
glebius
9cf64d6c35 Merge from projects/counters: counter(9).
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with:	kib
Reviewed by:		luigi
Tested by:		ae, ray
Sponsored by:		Nginx, Inc.
2013-04-08 19:40:53 +00:00
glebius
8c6eba117e Merge from projects/counters:
Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by:	Nginx, Inc.
2013-04-08 19:19:10 +00:00
mav
7c2b81b0e9 Remove all legacy ATA code parts, not used since options ATA_CAM enabled in
most kernels before FreeBSD 9.0.  Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam.  Remove the
atacontrol utility and some man pages.  Remove useless now options ATA_CAM.

No objections:	current@, stable@
MFC after:	never
2013-04-04 07:12:24 +00:00
ian
5b38501da6 Fix low-level uart drivers that set their fifo sizes in the softc too late.
uart(4) allocates send and receiver buffers in attach() before it calls
the low-level driver's attach routine.  Many low-level drivers set the
fifo sizes in their attach routine, which is too late.  Other drivers set
them in the probe() routine, so that they're available when uart(4)
allocates buffers.  This fixes the ones that were setting the values too
late by moving the code to probe().
2013-04-01 00:44:20 +00:00
ray
3625eb7f3d MFC@r248830
Approved by:	ed (project owner)
2013-03-28 20:27:01 +00:00
kib
7c26a038f9 Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
kib
63efc821c3 Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination.  Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations.  For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used.  The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after:	2 weeks
2013-03-14 20:18:12 +00:00
attilio
7fd2627275 MFC 2013-03-09 01:39:42 +00:00
attilio
bf1dc90446 MFC 2013-03-08 00:03:07 +00:00
gavin
2caee59a46 Correct two spelling mistakes in a comment. 2013-03-07 13:24:49 +00:00
attilio
e98f58faf6 MFC 2013-03-02 14:48:41 +00:00
marius
ee248b021f - Revert the part of r247601 which turned the overtemperature and power fail
interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep
  lock, which filters shouldn't do. It also seems that kern_reboot(9) still
  may require Giant to be hold.
- Correct an incorrect argument to shutdown_nice(9).

Submitted by:	bde
2013-03-02 13:08:13 +00:00
marius
b1d7b9754b Revert the part of r247600 which turned the overtemperature and power fail
interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep
lock, which filters shouldn't do. It also seems that kern_reboot(9) still
may require Giant to be hold.

Submitted by:	bde
2013-03-02 13:04:58 +00:00
marius
dd15932a15 - Apparently, it's no longer a problem to call shutdown_nice(9) from within
an interrupt filter (some other drivers in the tree do the same). So
  change the overtemperature and power fail interrupts from handlers in order
  to code and get rid of a !INTR_MPSAFE handlers.
- Mark unused parameters as such.
- Use NULL instead of 0 for pointers.

MFC after:	1 week
2013-03-02 00:41:51 +00:00
marius
2774e0404e - While Netra X1 generally show no ill effects when registering a power
fail interrupt handler, there seems to be either a broken batch of them
  or a tendency to develop a defect which causes this interrupt to fire
  inadvertedly. Given that apart from this problem these machines work
  just fine, add a tunable allowing the setup of the power fail interrupt
  to be disabled.
  While at it, remove the DEBUGGER_ON_POWERFAIL compile time option and
  make that behavior also selectable via the newly added tunable.
- Apparently, it's no longer a problem to call shutdown_nice(9) from within
  an interrupt filter (some other drivers in the tree do the same). So
  change the power fail interrupt from an handler in order to simplify the
  code and get rid of a !INTR_MPSAFE handler.
- Use NULL instead of 0 for pointers.

MFC after:	1 week
2013-03-02 00:37:31 +00:00
marius
0c5e0b209e - In sbbc_pci_attach() just pass the already obtained bus tag and handle
instead of acquiring these anew.
- Use NULL instead of 0 for pointers.

MFC after:	1 week
2013-03-01 20:36:59 +00:00
marius
944a48f5cd - Remove an unused header.
- Use NULL instead of 0 for pointers.
- Let ofw_pcib_probe() return BUS_PROBE_DEFAULT instead of 0 so specialized
  PCI-PCI-bridge drivers may attach instead.
- Add WARs for PLX Technology PEX 8114 bridges and PEX 8532 switches.
  Ideally, these should live in MI code but at least for the latter we're
  missing the necessary infrastructure there.

MFC after:	1 week
2013-03-01 20:34:02 +00:00
mav
6cf7cc6e4d MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
attilio
8d28f94790 Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-02-27 18:12:13 +00:00
attilio
cb47f0509b Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by:	EMC / Isilon storage division
Tested by:	pho
Reviewed by:	alc
2013-02-26 01:00:11 +00:00
attilio
905e648d42 Hide the details for the assertion for VM_OBJECT_LOCK operations.
Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into
VM_OBJECT_ASSERT_WLOCKED(foo)

Sponsored by:	EMC / Isilon storage division
Requested by:	alc
2013-02-21 21:54:53 +00:00
attilio
066bbc97b6 Fix other architectures and ZFS.
Sponsored by:	EMC / Isilon storage division
2013-02-21 15:02:36 +00:00
nwhitehorn
c5f6ffc937 IFC @ 247036 2013-02-20 14:26:51 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
attilio
1f1e13ca03 There is no need to use VM_OBJECT_LOCKED() as the assertion won't
make the check available in any case if INVARIANTS is switched off.
Remove VM_OBJECT_LOCKED().
2013-02-20 10:51:34 +00:00
attilio
658534ed5a Switch vm_object lock to be a rwlock.
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
  get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
  files require including directly rwlock.h
2013-02-20 10:38:34 +00:00
kib
bd7f0fa0bb Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c.  It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code.  The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync().  Previously this was done in a type specific
way.  Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by:	jeff (sponsored by EMC/Isilon)
Reviewed by:	kan (previous version), scottl,
	mjacob (isp(4), no objections for target mode changes)
Discussed with:	     ian (arm changes)
Tested by:	marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
	amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
2013-02-12 16:57:20 +00:00
kib
dfffe11d71 The 'end' word was missed in the comment.
MFC after:     3 days
2013-02-08 15:52:20 +00:00
eadler
6a1efe1ad9 Remove support for plip from the GENERIC kernel as no systems in the
last 10 years require this support.

Discussed with:	db
Discussed with:	kib
Reviewed by:	imp
Reviewed by:	jhb
Reviewed by:	-hackers
Approved by:	cperciva (mentor)
2013-02-01 20:17:11 +00:00
marius
6892c8a3be Revert the part of r239864 which removed obtaining the SMP mutex around
reading registers from other CPUs. As it turns out, the hardware doesn't
really like concurrent IPI'ing causing adverse effects. Also the thought
deadlock when using this spin lock here and the targeted CPU(s) are also
holding or in case of nested locks can't actually happen. This is due to
the fact that on sparc64, spinlock_enter() only raises the PIL but doesn't
disable interrupts completely. Thus direct cross calls as used for the
register reading (and all other MD IPI needs) still will be executed by
the targeted CPU(s) in that case.

MFC after:	3 days
2013-01-23 22:52:20 +00:00
marius
0e18f8466b Revert bogus part of r241740.
Reported by:	Michael Moll

MFC after:	3 days
2013-01-03 23:12:08 +00:00
kib
5a9188f8d3 Enable the UFS quotas for big-iron GENERIC kernels.
Discussed with:	      mckusick
MFC after:	      2 weeks
2013-01-03 19:03:41 +00:00
des
67e77c00a8 As discussed on -current last October, remove the firewire drivers from
GENERIC.
2013-01-03 14:30:24 +00:00
marius
f218d978bf Revert r237842 and switch back to SCHED_ULE. All problems I encountered
with the latter have been fixed with r241780.

MFC after:	3 days
2012-12-16 20:54:07 +00:00
nwhitehorn
76289396e6 IFC @ 243795 2012-12-02 21:00:52 +00:00
kib
bc5bfde14d Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:55:56 +00:00
jeff
f40f3c3255 - Implement run-time expansion of the KTR buffer via sysctl.
- Implement a function to ensure that all preempted threads have switched
   back out at least once.  Use this to make sure there are no stale
   references to the old ktr_buf or the lock profiling buffers before
   updating them.

Reviewed by:	marius (sparc64 parts), attilio (earlier patch)
Sponsored by:	EMC / Isilon Storage Division
2012-11-15 00:51:57 +00:00
kib
e8ae50d444 Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by:	"Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by:	alc, mdf (previous version)
Tested by:	pho (previous version)
MFC after:	2 weeks
2012-11-14 20:01:40 +00:00
dim
373133f0ad Remove duplicate const specifiers in many drivers (I hope I got all of
them, please let me know if not).  Most of these are of the form:

static const struct bzzt_type {
	[...list of members...]
} const bzzt_devs[] = {
	[...list of initializers...]
};

The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).

I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).

Reviewed by:	yongari, marius
MFC after:	1 week
2012-11-05 19:16:27 +00:00
attilio
f3501b109e Rework the known rwlock to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct rwlock_padalign.

Reviewed by:	alc, jimharris
2012-11-03 23:03:14 +00:00
marius
807619d8ba - Give PIL_PREEMPT the lowest priority just above low/stray interrupts.
The reason for this is that the SPARC v9 architecture allows nested
  interrupts of higher priority/level than that of the current interrupt
  to occur (and we can't just entirely bypass this model, also, at least
  for tick interrupts, this also wouldn't be wise). However, when a
  preemption interrupt interrupts another interrupt of lower priority,
  f.e. PIL_ITHREAD, and that one in turn is nested by a third interrupt,
  f.e. PIL_TICK, with SCHED_ULE the execution of interrupts higher than
  PIL_PREEMPT may be migrated to another CPU. In particular, tl1_ret(),
  which is responsible for restoring the state of the CPU prior to entry
  to the interrupt based on the (also migrated) trap frame, then is run
  on a CPU which actually didn't receive the interrupt in question,
  causing an inappropriate processor interrupt level to be "restored".
  In turn, this causes interrupts of the first level, i.e. PIL_ITHREAD
  in the above scenario, to be blocked on the target of the migration
  until the correct PIL happens to be restored again on that CPU again.
  Making PIL_PREEMPT the lowest real priority, this effectively prevents
  this scenario from happening, as preemption interrupts no longer can
  interrupt any other interrupt besides stray ones (which is no issue).
  Thanks to attilio@ and especially mav@ for helping me to understand
  this problem at the 201208DevSummit.
- Give PIL_STOP (which is also used for IPI_STOP_HARD, given that there's
  no real equivalent to NMIs on SPARC v9) the highest possible priority
  just below the hardwired PIL_TICK, so it has a chance to interrupt
  more things.

MFC after:	1 week
2012-10-20 12:07:48 +00:00
marius
0e1f679c31 - Remove an unused header.
- Don't waste a delay slot.

MFC after:	3 days
2012-10-19 17:12:55 +00:00
marius
9e47c0d1ff Let SCHED_ULE give affinity to the CPU the tick interrupt triggered on
when running tick_process(), similarly to what the x86 equivalents of
this function do, however employing the less racy sequence also used in
intr_event_handle().

MFC after:	3 days
2012-10-19 13:32:37 +00:00
attilio
6997194551 Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
attilio
3212891c92 Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by:	marius
2012-10-09 12:22:43 +00:00
alc
55f6ff40ed Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
eadler
8600cbb5b6 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
attilio
8dece93b14 userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
gavin
9db084a6c3 Prevent indent(1) from reformatting this comment, as it contains
a formatting-sensitive table.
2012-09-07 08:18:06 +00:00
marius
1829ce3546 Add a global MD macro for the VIS block size instead of duplicating
it and using magic values all over the place.

MFC after:	1 week
2012-08-31 11:15:01 +00:00
marius
9f542929d1 - Unlike cache invalidation and TLB demapping IPIs, reading registers from
other CPUs doesn't require locking so get rid of it. As the latter is used
  for the timecounter on certain machine models, using a spin lock in this
  case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
  Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
  for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
  demapping IPIs.
- Mark some unused function arguments as such.

MFC after:	1 week
2012-08-29 16:56:50 +00:00
gjb
3f013cdf9f Grammar fix: s/NIC's/NICs/
MFC after:	3 days
2012-08-26 01:21:02 +00:00
marius
4c012f63e6 Merge r236494 from x86:
Isolate the global TTE list lock from data and other locks to prevent false
sharing within the cache.

MFC after:	3 days
2012-08-05 22:03:13 +00:00
marius
d339b71305 Switch back to the 4BSD scheduler for now. There is some more or less
recent regression with ULE, causing processes to get stuck in getblk
as well as interrupt handler execution delays to rise above the command
timeout of mpt(4).

MFC after:	3 days
2012-06-30 14:55:36 +00:00
ken
da17d879d7 Now that the mps(4) driver is endian-safe, add it to the powerpc and
sparc64 GENERIC config files.

MFC after:	3 days
2012-06-28 20:48:24 +00:00
alc
c5e6daff9d Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
andrew
0a7002aae7 Make the wchar_t type machine dependent.
This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the
ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an
unsigned short with the former preferred.

Because of this requirement we need to move the definition of __wchar_t to
a machine dependent header. It also cleans up the macros defining the limits
of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine
dependent header then using them to define WCHAR_MIN and WCHAR_MAX
respectively.

Discussed with:	bde
2012-06-24 04:15:58 +00:00
kib
7b36a08108 Implement mechanism to export some kernel timekeeping data to
usermode, using shared page.  The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
kib
d98ad62d7e Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after:  1 week
2012-06-22 06:38:31 +00:00
alc
6eeaee04e4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00
alc
2616fac7a2 Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c.  This new r/w lock is used primarily to synchronize access
to the TTE lists.  However, it will be used in a somewhat unconventional
way.  As finer-grained TTE list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

Reviewed by:	attilio
Tested by:	flo
MFC after:	10 days
2012-05-29 01:52:38 +00:00
marius
3b937b5682 Merge from x86: r232521
Exclude USB drivers (except umass and ukbd) from main kernel image.
2012-05-25 14:52:05 +00:00
bz
cd8b136e12 MFp4 bz_ipv6_fast:
in_cksum.h required ip.h to be included for struct ip.  To be
  able to use some general checksum functions like in_addword()
  in a non-IPv4 context, limit the (also exported to user space)
  IPv4 specific functions to the times, when the ip.h header is
  present and IPVERSION is defined (to 4).

  We should consider more general checksum (updating) functions
  to also allow easier incremental checksum updates in the L3/4
  stack and firewalls, as well as ponder further requirements by
  certain NIC drivers needing slightly different pseudo values
  in offloading cases.  Thinking in terms of a better "library".

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 22:00:48 +00:00
mav
56ef4027fe MFprojects/zfsd:
Generalize and unify ses device description.
2012-05-24 11:20:51 +00:00
marius
e5cc26d604 Fix mismerge in r235231. 2012-05-10 15:23:20 +00:00
marius
d42dce6416 Merge r234989 from x86:
Revert part of r234723 by re-enabling the SMP protection for intr_bind().
2012-05-10 15:17:21 +00:00
dim
81a3e2b46d Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after:	2 weeks
2012-04-29 11:04:31 +00:00
attilio
0b98e6d835 Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by:	gibbs
Reviewed by:	jhb, marius
MFC after:	1 week
2012-04-26 20:24:25 +00:00
marius
910f312b32 Turn on PREEMPTION by default. After fixing several bugs over time, the
last show-stopper keeping PREEMPTION from being usable on sparc64 should
have been dealt with in r230662.
At least on 2-way systems, PREEMPTION causes a little bit of a degradation
in worldstone performance. However, FreeBSD seems to have started building
up regressions in !PREEMPTION cases so sparc64 better should not be an
oddball in this regard.

MFC after:	1 week
2012-04-16 18:29:07 +00:00
marius
7f51c048df Merge from x86:
r233961:

Fix interrupt load balancing regression, introduced in revision
222813, that left all un-pinned interrupts assigned to CPU 0.
In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized
the "intr_cpus" cpuset to only contain CPU0.

This initialization is too late and nullifies the results of calls
to the intr_add_cpu() that occur much earlier in the boot process.

r234074 (partial):

The BSP is not added to the mask of valid target CPUs for interrupts.
Fix this by adding the BSP as an interrupt target directly in

r234105:

Fix !SMP build after r234074.

MFC after: 3 days
2012-04-13 22:58:23 +00:00
nwhitehorn
a694ff5eb4 Fix mismerge in r234200. 2012-04-13 14:36:57 +00:00
marcel
53feefc8b5 Sync with head@234197
Ok'd by: ed
2012-04-13 04:21:54 +00:00
marius
83cd4d31b1 Remove checks that are redundant due to tf_type being unsigned.
MFC after:	3 days
2012-03-31 14:03:16 +00:00
marius
6b88064bda Fix panic on kernel traps having a mapping in trap_sig b0rked in r206086.
Repored by:	David E. Cross

MFC after:	3 days
2012-03-31 13:56:24 +00:00
marius
07d4fd218f - Remove erroneous trailing semicolon. [1]
- Correctly determine the maximum payload size for setting the TX link
  frequent NACK latency and replay timer thresholds.

Submitted by:	stefanf [1]
MFC after:	3 days
2012-03-30 15:08:09 +00:00
marius
306ae35dca Given that this is a host-PCI-Express bridge driver, create the parent
DMA tag with a 4 GB boundary as required by PCI-Express. With r232403 in
place this actually is redundant. However, the host-PCI-Express bridge
driver is the more appropriate place for implementing this restriction.

MFC after:	3 days
2012-03-24 13:11:58 +00:00
ed
619ab2327e Remove pty(4) from our kernel configurations.
As of FreeBSD 8, this driver should not be used. Applications that use
posix_openpt(2) and openpty(3) use the pts(4) that is built into the
kernel unconditionally. If it turns out high profile depend on the
pty(4) module anyway, I'd rather get those fixed. So please report any
issues to me.

The pty(4) module is still available as a kernel module of course, so a
simple `kldload pty' can be used to run old-style pseudo-terminals.
2012-03-21 08:38:42 +00:00
nwhitehorn
6a20b9d5e1 Make ofw_bus_get_node() consistently return -1 when there is no associated
OF node, instead of a random mixture of 0 and -1. Update all checks for 0
to check for -1 instead.

MFC after:	4 weeks
2012-03-15 22:53:39 +00:00
dim
cf2c2fde9c Add casts to __uint16_t to the __bswap16() macros on all arches which
didn't already have them.  This is because the ternary expression will
return int, due to the Usual Arithmetic Conversions.  Such casts are not
needed for the 32 and 64 bit variants.

While here, add additional parentheses around the x86 variant, to
protect against unintended consequences.

MFC after:	2 weeks
2012-03-09 20:34:31 +00:00
attilio
9906b913d9 Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported
platforms.
This will make every attempt to mount a non-mpsafe filesystem to the
kernel forbidden, unless it is expressely compiled with
VFS_ALLOW_NONMPSAFE option.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.
2012-03-06 20:01:25 +00:00
jhb
d6e546f14a - Add a bus_dma tag to each PCI bus that is a child of a Host-PCI bridge.
The tag enforces a single restriction that all DMA transactions must not
  cross a 4GB boundary.  Note that while this restriction technically only
  applies to PCI-express, this change applies it to all PCI devices as it
  is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
  a new pci_attach_common() routine that performs actions common to the
  attach phase of all PCI bus drivers.  Right now this only consists of
  a bootverbose printf and the allocate of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
  pci_attach_common() from their attach routines.

MFC after:	2 weeks
2012-03-02 20:38:04 +00:00
jhb
5013ab31bd - Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
  constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t).  Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by:	scottl
2012-03-01 19:58:34 +00:00
marius
16da82cb91 As it turns out r227960 may still be insufficient with PREEMPTION
so try harder to get the CDMA sync interrupt delivered and also in
a more efficient way:
- wrap the whole process of sending and receiving the CDMA sync
  interrupt in a critical section so we don't get preempted,
- send the CDMA sync interrupt to the CPU that is actually waiting
  for it to happen so we don't take a detour via another CPU,
- instead of waiting for up to 15 seconds for the interrupt to
  trigger try the whole process for up to 15 times using a one
  second timeout (the code was also changed to just ignore belated
  interrupts of a previous tries should they appear).

According to testing done by Peter Jeremy with the debugging also
added as part of this commit the first two changes apparently are
sufficient to now properly get the CDMA sync interrupts delivered
at the first try though.
2012-01-28 22:42:33 +00:00
marius
87d49c606e Fully disable interrupts while we fiddle with the FP context in the
VIS-based block copy/zero implementations. While with 4BSD it's
sufficient to just disable the tick interrupts, with ULE+PREEMPTION
it's otherwise also possible that these are preempted via IPIs.
2012-01-28 22:22:05 +00:00
marius
b8bb3e4d75 Commit file missed in r230633. 2012-01-27 23:25:24 +00:00
marius
8ec59afebb Now that we have a working OF_printf() since r230631 and a OF_panic()
helper since r230632, use these for output and panicing during the
early cycles and move cninit() until after the static per-CPU data
has been set up. This solves a couple of issue regarding the non-
availability of the static per-CPU data:
- panic() not working and only making things worse when called,
- having to supply a special DELAY() implementation to the low-level
  console drivers,
- curthread accesses of mutex(9) usage in low-level console drivers
  that aren't conditional due to compiler optimizations (basically,
  this is the problem described in r227537 but in this case for
  keyboards attached via uart(4)). [1]

PR:	164123 [1]
2012-01-27 23:21:54 +00:00
marius
43204e516a - Now that we have a working OF_printf() since r230631, use it for
implementing a simple OF_panic() that may be used during the early
  cycles when panic() isn't available, yet.
- Mark cpu_{exit,shutdown}() as __dead2 as appropriate.
2012-01-27 22:35:53 +00:00
marius
f3dd9013de For machines where the kernel address space is unrestricted increase
VM_KMEM_SIZE_SCALE to 2, awaiting more insight from alc@. As it turns
out, the VM apparently has problems with machines that have large holes
in the physical address space, causing the kmem_suballoc() call in
kmeminit() to fail with a VM_KMEM_SIZE_SCALE of 1. Using a value of 2
allows these, namely Blade 1500 with 2GB of RAM, to boot.

PR:	164227
2012-01-27 22:25:46 +00:00
marius
16ae182079 Mark cpu_{halt,reset}() as __dead2 as appropriate. 2012-01-27 22:04:43 +00:00
das
9feb719605 Add C11 macros describing subnormal numbers to float.h.
Reviewed by:	bde
2012-01-23 06:36:41 +00:00
nwhitehorn
39e90fb18b Make vt(4) work on sparc64, at least on systems with CHRP-compliant
framebuffer devices like the Ultra 5. Because vt(4) depends on mutexes
and at least one other problematic kernel subsystem I can't identify,
and these things become available much later on sparc64 that on other
architectures, it only works if console init is moved much later in the
boot process. Moving it may not be the best solution here, but does work.

Note than newcons as a whole doesn't quite work on the Ultra 5 yet, at
least with a regular sun keyboard: the sunkbd driver seems not to attach
without syscons for reasons that are obscure to me.
2012-01-22 19:37:40 +00:00
ken
fce645c153 Add the CAM Target Layer (CTL).
CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003.  It has been shipping in
Copan (now SGI) products since 2005.

It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license.  The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.

Some CTL features:

 - Disk and processor device emulation.
 - Tagged queueing
 - SCSI task attribute support (ordered, head of queue, simple tags)
 - SCSI implicit command ordering support.  (e.g. if a read follows a mode
   select, the read will be blocked until the mode select completes.)
 - Full task management support (abort, LUN reset, target reset, etc.)
 - Support for multiple ports
 - Support for multiple simultaneous initiators
 - Support for multiple simultaneous backing stores
 - Persistent reservation support
 - Mode sense/select support
 - Error injection support
 - High Availability support (1)
 - All I/O handled in-kernel, no userland context switch overhead.

(1) HA Support is just an API stub, and needs much more to be fully
    functional.

ctl.c:			The core of CTL.  Command handlers and processing,
			character driver, and HA support are here.

ctl.h:			Basic function declarations and data structures.

ctl_backend.c,
ctl_backend.h:		The basic CTL backend API.

ctl_backend_block.c,
ctl_backend_block.h:	The block and file backend.  This allows for using
			a disk or a file as the backing store for a LUN.
			Multiple threads are started to do I/O to the
			backing device, primarily because the VFS API
			requires that to get any concurrency.

ctl_backend_ramdisk.c:	A "fake" ramdisk backend.  It only allocates a
			small amount of memory to act as a source and sink
			for reads and writes from an initiator.  Therefore
			it cannot be used for any real data, but it can be
			used to test for throughput.  It can also be used
			to test initiators' support for extremely large LUNs.

ctl_cmd_table.c:	This is a table with all 256 possible SCSI opcodes,
			and command handler functions defined for supported
			opcodes.

ctl_debug.h:		Debugging support.

ctl_error.c,
ctl_error.h:		CTL-specific wrappers around the CAM sense building
			functions.

ctl_frontend.c,
ctl_frontend.h:		These files define the basic CTL frontend port API.

ctl_frontend_cam_sim.c:	This is a CTL frontend port that is also a CAM SIM.
			This frontend allows for using CTL without any
			target-capable hardware.  So any LUNs you create in
			CTL are visible in CAM via this port.

ctl_frontend_internal.c,
ctl_frontend_internal.h:
			This is a frontend port written for Copan to do
			some system-specific tasks that required sending
			commands into CTL from inside the kernel.  This
			isn't entirely relevant to FreeBSD in general,
			but can perhaps be repurposed.

ctl_ha.h:		This is a stubbed-out High Availability API.  Much
			more is needed for full HA support.  See the
			comments in the header and the description of what
			is needed in the README.ctl.txt file for more
			details.

ctl_io.h:		This defines most of the core CTL I/O structures.
			union ctl_io is conceptually very similar to CAM's
			union ccb.

ctl_ioctl.h:		This defines all ioctls available through the CTL
			character device, and the data structures needed
			for those ioctls.

ctl_mem_pool.c,
ctl_mem_pool.h:		Generic memory pool implementation used by the
			internal frontend.

ctl_private.h:		Private data structres (e.g. CTL softc) and
			function prototypes.  This also includes the SCSI
			vendor and product names used by CTL.

ctl_scsi_all.c,
ctl_scsi_all.h:		CTL wrappers around CAM sense printing functions.

ctl_ser_table.c:	Command serialization table.  This defines what
			happens when one type of command is followed by
			another type of command.

ctl_util.c,
ctl_util.h:		CTL utility functions, primarily designed to be
			used from userland.  See ctladm for the primary
			consumer of these functions.  These include CDB
			building functions.

scsi_ctl.c:		CAM target peripheral driver and CTL frontend port.
			This is the path into CTL for commands from
			target-capable hardware/SIMs.

README.ctl.txt:		CTL code features, roadmap, to-do list.

usr.sbin/Makefile:	Add ctladm.

ctladm/Makefile,
ctladm/ctladm.8,
ctladm/ctladm.c,
ctladm/ctladm.h,
ctladm/util.c:		ctladm(8) is the CTL management utility.
			It fills a role similar to camcontrol(8).
			It allow configuring LUNs, issuing commands,
			injecting errors and various other control
			functions.

usr.bin/Makefile:	Add ctlstat.

ctlstat/Makefile
ctlstat/ctlstat.8,
ctlstat/ctlstat.c:	ctlstat(8) fills a role similar to iostat(8).
			It reports I/O statistics for CTL.

sys/conf/files:		Add CTL files.

sys/conf/NOTES:		Add device ctl.

sys/cam/scsi_all.h:	To conform to more recent specs, the inquiry CDB
			length field is now 2 bytes long.

			Add several mode page definitions for CTL.

sys/cam/scsi_all.c:	Handle the new 2 byte inquiry length.

sys/dev/ciss/ciss.c,
sys/dev/ata/atapi-cam.c,
sys/cam/scsi/scsi_targ_bh.c,
scsi_target/scsi_cmds.c,
mlxcontrol/interface.c:	Update for 2 byte inquiry length field.

scsi_da.h:		Add versions of the format and rigid disk pages
			that are in a more reasonable format for CTL.

amd64/conf/GENERIC,
i386/conf/GENERIC,
ia64/conf/GENERIC,
sparc64/conf/GENERIC:	Add device ctl.

i386/conf/PAE:		The CTL frontend SIM at least does not compile
			cleanly on PAE.

Sponsored by:	Copan Systems, SGI and Spectra Logic
MFC after:	1 month
2012-01-12 00:34:33 +00:00
rwatson
a7ba7afe9c Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel
configurations for various architectures in FreeBSD 10.x.  This allows
basic Capsicum functionality to be used in the default FreeBSD
configuration on non-embedded architectures; process descriptors are not
yet enabled by default.

MFC after:	3 months
Sponsored by:	Google, Inc
2011-12-29 22:48:36 +00:00
alc
d368df9b98 Eliminate vestiges of page coloring. 2011-12-15 05:07:16 +00:00
ed
cb983d98e7 Replace __signed by signed.
The signed keyword is an integral part of the C syntax. There's no need
to use __signed.
2011-12-13 13:38:03 +00:00
marius
6779d3e54c Revert r225889 a bit. While it's correct that in total store order there's
no need to additionally add CPU memory barriers to the acquire variants of
atomic(9), these are documented to also include compiler memory barriers.
So add the latter, which were previously included by using membar(), back.
2011-12-03 13:51:57 +00:00
jchandra
e5b89f2d70 Fix OF_finddevice error return value in case of FDT.
According to the open firmware standard, finddevice call has to return
a phandle with value of -1 in case of error.

This commit is to:
- Fix the FDT implementation of this interface (ofw_fdt_finddevice) to
  return (phandle_t)-1 in case of error, instead of 0 as it does now.
- Fix up the callers of OF_finddevice() to compare the return value with
  -1 instead of 0 to check for errors.
- Since phandle_t is unsigned, the return value of OF_finddevice should
  be checked with '== -1' rather than '<= 0' or '> 0', fix up these cases
  as well.

Reported by:	nwhitehorn

Reviewed by:	raj
Approved by:	raj, nwhitehorn
2011-12-02 15:24:39 +00:00
marius
eda2646a46 Update comment. 2011-11-27 15:49:46 +00:00
marius
5846bf2bc1 For sparc64 also adjust the geometry of da(4) driven disks to not overflow
the 16-bit cylinders field of the VTOC8 disk label (at around 502GB). The
geometry chosen for disks above that limit allows to use disks up to 2TB,
which is the limit of the extended VTOC8 format. The geometry used for
disks smaller than the 16-bit cylinders limit stays the same as used by
cam_calc_geometry(9) for extended translation.
Thanks to Hans-Joerg Sirtl for providing hardware for testing this change.

MFC after:	3 days
2011-11-27 15:43:40 +00:00
marius
0808475dfb Move to SCHED_ULE by default. Since r226057 SCHED_ULE and sparc64 are
compatible with each other and since r227539 the last issue seen when
using SCHED_ULE is fixed. At least on UP and 2-way machines SCHED_4BSD
still performs better than SCHED_ULE, however, the optimizations done
in r225889 pretty much compensate that so there's at least no net
regression.
Thanks go to Peter Jeremy for extensive testing.
2011-11-25 17:40:01 +00:00
marius
9fd6682e78 Increase the CDMA sync timeout for Schizo bridges to 15 seconds as used by
OpenSolaris. One second turned out to be not enough for certain loads while
10 seconds were sufficient.
Reported by: Peter Jeremy

MFC after:	3 days
2011-11-24 23:48:22 +00:00
marius
1b8636b892 s,KOBJMETHOD_END,DEVMETHOD_END,g in order to fully hide the explicit mention
of kobj(9) from device drivers.
2011-11-22 21:55:40 +00:00
marius
17e14c6132 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
pjd
bd5d71350c Fix make universe. 2011-11-16 18:42:43 +00:00
marius
6e89cebeb5 Define curthread as an inline function that loads the thread pointer
directly from g7, the pcpu pointer. This guarantees correct behavior
when the thread migrates to a different CPU.
Commit message stolen from r205431. Additional testing by Peter Jeremy.

MFC after:	3 days
2011-11-15 20:17:18 +00:00
attilio
8e918ec439 Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by:	gianni
Reviewed by:	kib
2011-11-08 10:18:07 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
ed
e97eae1577 Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
marius
788539ae31 Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).

PR:		124667
Obtained from:	NetBSD (based on)
MFC after:	3 days
2011-11-01 21:26:57 +00:00
marius
3ba0346c4d Actually, limit to 32-bit DMA for the transfer buffers as the address is
written into a 32-bit register.
2011-10-30 21:42:35 +00:00
marius
c0f6f83d04 Correct the DMA constraints, the LSI64854 isn't limited to 32-bit DMA. 2011-10-30 21:19:13 +00:00
marius
e4f8c28c9e - Use device_t rather than the NetBSDish struct device.
- Move esp_devclass to ncr53c9x.c in order to allow different bus front-ends
  to use it.
- Use KOBJMETHOD_END.
- Remove the gl_clear_latched_intr hook as it's not needed for any of the
  chips nor the front-ends supported in FreeBSD and likely never will be.
- Correct the DMA constraints used in the SBus front-end, the LSI64854 isn't
  limited to 32-bit DMA.
- The ESP200 also only supports up to 64k transfers.
- Don't let the DMA and SBus front-end supply a maximum transfer size larger
  than MAXPHYS as that's the maximum the upper layers use and we otherwise
  just waste resources unnecessarily.
- Initialize the ECB callout and don't zero the handle when returning ECBs
  to the free list so that ncr53c9x_callout() actually is called with the
  driver lock held.
- On detach the driver lock should be held across cam_sim_free() according
  to isp(4) and a panic received.
- Check the return value of NCRDMA_SETUP(), i.e. bus_dmamap_load(9), and try
  to handle failures gracefully.
- In ncr53c9x_action() replace N calls to xpt_done() in a switch with just
  one at the end.
- On XPT_PATH_INQ report "NCR" rather than "Sun" as the vendor as the former
  is somewhat more correct as well as the maximum supported transfer size via
  maxio in order to take advantage of controllers that that can handle more
  than DFLTPHYS.
- Print the number of MESSAGE (EXTENDED) rejected.
- Fix the path encoded in the multiple inclusion protection of ncr53c9xvar.h.
- Correct the DMA constraints used in the LSI64854 core to not exceed the
  maximum supported transfer size and include the boundary so we don't need
  to check on every setup of a DMA transfer.
- Let the bus DMA map callbacks do nothing in case of an error.
- Correctly handle > 64k transfers for FAS366 in the LSI64854. A new feature
  flag NCR_F_LARGEXFER was introduced so we just need to check for this one
  and not for individual controllers supporting large transfers in several
  places.
- Let the LSI64854 core load transfer buffers using BUS_DMA_NOWAIT as the
  NCR53C9x core can't handle EINPROGRESS. Due to lack of bounce buffers
  support, sparc64 doesn't actually use EINPROGRESS and likely never will,
  as an example for writing additional front-ends for the NCR53C9x core it
  makes sense to set BUS_DMA_NOWAIT anyway though.
- Some minor cleanup.
2011-10-30 21:17:42 +00:00
kensmith
72c29d057f Adjust the debugger options slightly. This should help me do the right
thing when changing the debugging options as part of head becoming a new
stable branch.  It may also help people who for one reason or another want
to run head but don't want it slowed down by the debugging support.

Reviewed by:	kib
2011-10-27 13:07:49 +00:00
das
28e8dea258 People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one.  To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware.  Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.
2011-10-21 06:41:46 +00:00
kensmith
68e5099fa6 Add a warning about why sbp(4) is commented out so that curious folks
are forewarned they might wind up with a hole in their foot if they
decide to give it a try.

Suggested by:	dougb
2011-10-19 21:55:20 +00:00
kensmith
756024305f Comment out the sbp(4) driver for architectures that support it.
As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112)
but was left alone in head so people could work on fixing an issue that
caused boot failure on some motherboards.  Apparently nobody has worked
on it and we are getting reports of boot failure with the 9.0 test builds.
So this time I'll comment out the driver in head (still hoping someone
will work on it) and MFC to stable/9.

Submitted by:	Alberto Villa <avilla at FreeBSD dot org>
2011-10-18 13:45:16 +00:00
des
9ae3191473 Trace attempts to call restricted MD syscalls. 2011-10-18 07:39:27 +00:00
marius
a4fb38841f Merge from NetBSD:
- Remove clause 3 and 4 from TNF licenses.
- Fix memset usage.
- Various cleanup.
- Kill caddr_t.
2011-10-15 09:29:43 +00:00
kib
1134edae2b Remove unused define.
MFC after:	1 month
2011-10-07 16:09:44 +00:00
marius
8cb5f6fec8 - Use atomic operations rather than sched_lock for safely assigning pm_active
and pc_pmap for SMP. This is key to allowing adding support for SCHED_ULE.
  Thanks go to Peter Jeremy for additional testing.
- Add support for SCHED_ULE to cpu_switch().

Committed from:	201110DevSummit
2011-10-06 11:01:31 +00:00
marius
955c9c92f4 Actually enable NEW_PCIB by default, missed in r225931. 2011-10-02 23:31:14 +00:00
marius
b730263346 Make sparc64 compatible with NEW_PCIB and enable it:
- Implement bus_adjust_resource() methods as far as necessary and in non-PCI
  bridge drivers as far as feasible without rototilling them.
- As NEW_PCIB does a layering violation by activating resources at layers
  above pci(4) without previously bubbling up their allocation there, move
  the assignment of bus tags and handles from the bus_alloc_resource() to
  the bus_activate_resource() methods like at least the other NEW_PCIB
  enabled architectures do. This is somewhat unfortunate as previously
  sparc64 (ab)used resource activation to indicate whether SYS_RES_MEMORY
  resources should be mapped into KVA, which is only necessary if their
  going to be accessed via the pointer returned from rman_get_virtual() but
  not for bus_space(9) as the later always uses physical access on sparc64.
  Besides wasting KVA if we always map in SYS_RES_MEMORY resources, a driver
  also may deliberately not map them in if the firmware already has done so,
  possibly in a special way. So in order to still allow a driver to decide
  whether a SYS_RES_MEMORY resource should be mapped into KVA we let it
  indicate that by calling bus_space_map(9) with BUS_SPACE_MAP_LINEAR as
  actually documented in the bus_space(9) page. This is implemented by
  allocating a separate bus tag per SYS_RES_MEMORY resource and passing the
  resource via the previously unused bus tag cookie so we later on can call
  rman_set_virtual() in sparc64_bus_mem_map(). As a side effect this now
  also allows to actually indicate that a SYS_RES_MEMORY resource should be
  mapped in as cacheable and/or read-only via BUS_SPACE_MAP_CACHEABLE and
  BUS_SPACE_MAP_READONLY respectively.
- Do some minor cleanup like taking advantage of rman_init_from_resource(),
  factor out the common part of bus tag allocation into a newly added
  sparc64_alloc_bus_tag(), hook up some missing newbus methods and replace
  some homegrown versions with the generic counterparts etc.
- While at it, let apb_attach() (which can't use the generic NEW_PCIB code
  as APB bridges just don't have the base and limit registers implemented)
  regarding the config space registers cached in pcib_softc and the SYSCTL
  reporting nodes set up.
2011-10-02 23:22:38 +00:00
marius
5d35bac957 Remove obsolete macros. 2011-10-01 13:33:14 +00:00
marius
56e98307d3 Nuke SUN4U #ifdef's which with the demise of sun4v no longer serve any
purpose.
2011-10-01 13:16:01 +00:00
marius
4ec9e241d8 Also allocate space for the PIL counters. Given that no machine actually
uses IV_MAX interrupt vectors this wasn't a problem in practice though.
2011-10-01 13:11:29 +00:00
marius
e7b8707889 Re-reading the Schizo errata suggests that it's actually tolerable to
also use the streaming buffer of pre version 5/revision 2.3 hardware as
long as we stay away from context flushes (which iommu(4) so far doesn't
take advantage of). OpenSolaris does the same.
2011-10-01 00:31:30 +00:00
marius
1b77ae249c - Add protective parentheses to macros as far as possible.
- Move {r,w,}mb() to the top of this file where they live on most of the
  other architectures.
2011-10-01 00:22:24 +00:00
marius
3c6a68b66d In total store which we use for running the kernel and all of the userland
atomic operations behave as if the were followed by a memory barrier so
there's no need to include ones in the acquire variants of atomic(9).
Removing these results a small performance improvement, specifically this
is sufficient to compensate the performance loss seen in the worldstone
benchmark seen when using SCHED_ULE instead of SCHED_4BSD.
This change is inspired by Linux even more radically doing the equivalent
thing some time ago.
Thanks go to Peter Jeremy for additional testing.
2011-10-01 00:11:03 +00:00
marius
c4a8a741fd Add a comment about why contrary to what once would think running all of
userland with total store order actually is appropriate.
2011-09-30 20:23:18 +00:00
marius
0175576bbb Use the extended integer condition code when comparing 64-bit values. Given
that ATOMIC_INC_LONG currently is unused this happened to not be fatal.
2011-09-30 20:13:51 +00:00
marius
a16da0d422 - Right-justify backslashes as suggested by style(9).
- Rename ATOMIC_INC_ULONG to ATOMIC_INC_LONG in order to be consistent with
  the names of the other macros in this file an adjust accordingly.
2011-09-30 20:06:23 +00:00
kib
126da2118c Remove locking of the vm page queues from several pmaps, which only
protected the dirty mask updates. The dirty mask updates are handled
by atomics after the r225840.

Submitted by:	alc
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 15:01:20 +00:00
attilio
420a3a3f8b It is safe to initialize locks even on early boot (and it is the same
thing all the other architectures already do) thus just initialize
kernel_pmap in pmap_bootstrap().

Reported by:	alc
Reviewed by:	alc, marius
Tested by:	flo, marius
Approved by:	re (kib)
MFC after:	1 week
2011-09-19 18:29:15 +00:00
kmacy
99851f359e In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
brueffer
129f307eb8 Fix a zyd(4) comment typo that was copy+pasted into most kernel config files.
PR:		160276
Submitted by:	MATSUMIYA Ryo <matsumiya@mma.club.uec.ac.jp>
Approved by:	re (kib)
MFC after:	1 week
2011-09-11 17:39:51 +00:00
kib
55d0a85118 Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by:	jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-11 16:05:09 +00:00
kib
a9d505a22a Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
marius
f93067d423 Since r221218 rman_manage_region(9) actually honors rm_start and rm_end
which may cause problems when these contain garbage so zero the range
descriptors embedding the rmans when allocating them.

Approved by:	re (kib)
MFC after:	3 days
2011-08-28 11:49:53 +00:00
kib
f408aa11a3 - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code now can use use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
rmacklem
3b8ed22952 Change all the sample kernel configurations to use
NFSCL, NFSD instead of NFSCLIENT, NFSSERVER since
NFSCL and NFSD are now the defaults. The client change is
needed for diskless configurations, so that the root
mount works for fstype nfs.
Reported by seanbru at yahoo-inc.com for i386/XEN.

Approved by:	re (hrs)
2011-08-07 20:16:46 +00:00
marius
7da5e4e4a0 - Merge from r147740:
When the last, possibly partially filled buffer is flushed, we didn't
  reset fragsz to 0 and as such would stop reflecting reality.
- Use __FBSDID.
- Wrap a too long line.

Approved by:	re (kib)
MFC after:	1 week
2011-08-06 17:45:52 +00:00
marius
5c4da90f12 Remove a shortcut which is invalid with MAXCPU > IDR_CHEETAH_MAX_BN_PAIRS.
Approved by:	re (kib)
2011-08-06 17:45:11 +00:00
marius
133e220c9e Merge from r224217:
Bump MAXCPU to 64.

Approved by:	re (kib)
2011-07-20 18:51:18 +00:00
attilio
1f74ad4e1b On 64 bit architectures size_t is 8 bytes, thus it should use an 8 bytes
storage.
Fix the sintrcnt/sintrnames specification.

No MFC is previewed for this patch.

Reported, reviewed and tested by:	marcel
Approved by:	re (kib)
2011-07-19 12:41:57 +00:00
attilio
a73e834ebb Add the possibility to specify from kernel configs MAXCPU value.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is previewed for this patch.

Tested by:	pluknet
Approved by:	re (kib)
2011-07-19 00:37:24 +00:00
attilio
9a6ff5ad37 - Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
  tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
  This area can be widely improved by applying the same to other
  architectures and likely finding an unified approach among them and
  move the whole code to be MI. More work in this area is expected to
  happen fairly soon.

No MFC is previewed for this patch.

Tested by:	pluknet
Reviewed by:	jhb
Approved by:	re (kib)
2011-07-18 15:19:40 +00:00
marius
e264353fe1 Remove NULL assignments which are redundant for static timecounters.
Submitted by:	jkim
2011-07-12 18:10:56 +00:00
marius
105be4437c - Remove redundant timecounter masking from counter_get_timecount().
- Zero the timecounter when allocation so we don't need to initialize unused
  members and remove a now redundant NULL assignment.

Submitted by:	jkim
2011-07-12 18:02:37 +00:00
marius
263e819722 - Current testing shows that (ab)using the JBC performance counter in bus
cycle mode as timecounter just works fine. My best guess is that a firmware
  update has fixed this, check at run-time whether it advances and use a
  positive quality if it does. The latter will cause this timecounter to be
  used instead of the tick counter based one, which just sucks for SMP.
- Remove a redundant NULL assignment from the timecounter initialization.
2011-07-12 17:56:42 +00:00
marius
447e9ac31d - Add a missing shift in schizo_get_timecount(). This happened to be non-fatal
as STX_CTRL_PERF_CNT_CNT0_SHIFT actually is zero, if we were using the
  second counter in the upper 32 bits this would be required though as the MI
  timecounter code doesn't support 64-bit counters/counter registers.
- Remove a redundant NULL assignment from the timecounter initialization.
2011-07-12 17:55:34 +00:00
marius
81d1868c05 Remove the IDR_CHEETAH_MAX_BN_PAIRS limit from cheetah_ipi_selected().
This is just a simple approach. For reasons unknown OpenSolaris uses a
more sophisticated one involving IPIing the remaining CPUs in reverse
order after the first batch of 32.
2011-07-05 20:05:06 +00:00