Commit Graph

1988 Commits

Author SHA1 Message Date
ian
5b38501da6 Fix low-level uart drivers that set their fifo sizes in the softc too late.
uart(4) allocates send and receive buffers in attach() before it calls
the low-level driver's attach routine.  Many low-level drivers set the
fifo sizes in their attach routine, which is too late.  Other drivers set
them in the probe() routine, so that they're available when uart(4)
allocates buffers.  This fixes the ones that were setting the values too
late by moving the code to probe().
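
The ordering problem can be sketched in userland C. This is only a model: struct model_softc, the model_* functions, the FIFO depth of 16 and the "three times the FIFO size" buffer rule are all assumptions made for illustration, not the uart(4) code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc; not the real uart(4) layout. */
struct model_softc {
	int	rxfifosz;	/* receive FIFO depth, set by the low-level driver */
	int	txfifosz;	/* transmit FIFO depth, set by the low-level driver */
	size_t	rxbuf_len;	/* receive buffer length chosen by the framework */
};

/* Low-level probe: record the FIFO sizes early. */
static void
model_lowlevel_probe(struct model_softc *sc)
{
	sc->rxfifosz = 16;
	sc->txfifosz = 16;
}

/* Low-level attach: by the time this runs the buffers are already sized. */
static void
model_lowlevel_attach(struct model_softc *sc)
{
	(void)sc;
}

/* Framework attach: sizes the buffer first, then calls low-level attach. */
static void
model_framework_attach(struct model_softc *sc)
{
	/* Fails if the driver deferred the FIFO sizes to its attach routine. */
	assert(sc->rxfifosz > 0 && sc->txfifosz > 0);
	sc->rxbuf_len = (size_t)sc->rxfifosz * 3;	/* illustrative sizing rule */
	model_lowlevel_attach(sc);
}
```

Calling model_lowlevel_probe() and then model_framework_attach() on a zeroed softc succeeds; skipping the probe step trips the assertion, which is the "too late" case described above.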
2013-04-01 00:44:20 +00:00
kib
7c26a038f9 Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminates the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitly requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitly specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is no longer subject to the maxbufspace
limit. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, are
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not work, because
buffer_map fragmentation does not allow the limit to be reached.

At Jeff Roberson's request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
kib
63efc821c3 Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination.  Starting offsets and total transfer size are specified.

The function implements an optimal algorithm for copying using the
platform-specific optimizations.  For instance, on the architectures
where the direct map is available, no transient mappings are created;
for i386, the per-cpu ephemeral page frame is used.  The code was
typically borrowed from the pmap_copy_page() for the same
architecture.
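
The copy loop implied by "arrays of pages, starting offsets and total transfer size" can be modelled in userland C. This is a sketch under assumptions: pages are plain byte buffers rather than vm_page_t, MODEL_PAGE_SIZE and model_copy_pages() are hypothetical names, and memcpy() stands in for whatever mapping strategy the platform uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096u	/* assumed page size for this sketch */

/*
 * Copy xfersize bytes from the page array src[], starting at byte offset
 * src_off, to the page array dst[], starting at byte offset dst_off.
 * Each step copies the largest chunk that stays within one source page
 * and one destination page.
 */
static void
model_copy_pages(unsigned char *src[], size_t src_off,
    unsigned char *dst[], size_t dst_off, size_t xfersize)
{
	assert(src != NULL && dst != NULL);
	while (xfersize > 0) {
		size_t s_in = src_off % MODEL_PAGE_SIZE;	/* offset in source page */
		size_t d_in = dst_off % MODEL_PAGE_SIZE;	/* offset in destination page */
		size_t cnt = xfersize;

		if (cnt > MODEL_PAGE_SIZE - s_in)
			cnt = MODEL_PAGE_SIZE - s_in;
		if (cnt > MODEL_PAGE_SIZE - d_in)
			cnt = MODEL_PAGE_SIZE - d_in;
		memcpy(dst[dst_off / MODEL_PAGE_SIZE] + d_in,
		    src[src_off / MODEL_PAGE_SIZE] + s_in, cnt);
		src_off += cnt;
		dst_off += cnt;
		xfersize -= cnt;
	}
}
```

Because the two offsets need not be aligned with each other, a single transfer may be split at both source and destination page boundaries.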

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stub is added
instead, to allow the kernel to link.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after:	2 weeks
2013-03-14 20:18:12 +00:00
attilio
7fd2627275 MFC 2013-03-09 01:39:42 +00:00
attilio
bf1dc90446 MFC 2013-03-08 00:03:07 +00:00
gavin
2caee59a46 Correct two spelling mistakes in a comment. 2013-03-07 13:24:49 +00:00
attilio
e98f58faf6 MFC 2013-03-02 14:48:41 +00:00
marius
ee248b021f - Revert the part of r247601 which turned the overtemperature and power fail
interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep
  lock, which filters shouldn't do. It also seems that kern_reboot(9) still
  may require Giant to be held.
- Correct an incorrect argument to shutdown_nice(9).

Submitted by:	bde
2013-03-02 13:08:13 +00:00
marius
b1d7b9754b Revert the part of r247600 which turned the overtemperature and power fail
interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep
lock, which filters shouldn't do. It also seems that kern_reboot(9) still
may require Giant to be held.

Submitted by:	bde
2013-03-02 13:04:58 +00:00
marius
dd15932a15 - Apparently, it's no longer a problem to call shutdown_nice(9) from within
an interrupt filter (some other drivers in the tree do the same). So
  change the overtemperature and power fail interrupts from handlers to
  filters in order to simplify the code and get rid of the !INTR_MPSAFE
  handlers.
- Mark unused parameters as such.
- Use NULL instead of 0 for pointers.

MFC after:	1 week
2013-03-02 00:41:51 +00:00
marius
2774e0404e - While Netra X1 generally show no ill effects when registering a power
fail interrupt handler, there seems to be either a broken batch of them
  or a tendency to develop a defect which causes this interrupt to fire
  inadvertently. Given that apart from this problem these machines work
  just fine, add a tunable allowing the setup of the power fail interrupt
  to be disabled.
  While at it, remove the DEBUGGER_ON_POWERFAIL compile time option and
  make that behavior also selectable via the newly added tunable.
- Apparently, it's no longer a problem to call shutdown_nice(9) from within
  an interrupt filter (some other drivers in the tree do the same). So
  change the power fail interrupt from a handler to a filter in order to simplify the
  code and get rid of a !INTR_MPSAFE handler.
- Use NULL instead of 0 for pointers.

MFC after:	1 week
2013-03-02 00:37:31 +00:00
marius
0c5e0b209e - In sbbc_pci_attach() just pass the already obtained bus tag and handle
instead of acquiring these anew.
- Use NULL instead of 0 for pointers.

MFC after:	1 week
2013-03-01 20:36:59 +00:00
marius
944a48f5cd - Remove an unused header.
- Use NULL instead of 0 for pointers.
- Let ofw_pcib_probe() return BUS_PROBE_DEFAULT instead of 0 so specialized
  PCI-PCI-bridge drivers may attach instead.
- Add WARs for PLX Technology PEX 8114 bridges and PEX 8532 switches.
  Ideally, these should live in MI code but at least for the latter we're
  missing the necessary infrastructure there.

MFC after:	1 week
2013-03-01 20:34:02 +00:00
mav
6cf7cc6e4d MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this, not a single driver really supported the full dynamic range of
struct bintime even in theory, not to mention the practical inexpediency.
This change legitimizes the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
attilio
8d28f94790 Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-02-27 18:12:13 +00:00
attilio
cb47f0509b Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by:	EMC / Isilon storage division
Tested by:	pho
Reviewed by:	alc
2013-02-26 01:00:11 +00:00
attilio
905e648d42 Hide the details for the assertion for VM_OBJECT_LOCK operations.
Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into
VM_OBJECT_ASSERT_WLOCKED(foo)

Sponsored by:	EMC / Isilon storage division
Requested by:	alc
2013-02-21 21:54:53 +00:00
attilio
066bbc97b6 Fix other architectures and ZFS.
Sponsored by:	EMC / Isilon storage division
2013-02-21 15:02:36 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
attilio
1f1e13ca03 There is no need to use VM_OBJECT_LOCKED() as the assertion won't
make the check available in any case if INVARIANTS is switched off.
Remove VM_OBJECT_LOCKED().
2013-02-20 10:51:34 +00:00
attilio
658534ed5a Switch vm_object lock to be a rwlock.
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitive to
  get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution, so many
  files need to include rwlock.h directly
2013-02-20 10:38:34 +00:00
kib
bd7f0fa0bb Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c.  It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code.  The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync().  Previously this was done in a type specific
way.  Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by:	jeff (sponsored by EMC/Isilon)
Reviewed by:	kan (previous version), scottl,
	mjacob (isp(4), no objections for target mode changes)
Discussed with:	     ian (arm changes)
Tested by:	marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
	amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
2013-02-12 16:57:20 +00:00
kib
dfffe11d71 The word 'end' was missing from the comment.
MFC after:     3 days
2013-02-08 15:52:20 +00:00
eadler
6a1efe1ad9 Remove support for plip from the GENERIC kernel as no systems in the
last 10 years require this support.

Discussed with:	db
Discussed with:	kib
Reviewed by:	imp
Reviewed by:	jhb
Reviewed by:	-hackers
Approved by:	cperciva (mentor)
2013-02-01 20:17:11 +00:00
marius
6892c8a3be Revert the part of r239864 which removed obtaining the SMP mutex around
reading registers from other CPUs. As it turns out, the hardware doesn't
really like concurrent IPI'ing, which causes adverse effects. Also, the
suspected deadlock, when this spin lock is used here while the targeted
CPU(s) are also holding it or in case of nested locks, can't actually happen. This is due to
the fact that on sparc64, spinlock_enter() only raises the PIL but doesn't
disable interrupts completely. Thus direct cross calls as used for the
register reading (and all other MD IPI needs) still will be executed by
the targeted CPU(s) in that case.

MFC after:	3 days
2013-01-23 22:52:20 +00:00
marius
0e18f8466b Revert bogus part of r241740.
Reported by:	Michael Moll

MFC after:	3 days
2013-01-03 23:12:08 +00:00
kib
5a9188f8d3 Enable the UFS quotas for big-iron GENERIC kernels.
Discussed with:	      mckusick
MFC after:	      2 weeks
2013-01-03 19:03:41 +00:00
des
67e77c00a8 As discussed on -current last October, remove the firewire drivers from
GENERIC.
2013-01-03 14:30:24 +00:00
marius
f218d978bf Revert r237842 and switch back to SCHED_ULE. All problems I encountered
with the latter have been fixed with r241780.

MFC after:	3 days
2012-12-16 20:54:07 +00:00
kib
bc5bfde14d Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:55:56 +00:00
jeff
f40f3c3255 - Implement run-time expansion of the KTR buffer via sysctl.
- Implement a function to ensure that all preempted threads have switched
   back out at least once.  Use this to make sure there are no stale
   references to the old ktr_buf or the lock profiling buffers before
   updating them.

Reviewed by:	marius (sparc64 parts), attilio (earlier patch)
Sponsored by:	EMC / Isilon Storage Division
2012-11-15 00:51:57 +00:00
kib
e8ae50d444 Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().
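
The translation described above can be sketched as a small userland model. The flag values, the MODEL_* names and model_malloc2vm_class() are assumptions for illustration; only the two mappings (plain M_NOWAIT to the SYSTEM class, M_USE_RESERVE to the INTERRUPT class) come from this message. Treating a sleeping request the same way as M_NOWAIT is also an assumption of the sketch.

```c
#include <assert.h>

/* Illustrative flag values only; the real constants live in kernel headers. */
#define MODEL_M_NOWAIT		0x0001	/* caller must not sleep */
#define MODEL_M_WAITOK		0x0002	/* caller may sleep */
#define MODEL_M_USE_RESERVE	0x0400	/* caller asks for the 'deep drain' */

enum model_vm_class {
	MODEL_VM_ALLOC_SYSTEM,		/* may not drain the reserve completely */
	MODEL_VM_ALLOC_INTERRUPT	/* may drain the free page reserve */
};

/* Single place where malloc-style flags become a page allocation class. */
static inline enum model_vm_class
model_malloc2vm_class(int mflags)
{
	/* malloc(9) requires exactly one of M_NOWAIT and M_WAITOK. */
	assert(((mflags & MODEL_M_NOWAIT) != 0) !=
	    ((mflags & MODEL_M_WAITOK) != 0));
	if ((mflags & MODEL_M_USE_RESERVE) != 0)
		return (MODEL_VM_ALLOC_INTERRUPT);
	return (MODEL_VM_ALLOC_SYSTEM);
}
```

With this model, MODEL_M_NOWAIT alone yields the SYSTEM class, and only the combination with MODEL_M_USE_RESERVE reaches the INTERRUPT class.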

Discussion started by:	"Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by:	alc, mdf (previous version)
Tested by:	pho (previous version)
MFC after:	2 weeks
2012-11-14 20:01:40 +00:00
dim
373133f0ad Remove duplicate const specifiers in many drivers (I hope I got all of
them, please let me know if not).  Most of these are of the form:

static const struct bzzt_type {
	[...list of members...]
} const bzzt_devs[] = {
	[...list of initializers...]
};

The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).

I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).
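
For comparison, a compilable version of the pattern with a single const might look as follows. The member names, the table entries and bzzt_lookup() are made up for illustration and do not correspond to any driver in the tree.

```c
#include <assert.h>
#include <stddef.h>

struct bzzt_type {
	unsigned int	 id;	/* hypothetical device ID */
	const char	*desc;	/* hypothetical description string */
};

/* One const on the element type is enough; the array is read-only. */
static const struct bzzt_type bzzt_devs[] = {
	{ 0x0001, "Hypothetical Bzzt adapter" },
	{ 0x0002, "Hypothetical Bzzt adapter, rev. 2" },
};

static_assert(sizeof(bzzt_devs) / sizeof(bzzt_devs[0]) == 2,
    "table should hold the two entries listed above");

/* Return the description for id, or NULL if the ID is not in the table. */
static const char *
bzzt_lookup(unsigned int id)
{
	for (size_t i = 0; i < sizeof(bzzt_devs) / sizeof(bzzt_devs[0]); i++)
		if (bzzt_devs[i].id == id)
			return (bzzt_devs[i].desc);
	return (NULL);
}
```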

Reviewed by:	yongari, marius
MFC after:	1 week
2012-11-05 19:16:27 +00:00
attilio
f3501b109e Rework the known rwlocks so that they stay on their own
cache line by using struct rwlock_padalign instead of manual
frobbing.
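
The idea of a padded and aligned lock type can be sketched with C11 alignment support. The struct names and the 64-byte MODEL_CACHE_LINE_SIZE are assumptions for this example; the kernel's own definition is not reproduced here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define MODEL_CACHE_LINE_SIZE	64	/* assumed; the real value is per-CPU */

/* Plain lock word: neighbours may share its cache line. */
struct model_rwlock {
	volatile unsigned long	rw_lock;
};

/* Same lock word, aligned (and therefore padded) to a full cache line. */
struct model_rwlock_padalign {
	alignas(MODEL_CACHE_LINE_SIZE) volatile unsigned long rw_lock;
};

static_assert(alignof(struct model_rwlock_padalign) == MODEL_CACHE_LINE_SIZE,
    "padalign lock must start on a cache line boundary");
static_assert(sizeof(struct model_rwlock_padalign) == MODEL_CACHE_LINE_SIZE,
    "padalign lock must occupy the whole cache line");
```

Declaring a lock with the padded type replaces hand-written padding members or alignment attributes at each use site.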

Reviewed by:	alc, jimharris
2012-11-03 23:03:14 +00:00
marius
807619d8ba - Give PIL_PREEMPT the lowest priority just above low/stray interrupts.
The reason for this is that the SPARC v9 architecture allows nested
  interrupts of higher priority/level than that of the current interrupt
  to occur (and we can't just entirely bypass this model, also, at least
  for tick interrupts, this also wouldn't be wise). However, when a
  preemption interrupt interrupts another interrupt of lower priority,
  f.e. PIL_ITHREAD, and that one in turn is nested by a third interrupt,
  f.e. PIL_TICK, with SCHED_ULE the execution of interrupts higher than
  PIL_PREEMPT may be migrated to another CPU. In particular, tl1_ret(),
  which is responsible for restoring the state of the CPU prior to entry
  to the interrupt based on the (also migrated) trap frame, then is run
  on a CPU which actually didn't receive the interrupt in question,
  causing an inappropriate processor interrupt level to be "restored".
  In turn, this causes interrupts of the first level, i.e. PIL_ITHREAD
  in the above scenario, to be blocked on the target of the migration
  until the correct PIL happens to be restored on that CPU again.
  Making PIL_PREEMPT the lowest real priority, this effectively prevents
  this scenario from happening, as preemption interrupts no longer can
  interrupt any other interrupt besides stray ones (which is no issue).
  Thanks to attilio@ and especially mav@ for helping me to understand
  this problem at the 201208DevSummit.
- Give PIL_STOP (which is also used for IPI_STOP_HARD, given that there's
  no real equivalent to NMIs on SPARC v9) the highest possible priority
  just below the hardwired PIL_TICK, so it has a chance to interrupt
  more things.

MFC after:	1 week
2012-10-20 12:07:48 +00:00
marius
0e1f679c31 - Remove an unused header.
- Don't waste a delay slot.

MFC after:	3 days
2012-10-19 17:12:55 +00:00
marius
9e47c0d1ff Let SCHED_ULE give affinity to the CPU the tick interrupt triggered on
when running tick_process(), similarly to what the x86 equivalents of
this function do, however employing the less racy sequence also used in
intr_event_handle().

MFC after:	3 days
2012-10-19 13:32:37 +00:00
attilio
6997194551 Add a unified macro to deny the compiler the ability to reorder
instruction loads/stores at will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.
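
A compiler-only barrier of this kind is commonly written as an empty asm statement with a "memory" clobber. The sketch below uses hypothetical model_* names rather than the macro itself, and it only constrains the compiler; it emits no instruction and does not order memory accesses as seen by other CPUs.

```c
#include <assert.h>

#if defined(__GNUC__) || defined(__clang__)
/* Empty asm with a "memory" clobber: no code is emitted for it. */
#define model_compiler_membar()	__asm__ __volatile__("" ::: "memory")
#else
#error "model_compiler_membar() is only sketched for gcc and clang"
#endif

static int model_data;
static int model_ready;

/* Store the payload, then the flag; the compiler may not swap the stores. */
static void
model_publish(int value)
{
	model_data = value;
	model_compiler_membar();
	model_ready = 1;
}

/* Read the flag, then the payload; the compiler may not swap the loads. */
static int
model_consume(void)
{
	assert(model_ready != 0);
	model_compiler_membar();
	return (model_data);
}
```

The #error branch mirrors the behaviour described above, where compilation fails on a compiler that is not supported.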

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
attilio
3212891c92 Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static initializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by:	marius
2012-10-09 12:22:43 +00:00
alc
55f6ff40ed Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
eadler
8600cbb5b6 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
attilio
8dece93b14 userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
gavin
9db084a6c3 Prevent indent(1) from reformatting this comment, as it contains
a formatting-sensitive table.
2012-09-07 08:18:06 +00:00
marius
1829ce3546 Add a global MD macro for the VIS block size instead of duplicating
it and using magic values all over the place.

MFC after:	1 week
2012-08-31 11:15:01 +00:00
marius
9f542929d1 - Unlike cache invalidation and TLB demapping IPIs, reading registers from
other CPUs doesn't require locking so get rid of it. As the latter is used
  for the timecounter on certain machine models, using a spin lock in this
  case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
  Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
  for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
  demapping IPIs.
- Mark some unused function arguments as such.

MFC after:	1 week
2012-08-29 16:56:50 +00:00
gjb
3f013cdf9f Grammar fix: s/NIC's/NICs/
MFC after:	3 days
2012-08-26 01:21:02 +00:00
marius
4c012f63e6 Merge r236494 from x86:
Isolate the global TTE list lock from data and other locks to prevent false
sharing within the cache.

MFC after:	3 days
2012-08-05 22:03:13 +00:00
marius
d339b71305 Switch back to the 4BSD scheduler for now. There is some more or less
recent regression with ULE, causing processes to get stuck in getblk
as well as interrupt handler execution delays to rise above the command
timeout of mpt(4).

MFC after:	3 days
2012-06-30 14:55:36 +00:00
ken
da17d879d7 Now that the mps(4) driver is endian-safe, add it to the powerpc and
sparc64 GENERIC config files.

MFC after:	3 days
2012-06-28 20:48:24 +00:00
alc
c5e6daff9d Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00