Commit Graph

98229 Commits

Author SHA1 Message Date
delphij
0171695909 MFV r254070:
Merge vendor bugfix for ZFS test suite that triggers false positives.

Illumos ZFS issues:
  3949 ztest fault injection should avoid resilvering devices
  3950 ztest: deadman fires when we're doing a scan
  3951 ztest hang when running dedup test
  3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
2013-08-07 21:16:14 +00:00
jhb
9481e259bb Don't emit a spurious EVFILT_PROC event with no fflags set on process exit
if NOTE_EXIT is not being monitored.  The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT.  This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
  monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
  knote without signalling it to userland and use this when a process
  exits but the fflags in the knote is zero.

Reviewed by:	jmg
MFC after:	1 month
2013-08-07 19:56:35 +00:00
kib
8de1718b60 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
kib
a3142db9ac Change the pmap_ts_referenced() method of amd64 pmap to use shared
pvh_global_lock.  This allows the method to be executed in parallel,
avoiding undue contention on the pvh_global_lock for the multithreaded
pagedaemon.

The pmap_ts_referenced() function has to inspect the page mappings for
several pmaps, which need to be locked while pv list lock is owned.
This contradicts to the lock order, where pmap lock is before pv list
lock.  Introduce the generation count for the pv list of the page or
superpage, which indicate any change in the pv list, and, as usual,
perform restart of the iteration if generation changed while pv lock
was dropped for blocking acquire of a pmap lock.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:33:15 +00:00
cognet
410d14e3ed Don't bother trying to work around buffers which are not aligned on a cache
line boundary. It has never been 100% correct, and it can't work on SMP,
because nothing prevents another core from accessing data from an unrelated
buffer in the same cache line while we invalidated it. Just use bounce pages
instead.

Reviewed by:	ian
Approved by:	mux (mentor) (implicit)
2013-08-07 15:44:58 +00:00
mav
1e43e695b7 Remove droping topology mutex after iterating 100 periphs in CAMGETPASSTHRU.
That is not so slow and so often operation to handle unneeded otherwise
xsoftc.xpt_generation and respective locking complications.
2013-08-07 11:34:20 +00:00
ganbold
7f7658c9c8 Bring initial support for Allwinner A20 SoC (Cubieboard2).
Add support for A20 timer.
	Correct interrupt offset depending from chip.
	Add basic code for CPU configuration module.
	For now, add kernel config and dts file
	(only FDT blob related problem needs to be solved later in
	order to have one kernel for both cubieboard1 and 2).

Approved by: ray@
2013-08-07 11:07:56 +00:00
mav
7ddb89a6c3 Improve r253721 by reporting detected lack of BIO_FLUSH support to GEOM.
That prevents more of such requests from coming and errors from logging.
2013-08-07 08:20:11 +00:00
avg
0255b98224 enable KDB_TRACE in GENERICs
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after:	never (change specific to head)
2013-08-07 08:03:50 +00:00
kevlo
52419f21e1 Remove unsigned comparison < 0
Found by:	LLVM
Reviewed by:	luigi
2013-08-07 07:22:56 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
markj
f82b0dd694 Add a missing module version declaration to if_tun(4).
PR:		181078
Submitted by:	Brandon Gooch <jamesbrandongooch@gmail.com>
MFC after:	1 week
2013-08-07 01:32:08 +00:00
markj
44ee260831 Fill in the description fields for M_FICT_PAGES.
Reviewed by:	kib
MFC after:	3 days
2013-08-07 00:20:30 +00:00
marcel
9f2f2e171a Change <sys/diskpc98.h> to not redefine the same symbols that are
being defined in <sys/diskmbr.h>. Instead give the symbols here a
"PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h>
can be included in the same C source file.

The renaming is trivial. The only gotcha is that DOSBBSECTOR is
also redefined from 0 to 1. This because DOSBBSECTOR was always
used in conjunction with an addition of 1. The PC98_BBSECTOR symbol
is defined as 1 and the expression is simplified.

Note: it is not believed that ports are seriously impacted; or at
all for that matter.

Approved by: nyan@
2013-08-07 00:00:48 +00:00
delphij
f5fd32bca5 MFV r254011:
This change have no effect to FreeBSD but integrated for
completeness.

Illumos ZFS issues:
  348 ZFS should handle DKIOCGMEDIAINFOEXT failure
2013-08-06 21:36:01 +00:00
jfv
b2d5c6bc2a Make the various driver MSIX setup routines fallback to MSI more
gracefully. This change was suggested by Marius Strobl, thank you.

PR: kern/181016
MFC after: ASAP
2013-08-06 21:01:38 +00:00
marius
281e37952d - Fix a bug in the MSI allocation logic so an MSI is also employed if a
controller supports only a single message. I haven't seen such an adapter
  out in the wild, though, so this change likely is a NOP.
  While at it, further simplify the MSI allocation logic; there's no need
  to check the number of available messages on our own as pci_alloc_msi(9)
  will just fail if it can't provide us with the single message we want.
- Nuke the unused softc of aacch(4).

MFC after:	1 month
2013-08-06 19:14:02 +00:00
marius
576dff05bc As it turns out, MSIs are broken with 2820SA so introduce an AAC_FLAGS_NOMSI
quirk and apply it to these controllers [1]. The same problem was reported
for 2230S, in which case it wasn't actually clear whether the culprit is the
controller or the mainboard, though. In order to be on the safe side, flag
MSIs as being broken with the latter type of controller as well. Given that
these are the only reports of MSI-related breakage with aac(4) so far and
OSes like OpenSolaris unconditionally employ MSIs for all adapters of this
family, however, it doesn't seem warranted to generally disable the use of
MSIs in aac(4).
While it, simplify the MSI allocation logic a bit; there's no need to check
for the presence of the MSI capability on our own as pci_alloc_msi(9) will
just fail when these kind of interrupts are not available.
Reported and tested by: David Boyd [1]

MFC after:	3 days
2013-08-06 18:55:59 +00:00
jfv
abe5830b3c When the igb driver is static there are cases when early interrupts occur,
resulting in a panic in refresh_mbufs, to prevent this add a check in the
interrupt handler for DRV_RUNNING.

MFC after: 1 day (critical for 9.2)
2013-08-06 18:00:53 +00:00
hrs
c5a14d7164 Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6,
SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl.  These userland interfaces
treat expiration times in time_second, not time_uptime.
2013-08-06 17:10:52 +00:00
mckusick
9e78a97e00 This bug fix is in a code path in rename taken when there is a
collision between a rename and an open system call for the same
target file. Here, rename releases its vnode references, waits for
the open to finish, and then restarts by reacquiring its needed
vnode locks. In this case, rename was unlocking but failing to
release its reference to one of its held vnodes. The effect was
that even after all the actual references to the vnode had gone,
the vnode still showed active references. For files that had been
removed, their space was not reclaimed until the filesystem was
forcibly unmounted.

This bug manifested itself in the Postgres server which would
leak/lose hundreds of files per day amounting to many gigabytes of
disk space. This bug required shutting down Postgres, forcibly
unmounting its filesystem, remounting its filesystem and restarting
Postgres every few days to recover the lost space.

Reported by: Dan Thomas and Palle Girgensohn
Bug-fix by:  kib
Tested by:   Dan Thomas and Palle Girgensohn
MFC after:   2 weeks
2013-08-06 16:50:05 +00:00
avg
cea3d0ebcf fix fat-fingering in r253996
MFC after:	17 days
X-MFC with:	r253996
2013-08-06 16:18:07 +00:00
avg
a07c9d34c3 opensolaris code: translate INVARIANTS to DEBUG and ZFS_DEBUG
Do this by forcing inclusion of
sys/cddl/compat/opensolaris/sys/debug_compat.h
via -include option into all source files from OpenSolaris.
Note that this -include option must always be after -include opt_global.h.

Additionally, remove forced definition of DEBUG for some modules and fix
their build without DEBUG.

Also, meaning of DEBUG was overloaded to enable WITNESS support for some
OpenSolaris (primarily ZFS) locks.  Now this overloading is removed and
that use of DEBUG is replaced with a new option OPENSOLARIS_WITNESS.

MFC after:	17 days
2013-08-06 15:51:56 +00:00
marius
95f847fd26 Add MD (for now) atomic_store_acq_<type>() and use it in pmap_activate()
to get the semantics when setting the PMAP right. Prior to r251782, the
latter already used implicit acquire semantics, which - currently - means
to not employ additional explicit memory barriers under the hood (see also
r225889).
2013-08-06 15:34:11 +00:00
mav
73afbff431 Block reporting of ZFS features for suspended pools.
Before executing any subcommand, zpool tool fetches pools configuration from
the kernel.  Before features support was added, kernel was regenerating that
configuration based on data always present in memory.  Unfortunately, pool
features list and activity counters are not such. They are stored in ZAP,
that normally resides in ARC, but under heavy memory pressure may be swapped
out.  If pool is suspended at this point, there is no way to recover it back
since any zpool command will stuck.

This change has one predictable flaw: `zpool upgrade` always wish to upgrade
suspended pools, but fortunately it can't do it due to the suspension.
2013-08-06 14:41:41 +00:00
mav
55ff4dd445 Disable r252840 when ZFS TRIM is enabled (vfs.zfs.trim.enabled=1) and really
disable TRIM otherwise.

r252840 (illumos bug 3836) is based on assumption that zio_free_sync() has
no lock dependencies and should complete immediately. Unfortunately, with our
TRIM implementation that is not true due to ZIO_STAGE_VDEV_IO_START added
to the ZIO_FREE_PIPELINE, which, while not really accessing devices, still
acquires SCL_ZIO lock for read to be sure devices won't disappear.

When TRIM is disabled, this patch enables direct free execution from r252840
and removes ZIO_STAGE_VDEV_IO_START and ZIO_STAGE_VDEV_IO_ASSESS stages from
the pipeline to avoid lock acquisition.  Otherwise it queues free request as
it was before r252840.
2013-08-06 14:30:28 +00:00
mav
12049ee5d5 Make zpool clear to reopen also reconnected cache and spare devices.
Since `zpool status` reports about such kinds of errors, it is strange
that they are not cleared by `zpool clear`.
2013-08-06 14:23:33 +00:00
mav
07b58a38cf Make ZFS to use separate thread to handle SPA_ASYNC_REMOVE async events.
Existing async thread is running only on successfull spa_sync() completion,
that is impossible in case of pool loosing required (last) disk(s).  That
indefinite delay of SPA_ASYNC_REMOVE processing made ZFS to not close the
lost disks, preventing GEOM/CAM from destroying devices and reusing names
on later disk reattach.

In earlier version of the patch I've tried to just run existing thread
immediately, unrelated to spa_sync() completion, but that exposed number
of situations where it could stuck due to locks held by stuck spa_sync(),
that are required for other kinds of async events.

Experiments with OpenIndiana snapshot confirmed that they also have this
issue with lost disks reattach.
2013-08-06 14:20:41 +00:00
avg
9db37f40ce dtrace: fix compilation with gcc
Cowardly taking the easiest way and using -Wno-*

MFC after:	3 days
X-MFC with:	r253772
2013-08-06 13:55:39 +00:00
trasz
e6db10c1dd Remove dead code. 2013-08-06 10:42:18 +00:00
andrew
eacc009ef6 We no longer need to align the stack before calling swi_handler as it is
already aligned correctly in the PUSHFRAME macro.
2013-08-06 10:03:44 +00:00
sbruno
c116c25dd5 Update ciss(4) with new models of raid controllers from HP
Submitted by:	scott.benesh@hp.com
MFC after:	2 weeks
Sponsored by:	Hewlett Packard
2013-08-06 03:17:01 +00:00
jhibbits
7ea290d1eb Micro-optimize OFW syscons 8-bit blank.
MFC after:	1 week
2013-08-06 03:09:44 +00:00
jhibbits
e280c38ab5 Remove an unnecessary panic. The PVO's PTE entry and the PTEG's PTE entry may
not match, if the PVO's PTE is invalid.
2013-08-06 02:58:16 +00:00
hrs
fadcfc6c44 - Use pget(PGET_CANDEBUG | PGET_NOTWEXIT) to determine if the specified
PID is valid for monitoring in FILEMON_SET_PID ioctl.

- Set the monitored PID to -1 when the process exits.

Suggested by:	jilles
Tested by:	sjg
MFC after:	3 days
2013-08-06 02:14:30 +00:00
jhibbits
271fe456c4 Evict pages from the PTEG when it's full and trying to insert a new PTE,
rather than panicking.

Reviewed by:	nwhitehorn
MFC after:	3 weeks
2013-08-06 01:01:15 +00:00
mckusick
c3a8e70389 With the addition of journalled soft updates, the "newblk" structures
persist much longer than previously. Historically we had at most 100
entries; now the count may reach a million. With the increased count
we spent far too much time looking them up in the grossly undersized
newblk hash table. Configure the newblk hash table to accurately reflect
the number of entries that it must index.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2013-08-05 22:02:45 +00:00
mckusick
c4523d974e To better understand performance problems with journalled soft updates,
we need to collect the highest level of allocation for each of the
different soft update dependency structures. This change collects these
statistics and makes them available using `sysctl debug.softdep.highuse'.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2013-08-05 22:01:16 +00:00
cognet
6c012d075a Let the platform calculate the timer frequency at runtime, and use that for
the omap4, instead of relying on the (wrong) value provided in the dts.
2013-08-05 20:14:56 +00:00
hrs
13c1bcf2c1 - Use time_uptime instead of time_second in data structures for
PF_INET6 in kernel.  This fixes various malfunction when the wall time
  clock is changed.  Bump __FreeBSD_version to 1000041.

- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.

MFC after:	1 month
2013-08-05 20:13:02 +00:00
kib
103825c951 Do not override the ENOENT error for the empty path, or EFAULT errors
from copyins, with the relative lookup check.

Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-08-05 19:42:03 +00:00
andrew
1e070fe985 When entering exception handlers we may not have an aligned stack. This is
because an exception may happen at any time. The stack alignment rules on
ARM EABI state the only place the stack must be 8-byte aligned is on a
function boundary.

If an exception happens while a function is setting up or tearing down it's
stack frame it may not be correctly aligned. There is also no requirement
for it to be when the function is a leaf node.

The fix is to align the stack after we have stored a backup of the old stack
pointer, but before we have stored anything in the trapframe. Along with
this we need to adjust the size of the trapframe by 4 bytes to ensure the
stack below it is also correctly aligned.
2013-08-05 19:06:28 +00:00
kib
b8fe5eca7f The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node,
in particular, from the tmpfs_lookup VOP method.  If LK_NOWAIT is not
specified in the lkflags, the lookup is supposed to return an alive
vnode whenever the underlying node is valid.

Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached
to node exists and is being reclaimed.  This causes spurious ENOENT
errors from lookup on tmpfs and corresponding random 'No such file'
failures from syscalls working with tmpfs files.

Fix this by waiting for the doomed vnode to be detached from the tmpfs
node if sleepable allocation is requested.

Note that filesystems which use vfs_hash.c, correctly handle the case
due to vfs_hash_get() looping when vget() returns ENOENT for sleepable
requests.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-08-05 18:53:59 +00:00
jfv
fcc01a7fa0 Correct a fat-finger in the last delta.
MFC after: ASAP
2013-08-05 16:16:50 +00:00
mav
bbcaa9e294 MFprojects/camlock r249006:
Pass SIM pointer as an argument to camisr_runqueue() instead of doneq
pointer.
2013-08-05 12:15:53 +00:00
mav
bbfed93309 MFprojects/camlock r249505:
Change CCB queue resize logic to be able safely handle overallocations:
 - (re)allocate queue space in power of 2 chunks with 64 elements minimum
and never shrink it; with only 4/8 bytes per element size is insignificant.
 - automatically reallocate the queue to double size if it is overflowed.
 - if queue reallocation failed, store extra CCBs in unsorted TAILQ,
fetching them back as soon as some queue element is freed.

To free space in CCB for TAILQ linking, change highpowerq from keeping
high-power CCBs to keeping devices frozen due to high-power CCBs.

This encloses all pieces of queue resize logic inside of cam_queue.[ch],
removing some not obvious duties from xpt_release_ccb().
2013-08-05 11:48:40 +00:00
gjb
a35ede59d8 Redirect svnversion stderr to /dev/null if we cannot determine
the tree version, for example if the tree is checked out with an
outdated svn from ports, but the base system svnlite is built.

Approved by:	kib (mentor)
2013-08-05 10:26:42 +00:00
attilio
899ab64514 Revert r253939:
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by:	EMC / Isilon storage division
Reported by:	kib
2013-08-05 08:55:35 +00:00
hrs
05101f7501 Fix a panic in tmpaddrtimer. 2013-08-05 00:36:12 +00:00
jeff
baea70e2d2 - Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space.  This works around several
   asserts from pmap_demote_pde_locked that did not apply and gave false
   warnings.

Discovered by:	pho
Reviewed by:	alc
Sponsored by:	EMC / Isilon Storage Division
2013-08-05 00:28:03 +00:00