Commit Graph

94245 Commits

Author SHA1 Message Date
Jeff Roberson
863c7e4562 - Reserve a special AF for SDP. The one we were incorrectly using before
was taken by another AF.

Sponsored by:	EMC / Isilon Storage Division
2013-08-09 03:26:17 +00:00
Jeff Roberson
3d71c6cf45 - Correctly handle various edge cases in sysfs emulation.
Sponsored by:	EMC / Isilon Storage Division
2013-08-09 03:24:48 +00:00
Jeff Roberson
ba397d0f16 - Use the correct type in the linux bitops emulation.
Submitted by:	Maxim Ignatenko <gelraen.ua@gmail.com>
2013-08-09 03:24:12 +00:00
Pyun YongHyeon
69b1f509a4 Fix for IPv4 fragment packets treated as RMCP.
bit25 of rxMode MAC register of 5762 needs to be set for rx mgmt
filter to work correctly when processing match for UDP header
fields.  Otherwise false positive can occur which causes IPv4
fragment to be received by APE instead of host.

Reported by:	Geans Pin <geanspin@broadcom.com>
2013-08-09 01:15:32 +00:00
Scott Long
fe8391035a Rate limit the 'out of chain frame' messages to once per 60 seconds.
Obtained from:	Netflix
MFC after:	3 days
2013-08-09 01:10:33 +00:00
Scott Long
d9802deb4e Sometimes a device misbehaves so badly that it disrupts the entire system.
Add a tunable that allows such a device to be excluded from the driver.
The id parameter is the target id that the driver assigns to a given device.

dev.mps.X.exclude_ids=<id>,<id>

Obtained from:	Netflix
MFC after:	3 days
2013-08-09 01:09:02 +00:00
Scott Long
f510415d84 Add a helpful message that can help point to why a sysctl tree removal failed
Obtained from:	Netflix
MFC after:	3 days
2013-08-09 01:04:44 +00:00
Xin LI
43667c1f68 MFV r254079:
Illumos ZFS issues:
  3957 ztest should update the cachefile before killing itself
  3958 multiple scans can lead to partial resilvering
  3959 ddt entries are not always resilvered
  3960 dsl_scan can skip over dedup-ed blocks if
       physical birth != logical birth
  3961 freed gang blocks are not resilvered and can cause pool to suspend
  3962 ztest should print out zfs debug buffer before exiting
2013-08-08 23:38:31 +00:00
Devin Teske
5b502b3b4c Update legacy static assignments in old code to support dynamic framing,
plotting, and alignment coinciding with enhancements in SVN r242667.
2013-08-08 22:34:00 +00:00
Devin Teske
5e82c321f8 Since the introduction of SVN r244048 and [follow-up] r244089, it is now
safe to build upon ``boot_serial?'' functionality to make safer UI choices.
2013-08-08 22:09:46 +00:00
Pedro F. Giffuni
95f1f8d262 Small typo.
MFC after:	3 days
2013-08-08 22:07:59 +00:00
Ryan Stone
08a42caa50 Allow drivers to return BUS_PROBE_NOWILDCARD from their attach routine to
match devices where the driver class was fixed but the unit number was
wildcarded.  This better matches the documented behaviour in
DEVICE_PROBE(9).

Reviewed by:	imp
2013-08-08 19:30:49 +00:00
Andrey V. Elsukov
b74dd6c77b gpt_entries is used as limit for the number of partition entries in
the GEOM_PART. Instead of just using number of entries from the GPT
header, calculate this limit based on the reserved space between
GPT header and first available LBA.

MFC after:	2 weeks
2013-08-08 16:09:20 +00:00
Glen Barber
a61914445d When newvers.sh is run, it is possible that the svnversion
(or svnliteversion) in the current lookup path is not what
was used to check out the tree.  If an incompatible version
is used, the svn revision number is not reported in uname(1).

Run ${svnversion} on newvers.sh itself when evaluating if the
svn(1) in use is compatible with the tree.  Fallback to an
empty ${svnversion} if necessary.

With this change, svnliteversion from base is only used
if no compatible svnversion is found, so with this change,
the version of svn(1) from the ports tree is evaluated first.

Requested by:	many
MFC after:	3 days
X-MFC-To:	stable/9, releng/9.2 only
2013-08-08 15:59:00 +00:00
Andrey V. Elsukov
4371b649aa Make the check for number of entries less strict.
Some partitioning tools can create GPT with number of entries less
than 128.

MFC after:	1 week
2013-08-08 11:24:25 +00:00
Adrian Chadd
4030a4b2a3 Cap the number of streams supported to two for now.
I haven't yet reviewed the Intel driver(s) in more depth to see if
there are 1x1 NICs that report they support 2 transmit/receive chains..
if so then we'll have to update this.

Tested:

* Intel 4965, which is a 2x2 device with 3 RX and 2 TX chains.

PR:		kern/181132
2013-08-08 05:52:41 +00:00
Adrian Chadd
e7495198d5 Convert net80211 over to using if_transmit for the dispatch from the
upper layer(s).

This eliminates the if_snd queue from net80211. Yay!

This unfortunately has a few side effects:

* It breaks ALTQ to net80211 for now - sorry everyone, but fixing
  parallelism and eliminating the if_snd queue is more important
  than supporting this broken traffic scheduling model. :-)

* There's no VAP and IC flush methods just yet - I think I'll add
  some NULL methods for now just as placeholders.

* It reduces throughput a little because now net80211 will drop packets
  rather than buffer them if the driver doesn't do its own buffering.
  This will be addressed in the future as I implement per-node software
  queues.

Tested:

* ath(4) and iwn(4) in STA operation
2013-08-08 05:09:35 +00:00
Neel Natu
f263e391a3 Use local variables with the appropriate types and eliminate a bunch of casts.
This is a cosmetic change but it does help with a proposed change to increase
the maximum size of physical memory supported on amd64 platforms.

Submitted by:	Chris Torek (torek@torek.net)
2013-08-08 03:17:39 +00:00
Xin LI
9d2f243aa6 MFV r254071:
Fix a regression introduced by fix for Illumos bug #3834.  Quote from
Matthew Ahrens on the Illumos issue:

ztest fails this assertion because ztest_dmu_read_write() does
        dmu_tx_hold_free(tx, bigobj, bigoff, bigsize);
and then
    dmu_object_set_checksum(os, bigobj,
        (enum zio_checksum)ztest_random_dsl_prop(ZFS_PROP_CHECKSUM), tx);

If the region to free is past the end of the file, the DMU assumes that there
will be nothing to do for this object.  However, ztest does set_checksum(),
which must modify the dnode.  The fix is for ztest to also call

    dmu_tx_hold_bonus(tx, bigobj);

so we can account for the dirty data associated with setting the checksum

Illumos ZFS issues:
  3955 ztest failure: assertion refcount_count(&tx->tx_space_written)
         + delta <= tx->tx_space_towrite
2013-08-07 22:21:00 +00:00
Adrian Chadd
cc80eae5cf Allow net80211 to compile on stable/9 and stable/8. 2013-08-07 22:01:43 +00:00
Xin LI
4f7b34578b MFV r254070:
Merge vendor bugfix for ZFS test suite that triggers false positives.

Illumos ZFS issues:
  3949 ztest fault injection should avoid resilvering devices
  3950 ztest: deadman fires when we're doing a scan
  3951 ztest hang when running dedup test
  3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
2013-08-07 21:16:14 +00:00
John Baldwin
5b596f0f5f Don't emit a spurious EVFILT_PROC event with no fflags set on process exit
if NOTE_EXIT is not being monitored.  The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT.  This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
  monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
  knote without signalling it to userland and use this when a process
  exits but the fflags in the knote is zero.

Reviewed by:	jmg
MFC after:	1 month
2013-08-07 19:56:35 +00:00
Konstantin Belousov
449c2e92c9 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
Konstantin Belousov
872d995f76 Change the pmap_ts_referenced() method of amd64 pmap to use shared
pvh_global_lock.  This allows the method to be executed in parallel,
avoiding undue contention on the pvh_global_lock for the multithreaded
pagedaemon.

The pmap_ts_referenced() function has to inspect the page mappings for
several pmaps, which need to be locked while pv list lock is owned.
This contradicts to the lock order, where pmap lock is before pv list
lock.  Introduce the generation count for the pv list of the page or
superpage, which indicate any change in the pv list, and, as usual,
perform restart of the iteration if generation changed while pv lock
was dropped for blocking acquire of a pmap lock.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:33:15 +00:00
Olivier Houchard
d03426e619 Don't bother trying to work around buffers which are not aligned on a cache
line boundary. It has never been 100% correct, and it can't work on SMP,
because nothing prevents another core from accessing data from an unrelated
buffer in the same cache line while we invalidated it. Just use bounce pages
instead.

Reviewed by:	ian
Approved by:	mux (mentor) (implicit)
2013-08-07 15:44:58 +00:00
Alexander Motin
a29779e865 Remove droping topology mutex after iterating 100 periphs in CAMGETPASSTHRU.
That is not so slow and so often operation to handle unneeded otherwise
xsoftc.xpt_generation and respective locking complications.
2013-08-07 11:34:20 +00:00
Ganbold Tsagaankhuu
696ec285aa Bring initial support for Allwinner A20 SoC (Cubieboard2).
Add support for A20 timer.
	Correct interrupt offset depending from chip.
	Add basic code for CPU configuration module.
	For now, add kernel config and dts file
	(only FDT blob related problem needs to be solved later in
	order to have one kernel for both cubieboard1 and 2).

Approved by: ray@
2013-08-07 11:07:56 +00:00
Alexander Motin
71185c66dd Improve r253721 by reporting detected lack of BIO_FLUSH support to GEOM.
That prevents more of such requests from coming and errors from logging.
2013-08-07 08:20:11 +00:00
Andriy Gapon
818d282e7b enable KDB_TRACE in GENERICs
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after:	never (change specific to head)
2013-08-07 08:03:50 +00:00
Kevin Lo
3de1bd9502 Remove unsigned comparison < 0
Found by:	LLVM
Reviewed by:	luigi
2013-08-07 07:22:56 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
Mark Johnston
5bc4f6b3ab Add a missing module version declaration to if_tun(4).
PR:		181078
Submitted by:	Brandon Gooch <jamesbrandongooch@gmail.com>
MFC after:	1 week
2013-08-07 01:32:08 +00:00
Mark Johnston
c0432fc38b Fill in the description fields for M_FICT_PAGES.
Reviewed by:	kib
MFC after:	3 days
2013-08-07 00:20:30 +00:00
Marcel Moolenaar
e01c6f329a Change <sys/diskpc98.h> to not redefine the same symbols that are
being defined in <sys/diskmbr.h>. Instead give the symbols here a
"PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h>
can be included in the same C source file.

The renaming is trivial. The only gotcha is that DOSBBSECTOR is
also redefined from 0 to 1. This because DOSBBSECTOR was always
used in conjunction with an addition of 1. The PC98_BBSECTOR symbol
is defined as 1 and the expression is simplified.

Note: it is not believed that ports are seriously impacted; or at
all for that matter.

Approved by: nyan@
2013-08-07 00:00:48 +00:00
Xin LI
c668ff330e MFV r254011:
This change have no effect to FreeBSD but integrated for
completeness.

Illumos ZFS issues:
  348 ZFS should handle DKIOCGMEDIAINFOEXT failure
2013-08-06 21:36:01 +00:00
Jack F Vogel
d0913b7f25 Make the various driver MSIX setup routines fallback to MSI more
gracefully. This change was suggested by Marius Strobl, thank you.

PR: kern/181016
MFC after: ASAP
2013-08-06 21:01:38 +00:00
Marius Strobl
0cfcfc1918 - Fix a bug in the MSI allocation logic so an MSI is also employed if a
controller supports only a single message. I haven't seen such an adapter
  out in the wild, though, so this change likely is a NOP.
  While at it, further simplify the MSI allocation logic; there's no need
  to check the number of available messages on our own as pci_alloc_msi(9)
  will just fail if it can't provide us with the single message we want.
- Nuke the unused softc of aacch(4).

MFC after:	1 month
2013-08-06 19:14:02 +00:00
Marius Strobl
21e5a2223c As it turns out, MSIs are broken with 2820SA so introduce an AAC_FLAGS_NOMSI
quirk and apply it to these controllers [1]. The same problem was reported
for 2230S, in which case it wasn't actually clear whether the culprit is the
controller or the mainboard, though. In order to be on the safe side, flag
MSIs as being broken with the latter type of controller as well. Given that
these are the only reports of MSI-related breakage with aac(4) so far and
OSes like OpenSolaris unconditionally employ MSIs for all adapters of this
family, however, it doesn't seem warranted to generally disable the use of
MSIs in aac(4).
While it, simplify the MSI allocation logic a bit; there's no need to check
for the presence of the MSI capability on our own as pci_alloc_msi(9) will
just fail when these kind of interrupts are not available.
Reported and tested by: David Boyd [1]

MFC after:	3 days
2013-08-06 18:55:59 +00:00
Jack F Vogel
54a6317360 When the igb driver is static there are cases when early interrupts occur,
resulting in a panic in refresh_mbufs, to prevent this add a check in the
interrupt handler for DRV_RUNNING.

MFC after: 1 day (critical for 9.2)
2013-08-06 18:00:53 +00:00
Hiroki Sato
ffa0165ae0 Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6,
SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl.  These userland interfaces
treat expiration times in time_second, not time_uptime.
2013-08-06 17:10:52 +00:00
Kirk McKusick
824009a16a This bug fix is in a code path in rename taken when there is a
collision between a rename and an open system call for the same
target file. Here, rename releases its vnode references, waits for
the open to finish, and then restarts by reacquiring its needed
vnode locks. In this case, rename was unlocking but failing to
release its reference to one of its held vnodes. The effect was
that even after all the actual references to the vnode had gone,
the vnode still showed active references. For files that had been
removed, their space was not reclaimed until the filesystem was
forcibly unmounted.

This bug manifested itself in the Postgres server which would
leak/lose hundreds of files per day amounting to many gigabytes of
disk space. This bug required shutting down Postgres, forcibly
unmounting its filesystem, remounting its filesystem and restarting
Postgres every few days to recover the lost space.

Reported by: Dan Thomas and Palle Girgensohn
Bug-fix by:  kib
Tested by:   Dan Thomas and Palle Girgensohn
MFC after:   2 weeks
2013-08-06 16:50:05 +00:00
Andriy Gapon
63230519fa fix fat-fingering in r253996
MFC after:	17 days
X-MFC with:	r253996
2013-08-06 16:18:07 +00:00
Andriy Gapon
c319ea15f4 opensolaris code: translate INVARIANTS to DEBUG and ZFS_DEBUG
Do this by forcing inclusion of
sys/cddl/compat/opensolaris/sys/debug_compat.h
via -include option into all source files from OpenSolaris.
Note that this -include option must always be after -include opt_global.h.

Additionally, remove forced definition of DEBUG for some modules and fix
their build without DEBUG.

Also, meaning of DEBUG was overloaded to enable WITNESS support for some
OpenSolaris (primarily ZFS) locks.  Now this overloading is removed and
that use of DEBUG is replaced with a new option OPENSOLARIS_WITNESS.

MFC after:	17 days
2013-08-06 15:51:56 +00:00
Marius Strobl
4af8466d8b Add MD (for now) atomic_store_acq_<type>() and use it in pmap_activate()
to get the semantics when setting the PMAP right. Prior to r251782, the
latter already used implicit acquire semantics, which - currently - means
to not employ additional explicit memory barriers under the hood (see also
r225889).
2013-08-06 15:34:11 +00:00
Alexander Motin
d9aca4ed74 Block reporting of ZFS features for suspended pools.
Before executing any subcommand, zpool tool fetches pools configuration from
the kernel.  Before features support was added, kernel was regenerating that
configuration based on data always present in memory.  Unfortunately, pool
features list and activity counters are not such. They are stored in ZAP,
that normally resides in ARC, but under heavy memory pressure may be swapped
out.  If pool is suspended at this point, there is no way to recover it back
since any zpool command will stuck.

This change has one predictable flaw: `zpool upgrade` always wish to upgrade
suspended pools, but fortunately it can't do it due to the suspension.
2013-08-06 14:41:41 +00:00
Alexander Motin
f8dcf872c4 Disable r252840 when ZFS TRIM is enabled (vfs.zfs.trim.enabled=1) and really
disable TRIM otherwise.

r252840 (illumos bug 3836) is based on assumption that zio_free_sync() has
no lock dependencies and should complete immediately. Unfortunately, with our
TRIM implementation that is not true due to ZIO_STAGE_VDEV_IO_START added
to the ZIO_FREE_PIPELINE, which, while not really accessing devices, still
acquires SCL_ZIO lock for read to be sure devices won't disappear.

When TRIM is disabled, this patch enables direct free execution from r252840
and removes ZIO_STAGE_VDEV_IO_START and ZIO_STAGE_VDEV_IO_ASSESS stages from
the pipeline to avoid lock acquisition.  Otherwise it queues free request as
it was before r252840.
2013-08-06 14:30:28 +00:00
Alexander Motin
526bb4af8a Make zpool clear to reopen also reconnected cache and spare devices.
Since `zpool status` reports about such kinds of errors, it is strange
that they are not cleared by `zpool clear`.
2013-08-06 14:23:33 +00:00
Alexander Motin
ad727e8d64 Make ZFS to use separate thread to handle SPA_ASYNC_REMOVE async events.
Existing async thread is running only on successfull spa_sync() completion,
that is impossible in case of pool loosing required (last) disk(s).  That
indefinite delay of SPA_ASYNC_REMOVE processing made ZFS to not close the
lost disks, preventing GEOM/CAM from destroying devices and reusing names
on later disk reattach.

In earlier version of the patch I've tried to just run existing thread
immediately, unrelated to spa_sync() completion, but that exposed number
of situations where it could stuck due to locks held by stuck spa_sync(),
that are required for other kinds of async events.

Experiments with OpenIndiana snapshot confirmed that they also have this
issue with lost disks reattach.
2013-08-06 14:20:41 +00:00
Andriy Gapon
5d7430f0a8 dtrace: fix compilation with gcc
Cowardly taking the easiest way and using -Wno-*

MFC after:	3 days
X-MFC with:	r253772
2013-08-06 13:55:39 +00:00
Edward Tomasz Napierala
ea7c84e46f Remove dead code. 2013-08-06 10:42:18 +00:00