96515 Commits

Author SHA1 Message Date
mav
a090de11fd Depending on combination of running commands (NCQ/non-NCQ) try to avoid
extra read from PxCI/PxSACT registers.  If only NCQ commands are running, we
don't really need PxCI.  If only non-NCQ commands are running we don't need
PxSACT.  Mixed set may happen only on controllers with FIS-based switching
when port multiplier is attached, and then we have to read both registers.

MFC after:	1 month
2013-03-25 08:50:51 +00:00
ae
3d1df10de4 When we are removing a specific set, call ipfw_expire_dyn_rules only once.
Obtained from:	Yandex LLC
MFC after:	1 week
2013-03-25 07:43:46 +00:00
mav
cdfcce8d39 Make GEOM MULTIPATH to report unmapped bio support if underling path report
it.  GEOM MULTIPATH itself never touches the data and so transparent.
2013-03-25 07:24:58 +00:00
mav
a8910d58f2 Remove two bzero()s that are erasing only few more bytes then set later. 2013-03-25 06:31:17 +00:00
mav
aebf1ef11e In GEOM DISK:
- Replace single done mutex with per-disk ones.  On system with several
disks on several HBAs that removes small, but measurable lock congestion.
 - Modify disk destruction process to not destroy the mutex prematurely.
 - Remove some extra pointer derefences.
2013-03-25 05:45:24 +00:00
pfg
61cbaa2ebd Dtrace: add optional size argument to tracemem().
Merge change from illumos:

1455 DTrace tracemem() should take an optional size argument

Our local enhancements to dt_print_bytes were equivalent to
those in illumos but we made it match the illumos version
to ease further code merges.

For now leave out tst.smallsize.d and tst.smallsize.d.out
since those don't seem to work cleanly on FreeBSD.

This change bumps the DT_VERS_* number to 1.7.1 in accordance
to what is done in illumos.

Illumos Revision:	13457:571b0355c2e3

Reference:
https://www.illumos.org/issues/1455

Tested by:	Fabian Keil
Obtained from:	Illumos
MFC after:	1 month
2013-03-24 19:12:08 +00:00
ian
8846db53ba Set the backlink in mmc commands to the mmc request that contains them. 2013-03-24 17:23:10 +00:00
mav
4486b85658 No need to erase all 64 bytes of CFIS area if we never use more then 16. 2013-03-24 16:51:21 +00:00
alc
97049bb4f6 Micro-optimize the control flow in a few places. Eliminate a panic call
that could never be reached in vm_radix_insert().  (If the pointer being
checked by the panic call were ever NULL, the immmediately preceding loop
would have already crashed on a NULL pointer dereference.)

Reviewed by:	attilio (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-03-24 16:43:07 +00:00
mav
6cd6934fad Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock).  Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction.  Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.
2013-03-24 10:14:25 +00:00
adrian
8ce3236c14 Add new regulatory domain.
Obtained from:	Qualcomm Atheros
2013-03-24 04:42:56 +00:00
adrian
e3aae3fa01 Move the TXQ lock earlier in this routine - so to correctly protect the
link pointer check.
2013-03-24 04:09:54 +00:00
adrian
a6ac810bee Fix the locking changes due to the TXQ change drive-by.
Tested:

* AR9580, STA mode
2013-03-24 04:09:29 +00:00
mav
2ebdf6d693 Make g_wither_washer() to not loop by itself, but only when there was some
more topology change done that may require its attention.  Add few missing
g_do_wither() calls in respective places to signal it.

This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed.  For
example, see r227009 and r227510.
2013-03-24 03:15:20 +00:00
adrian
0b04a7a29e Overhaul the TXQ locking (again!) as part of some beacon/cabq timing
related issues.

Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.

This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.

The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.

The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support.  Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot.  For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.

The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission.  I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.

Major:

* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
  a single lock.

* Leave the software queue side of things under the ATH_TX_LOCK lock,
  (continuing) to serialise things as they are.

* Add a new function which is called whenever there's a beacon miss,
  to print out some debugging.  This is primarily designed to help
  me figure out if the beacon miss events are due to a noisy environment,
  issues with the PHY/MAC, or other.

* Move the CABQ setup/enable to occur _after_ all the VAPs have been
  looked at.  This means that for multiple VAPS in bursted mode, the
  CABQ gets primed once all VAPs are checked, rather than being primed
  on the first VAP and then having frames appended after this.

Minor:

* Add a (disabled) twiddle to let me enable/disable cabq traffic.
  It's primarily there to let me easily debug what's going on with beacon
  and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
  trying to trace down.

* Clear bf_next when flushing frames; it should quieten some warnings
  that show up when a node goes away.

Tested:

* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)

TODO:

* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
  ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
  correctly set when chaining descriptors.)
2013-03-24 00:03:12 +00:00
adrian
f820e31fb2 CABQ calculation changes to try and fix some weird corner cases leading
to stuck beacons.

* Set the cabq readytime (ie, how long to burst for) to 50% of the total
  beacon interval time
* fix the cabq adjustment calculation based on how the beacon offset is
  calculated (the SWBA/DBA time offset.)

This is all still a bit magic voodoo but it does seem to have further
quietened issues with missed/stuck beacons under my local testing.
In any case, it better matches what the reference HAL implements.

Obtained from:	Qualcomm Atheros
2013-03-23 23:51:11 +00:00
kib
c41616c667 Do not call malloc(M_WAITOK) while bodev->fence_lock mutex is
held. The ttm_buffer_object_transfer() does not need the mutex locked
at all, except for the call to the driver sync_obj_ref() method.

Reported and tested by:	dumbbell
MFC after:   2 weeks
2013-03-23 22:23:15 +00:00
dumbbell
93bfdfa5b7 drm/ttm: Fix a typo: s/pTTM]/[TTM]/ 2013-03-23 20:46:47 +00:00
dumbbell
e637df4cdf drm/ttm: Explain why we don't need to acquire a ref in ttm_bo_vm_ctor() 2013-03-23 20:43:26 +00:00
mm
d690f9f515 Fix kernel build with options ZFS after r24571 (libzfs_core).
Submitted by:	Bjoern A. Zeeb <bz@FreeBSD.org>
2013-03-23 20:01:45 +00:00
dumbbell
6dc9bf62e8 drm/ttm: Fix TTM buffer object refcount
This fixes memory leaks in the radeonkms driver.

Reviewed by:	Konstantin Belousov (kib@)
Tested by:	J.R. Oldroyd <jr@opal.com>
2013-03-23 19:19:19 +00:00
ian
6b18376e62 Don't check and warn about pmap mismatch on every call to busdma sync.
With some recent busdma refactoring, sometimes it happens that a sync
op gets called when bus_dmamap_load() never got called, which results
in a spurious warning about a map mismatch when no sync operations will
actually happen anyway.  Now the check is done only if a sync operation
is actually performed, and the result of the check is a panic, not just
a printf.

Reviewed by:	cognet (who prevented me from donning a point hat)
2013-03-23 17:17:06 +00:00
will
440f1aaedb Be more explicit about what each bio_cmd & bio_flags value means.
Reviewed by:	ken (mentor)
2013-03-23 16:55:07 +00:00
will
74c2859109 ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using a NFS export.  When the
first 'zfs unmount' occurred, it returned EBUSY via this path, while
vflush() had flushed references on the filesystem's root vnode, which in
turn caused its v_interlock to be destroyed.  The next time 'zfs unmount'
was called, vflush() tried to obtain this lock, which caused this panic.

Since vflush() on FreeBSD is a definitive call, there is no need to check
vfsp->vfs_count after it completes.  Simply #ifdef sun this check.

Submitted by:	avg
Reviewed by:	avg
Approved by:	ken (mentor)
MFC after:	1 month
2013-03-23 16:34:56 +00:00
will
5d3a27c743 Extend taskqueue(9) to enable per-taskqueue callbacks.
The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments.  They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().

This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.

The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9).  This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.

Two callback types are supported at this point:

- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
  to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
  after it has processed its last task but before the taskqueue is
  reclaimed.

While I'm here:

- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
  in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
  interface for all consumers of taskqueue(9).

Reviewed by:	kib (all), eadler (taskqueue.9), brd (taskqueue.9)
Approved by:	ken (mentor)
Sponsored by:	Spectra Logic
MFC after:	1 month
2013-03-23 15:11:53 +00:00
avg
1c06448efc post mountroot event after a real/final root is mounted
not every time an intermediate root (including the first devfs) is
mounted.
This is also consistent with waking up via root_mount_complete.

Reviewed by:	jhb
MFC after:	13 days
2013-03-23 08:59:34 +00:00
avg
036eb5da00 fbt_getargdesc: correctly handle types for return probes
MFC after:	6 days
2013-03-23 08:52:50 +00:00
avg
0f9660a6f0 fbt_typoff_init: fix an off by one in determining required memory size
This issue would be silent most of the time, but if the requested memory
is a multiple of a page size, then accessing one element beyond the end
would lead to a kernel page fault.
Otherwise, the unlucky last type would just be inaccessible.

Reported by:	glebius
Tested by:	glebius
MFC after:	6 days
2013-03-23 08:48:44 +00:00
delphij
b1bd4e80c4 Don't attempt to reference sc before testing whether it's NULL.
Submitted by:	Sascha Wildner
Obtained from:	DragonFly
MFC after:	2 weeks
2013-03-22 22:46:19 +00:00
mckusick
be2f56b8d7 The purpose of this change to the FFS layout policy is to reduce the
running time for a full fsck. It also reduces the random access time
for large files and speeds the traversal time for directory tree walks.

The key idea is to reserve a small area in each cylinder group
immediately following the inode blocks for the use of metadata,
specifically indirect blocks and directory contents. The new policy
is to preferentially place metadata in the metadata area and
everything else in the blocks that follow the metadata area.

The size of this area can be set when creating a filesystem using
newfs(8) or changed in an existing filesystem using tunefs(8).
Both utilities use the `-k held-for-metadata-blocks' option to
specify the amount of space to be held for metadata blocks in each
cylinder group. By default, newfs(8) sets this area to half of
minfree (typically 4% of the data area).

This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker

Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-22 21:45:28 +00:00
glebius
82edd7c363 Remove __FreeBSD_version ifdefs. 2013-03-22 20:44:16 +00:00
pjd
91184d303f - Constify local path variable for chflagsat().
- Use correct format characters (%lx) for u_long.

This fixes the build broken in r248599.
2013-03-22 07:40:34 +00:00
kevlo
0cbbbb7d30 Clean up some unused leftover code.
Pointed out by:	ae
2013-03-22 01:45:54 +00:00
kevlo
b0b955ade2 Remove unused global variables.
Reviewed by:	ae, glebius
2013-03-22 01:40:17 +00:00
smh
75c735d00a Fix for building libzpool under i386.
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 23:06:11 +00:00
pjd
f44b21d5e5 Regenerate after r248599.
Sponsored by:	The FreeBSD Foundation
2013-03-21 23:02:19 +00:00
pjd
635dbe90f2 Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
pjd
5fc1bac315 Regenerate after r248597.
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:47:03 +00:00
pjd
2a3cf7f364 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
kib
c966fdfb31 Correct the page count when excess length is trimmed from the bio.
Reported and tested by:	Ivan Klymenko <fidaj@ukr.net
2013-03-21 22:36:43 +00:00
jilles
bd09044d61 Allow O_CLOEXEC in posix_openpt() flags.
PR:		kern/162374
Reviewed by:	ed
2013-03-21 21:39:15 +00:00
attilio
83c8ef372d Fix a bug in UMTX_PROFILING:
UMTX_PROFILING should really analyze the distribution of locks as they
index entries in the umtxq_chains hash-table.
However, the current implementation does add/dec the length counters
for *every* thread insert/removal, measuring at all really userland
contention and not the hash distribution.

Fix this by correctly add/dec the length counters in the points where
it is really needed.

Please note that this bug brought us questioning in the past the quality
of the umtx hash table distribution.
To date with all the benchmarks I could try I was not able to reproduce
any issue about the hash distribution on umtx.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff, davide
MFC after:	2 weeks
2013-03-21 19:58:25 +00:00
mav
6f03afeee9 Minimal timer period of 100us introduced in r244758 is overkill. While
original 2us are indeed not enough, 3us are working quite well on my tests.
To be more safe set minimal period to 5us and to be even more safe replicate
here from HPET mechanism of rereading counter after programming comparator.

This change allows to handle 30K of short nanosleep() calls per second on
Raspberry Pi instead of just 8K before.

Discussed with:	gonzo
2013-03-21 15:42:41 +00:00
jhb
1b6f4e466c Another NFS SIGSTOP related fix: Ignore thread suspend requests due to
SIGSTOP if stop signals are currently deferred.  This can occur if a
process is stopped via SIGSTOP while a thread is running or runnable
but before it has set TDF_SBDRY.

Tested by:	pho
Reviewed by:	kib
MFC after:	1 week
2013-03-21 14:06:27 +00:00
kib
0487ef2754 Fix twa(4) after the r246713. The driver copies data around to
satisfy some alignment restrictions.  Do not set TW_OSLI_REQ_FLAGS_CCB
flag for mapped data, pass the csio->data_ptr in the req->data.

Do not put the ccb pointer into req->data ever, ccb is stored in
req->orig_req already.

Submitted by:	Shuichi KITAGUCHI <ki@hh.iij4u.or.jp>
PR:	kern/177020
2013-03-21 13:06:28 +00:00
kib
7225171d66 Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.

Reported by:	Ivan Klymenko <fidaj@ukr.net>
MFC after:	2 weeks
2013-03-21 12:59:24 +00:00
smh
8bbd746275 Add missing descriptions for ZFS sysctls
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:25:21 +00:00
smh
e419fea8b4 Optimisation of TRIM processing.
Previously TRIM processing was very bursty. This was made worse by the fact
that TRIM requests on SSD's are typically much slower than reads or writes.
This often resulted in stalls while large numbers of TRIM's where processed.

In addition due to the way the TRIM thread was only woken by writes, deletes
could stall in the queue for extensive periods of time.

This patch adds a number of controls to how often the TRIM thread for each
SPA processes its outstanding delete requests.
vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds
vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32)
vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev
vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev
vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing
(seconds)

Given the most common TRIM implementation is ATA TRIM the current defaults
are targeted at that.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:02:08 +00:00
smh
976f4808aa Names the ZFS TRIM thread
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 10:41:30 +00:00
smh
ee3952780d TRIM cache devices based on time instead of TXGs.
Currently, the trim module uses the same algorithm for data and cache
devices when deciding to issue TRIM requests, based on how far in the
past the TXG is.

Unfortunately, this is not ideal for cache devices, because the L2ARC
doesn't use the concept of TXGs at all. In fact, when using a pool for
reading only, the L2ARC is written but the TXG counter doesn't
increase, and so no new TRIM requests are issued to the cache device.

This patch fixes the issue by using time instead of the TXG number as
the criteria for trimming on cache devices. The basic delay principle
stays the same, but parameters are expressed in seconds instead of
TXGs. The new parameters are named trim_l2arc_limit and
trim_l2arc_batch, and both default to 30 second.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
Obtained from:	17122c31ac
MFC after:	2 weeks
2013-03-21 10:29:05 +00:00