189574 Commits

Author SHA1 Message Date
will
74c2859109 ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using an NFS export.  When the
first 'zfs unmount' occurred, it returned EBUSY via this path, while
vflush() had flushed references on the filesystem's root vnode, which in
turn caused its v_interlock to be destroyed.  The next time 'zfs unmount'
was called, vflush() tried to obtain this lock, which caused this panic.

Since vflush() on FreeBSD is a definitive call, there is no need to check
vfsp->vfs_count after it completes.  Simply #ifdef sun this check.

Submitted by:	avg
Reviewed by:	avg
Approved by:	ken (mentor)
MFC after:	1 month
2013-03-23 16:34:56 +00:00
will
5d3a27c743 Extend taskqueue(9) to enable per-taskqueue callbacks.
The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments.  They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().

This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.

The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9).  This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.

Two callback types are supported at this point:

- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
  to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
  after it has processed its last task but before the taskqueue is
  reclaimed.

While I'm here:

- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
  in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
  interface for all consumers of taskqueue(9).

Reviewed by:	kib (all), eadler (taskqueue.9), brd (taskqueue.9)
Approved by:	ken (mentor)
Sponsored by:	Spectra Logic
MFC after:	1 month
2013-03-23 15:11:53 +00:00
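The ordering of the two callback types described in the taskqueue(9) commit above can be illustrated with a small userland model. This is only a sketch: every name prefixed with model_ is hypothetical, and none of it is the kernel's taskqueue(9) interface. It only shows that the init hook runs before the first task and the shutdown hook runs after the last one, each with an optional context pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model; NOT the kernel taskqueue(9) API. */
enum model_cb_type { MODEL_CB_INIT, MODEL_CB_SHUTDOWN, MODEL_CB_COUNT };

typedef void (*model_cb_fn)(void *context);

struct model_queue {
	model_cb_fn cb[MODEL_CB_COUNT];	/* optional callback per type */
	void *ctx[MODEL_CB_COUNT];	/* optional context per type */
};

/* Records the order of events so the sequence can be inspected. */
struct model_trace {
	char events[16];
	size_t n;
};

void
model_trace_add(struct model_trace *t, char ev)
{
	if (t->n + 1 < sizeof(t->events)) {
		t->events[t->n++] = ev;
		t->events[t->n] = '\0';
	}
}

void model_on_init(void *context) { model_trace_add(context, 'I'); }
void model_on_shutdown(void *context) { model_trace_add(context, 'S'); }
void model_task(void *context) { model_trace_add(context, 'T'); }

/* Register a callback and its context for one callback type. */
void
model_set_callback(struct model_queue *q, enum model_cb_type type,
    model_cb_fn fn, void *context)
{
	assert(q != NULL && type < MODEL_CB_COUNT);
	q->cb[type] = fn;
	q->ctx[type] = context;
}

/* Callbacks are optional, so an unset slot is simply skipped. */
void
model_run_callback(struct model_queue *q, enum model_cb_type type)
{
	if (q->cb[type] != NULL)
		q->cb[type](q->ctx[type]);
}

/* One worker: init hook, then every task, then shutdown hook. */
void
model_thread_body(struct model_queue *q, model_cb_fn task, void *arg,
    size_t ntasks)
{
	model_run_callback(q, MODEL_CB_INIT);
	for (size_t i = 0; i < ntasks; i++)
		task(arg);
	model_run_callback(q, MODEL_CB_SHUTDOWN);
}
```

Registering model_on_init and model_on_shutdown and then running model_thread_body() with two tasks leaves the trace "ITTS". In this model a queue with no callbacks registered behaves exactly as before, which mirrors the "entirely optional" wording of the commit message.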
des
a4c39d4efd Revert r247892 now that this has been fixed upstream. 2013-03-23 14:52:31 +00:00
mav
76fe1d6463 Make systat -vmstat use suffixes to display large floating-point numbers
that do not fit into the specified field width, as is already done for ints.
In particular, this allows disk tps above 100k, which are reachable with
modern SSDs, to be displayed properly.
2013-03-23 13:11:54 +00:00
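The idea in the systat change above, falling back to a suffix when a number does not fit its field, can be sketched as follows. This is not systat's actual code: the function name format_scaled, the decimal scaling by 1000, and the suffix set are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format 'value' right-aligned into exactly 'width' characters.
 * If the plain number does not fit, divide by 1000 and append
 * k, M, G or T until it does.  Returns 0 on success; on failure
 * the field is filled with '*' and -1 is returned.
 */
int
format_scaled(char *buf, size_t buflen, double value, int width)
{
	static const char suffixes[] = "kMGT";
	int n;

	assert(buf != NULL);
	if (width <= 0 || buflen <= (size_t)width)
		return (-1);

	/* First try the number as-is, without decimals. */
	n = snprintf(buf, buflen, "%*.0f", width, value);
	if (n == width)
		return (0);

	/* Too wide: scale down and spend one column on a suffix. */
	for (size_t i = 0; i < sizeof(suffixes) - 1; i++) {
		value /= 1000.0;
		n = snprintf(buf, buflen, "%*.0f%c", width - 1, value,
		    suffixes[i]);
		if (n == width)
			return (0);
	}

	memset(buf, '*', (size_t)width);
	buf[width] = '\0';
	return (-1);
}
```

With a field width of 5, the value 42 is formatted as "   42", while 123456 no longer fits and is formatted as " 123k".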
avg
1c06448efc post mountroot event after a real/final root is mounted,
not every time an intermediate root (including the first devfs) is
mounted.
This is also consistent with waking up via root_mount_complete.

Reviewed by:	jhb
MFC after:	13 days
2013-03-23 08:59:34 +00:00
avg
880c47f8a7 dtrace: ensure that we can always catch a process (e.g. when -c is used)
It is not guaranteed that a program has a symbol table entry for main
and thus that it would be possible to set a breakpoint on it.

Reviewed by:	rpaulo
Discussed with:	rpaulo
MFC after:	13 days
2013-03-23 08:57:54 +00:00
gjb
6415d908e4 Revert r248639 to fix build failure on head/ 2013-03-23 08:57:14 +00:00
avg
036eb5da00 fbt_getargdesc: correctly handle types for return probes
MFC after:	6 days
2013-03-23 08:52:50 +00:00
avg
2d50e830b3 libdwarf: anonymous types are expected to have empty type names...
or no type attributes at all.
This is according to DWARF specification.

MFC after:	13 days
2013-03-23 08:50:56 +00:00
avg
0f9660a6f0 fbt_typoff_init: fix an off by one in determining required memory size
This issue would be silent most of the time, but if the requested memory
is a multiple of a page size, then accessing one element beyond the end
would lead to a kernel page fault.
Otherwise, the unlucky last type would just be inaccessible.

Reported by:	glebius
Tested by:	glebius
MFC after:	6 days
2013-03-23 08:48:44 +00:00
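The class of bug fixed in fbt_typoff_init above can be shown with a generic sketch. It is not the DTrace code: the helper names are hypothetical, and the assumption is simply that a table is indexed directly by IDs in the range 0..max_id, so it needs max_id + 1 slots rather than max_id.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Slots needed for a table indexed directly by IDs 0..max_id. */
size_t
table_slots(uint32_t max_id)
{
	return ((size_t)max_id + 1);
}

/* Allocate a zeroed offset table large enough for every valid ID. */
uint32_t *
offset_table_alloc(uint32_t max_id)
{
	return (calloc(table_slots(max_id), sizeof(uint32_t)));
}

/* Bounds-checked store; returns 0 on success, -1 if id is out of range. */
int
offset_table_set(uint32_t *table, uint32_t max_id, uint32_t id,
    uint32_t off)
{
	assert(table != NULL);
	if (id > max_id)
		return (-1);
	table[id] = off;
	return (0);
}
```

Sizing the table with max_id slots instead would leave the last ID one element past the end, which matches the symptom in the commit message: usually silent, but a fault when the allocation ends exactly on a page boundary.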
mckusick
32cda7dd8f Fix the build after addition of cylinder group caching (r248625)
Reported by:   Glen Barber (gjb@)
Pointy hat to: Kirk McKusick (mckusick@)
2013-03-23 07:57:30 +00:00
sbruno
ee156374ee Revert svn r248625
Clang errors around printf could be trivially fixed, but the breakage in
sbin/fsdb was too significant for this type of change.

Submitter of this changeset has been notified and hopefully this can be
restored soon.
2013-03-23 04:26:13 +00:00
adrian
1bcec8f048 Add AR9300 descriptor decoding. 2013-03-23 01:25:11 +00:00
delphij
b1bd4e80c4 Don't attempt to reference sc before testing whether it's NULL.
Submitted by:	Sascha Wildner
Obtained from:	DragonFly
MFC after:	2 weeks
2013-03-22 22:46:19 +00:00
mckusick
93fa1464f2 Speed up fsck by caching the cylinder group maps in pass1 so
that they do not need to be read again in pass5. As this nearly
doubles the memory requirement for fsck, the cache is thrown away
if other memory needs in fsck would otherwise fail. Thus, the
memory footprint of fsck remains unchanged in memory constrained
environments.

This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker

Details of this implementation appear in the April 2013 issue of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-22 21:50:43 +00:00
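The policy described in the fsck commit above, where the cache is optional and is given up before a required allocation is allowed to fail, can be modelled with simple byte counters. This is a sketch only: it does not call malloc(3), and the struct and function names are hypothetical rather than taken from fsck_ffs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the memory available to the checker. */
struct mem_model {
	size_t avail;	/* bytes still available */
	size_t cached;	/* bytes held by the optional cache */
};

/* Optional allocation: cache a map only if memory is available. */
int
cache_store(struct mem_model *m, size_t size)
{
	assert(m != NULL);
	if (size > m->avail)
		return (-1);	/* skip caching; the map is reread later */
	m->avail -= size;
	m->cached += size;
	return (0);
}

/* Release the whole cache; returns the number of bytes freed. */
size_t
cache_discard(struct mem_model *m)
{
	size_t freed = m->cached;

	m->avail += freed;
	m->cached = 0;
	return (freed);
}

/* Required allocation: sacrifice the cache before reporting failure. */
int
required_alloc(struct mem_model *m, size_t size)
{
	assert(m != NULL);
	if (size > m->avail)
		(void)cache_discard(m);
	if (size > m->avail)
		return (-1);
	m->avail -= size;
	return (0);
}
```

Starting from 100 available bytes, caching 60 and then requiring 80 succeeds only because the cache is discarded first. This is the sense in which the memory footprint stays unchanged in constrained environments.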
cognet
45f62f67d3 As it's done for libstdc++, use SJLJ-based exceptions on arm when we're not
using EABI, and use unwind-arm.h instead of unwind-generic.h when using EABI.
2013-03-22 21:50:32 +00:00
mckusick
be2f56b8d7 The purpose of this change to the FFS layout policy is to reduce the
running time for a full fsck. It also reduces the random access time
for large files and speeds the traversal time for directory tree walks.

The key idea is to reserve a small area in each cylinder group
immediately following the inode blocks for the use of metadata,
specifically indirect blocks and directory contents. The new policy
is to preferentially place metadata in the metadata area and
everything else in the blocks that follow the metadata area.

The size of this area can be set when creating a filesystem using
newfs(8) or changed in an existing filesystem using tunefs(8).
Both utilities use the `-k held-for-metadata-blocks' option to
specify the amount of space to be held for metadata blocks in each
cylinder group. By default, newfs(8) sets this area to half of
minfree (typically 4% of the data area).

This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker

Details of this implementation appear in the April 2013 issue of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-22 21:45:28 +00:00
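The default sizing rule quoted in the FFS layout commit above (half of minfree, typically 4% of the data area) is plain arithmetic. The sketch below assumes sizes are counted in fragments per cylinder group and ignores any rounding to block boundaries that newfs(8) may apply. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Default metadata area for one cylinder group: half of minfree,
 * where minfree is a percentage of the group's data area.
 * Dividing by 200 is the same as taking (minfree_pct / 2) percent.
 */
long
default_metaspace_frags(long frags_per_group, int minfree_pct)
{
	assert(frags_per_group >= 0);
	assert(minfree_pct >= 0 && minfree_pct <= 100);
	return (frags_per_group * minfree_pct / 200);
}
```

For example, with minfree at 8% and a group of 10000 fragments, this yields 400 fragments, which is 4% of the group.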
glebius
82edd7c363 Remove __FreeBSD_version ifdefs. 2013-03-22 20:44:16 +00:00
jilles
b69a01f7fa rc.d/sysctl: Fix error messages about unknown OIDs.
There are three situations where the sysctl script is called:
1. "start", very early
2. "lastload", near the end of rc
3. "reload", at admin request while the system is booted

Ignore unknown OIDs in situation 1 because kernel modules may not be loaded
yet, and complain about them in situations 2 and 3.

PR:		conf/174595
Submitted by:	Olivier Smedts
2013-03-22 20:12:25 +00:00
des
b291eafe8d Upgrade to OpenSSH 6.2p1. The most important new features are support
for a key revocation list and more fine-grained authentication control.
2013-03-22 17:55:38 +00:00
des
19db167f41 Retire the mislabeled ENABLE_SUID_SSH knob. 2013-03-22 14:10:15 +00:00
mm
5ee0a7b76c MFV r248590,248594:
Update libarchive to 3.1.2

Some of the new features:
  - support for lrzip and grzip compression
  - support for writing tar v7 format
  - b64encode and uuencode filters
  - support for __MACOSX directory in Zip archives
  - support for lzop compression (external utility)
2013-03-22 13:36:03 +00:00
des
5a4dbb8332 Vendor import of OpenSSH 6.2p1. 2013-03-22 11:19:48 +00:00
mm
af89cb16bf Replace deprecated (or remove obsolete) libarchive 2.8 functions
with libarchive 3.0 counterparts
2013-03-22 10:17:42 +00:00
pjd
91184d303f - Constify local path variable for chflagsat().
- Use correct format characters (%lx) for u_long.

This fixes the build broken in r248599.
2013-03-22 07:40:34 +00:00
kevlo
0cbbbb7d30 Clean up some unused leftover code.
Pointed out by:	ae
2013-03-22 01:45:54 +00:00
kevlo
b0b955ade2 Remove unused global variables.
Reviewed by:	ae, glebius
2013-03-22 01:40:17 +00:00
pjd
c93f0c9d3c Update regression tests after adding chflagsat(2).
Sponsored by:	The FreeBSD Foundation
2013-03-21 23:07:04 +00:00
smh
75c735d00a Fix for building libzpool under i386.
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 23:06:11 +00:00
pjd
01401cc9bc Document chflagsat(2).
Obtained from:	jilles
2013-03-21 23:05:44 +00:00
pjd
f44b21d5e5 Regenerate after r248599.
Sponsored by:	The FreeBSD Foundation
2013-03-21 23:02:19 +00:00
pjd
635dbe90f2 Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
pjd
5fc1bac315 Regenerate after r248597.
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:47:03 +00:00
pjd
2a3cf7f364 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
  u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
kib
c966fdfb31 Correct the page count when excess length is trimmed from the bio.
Reported and tested by:	Ivan Klymenko <fidaj@ukr.net>
2013-03-21 22:36:43 +00:00
jilles
bd09044d61 Allow O_CLOEXEC in posix_openpt() flags.
PR:		kern/162374
Reviewed by:	ed
2013-03-21 21:39:15 +00:00
attilio
83c8ef372d Fix a bug in UMTX_PROFILING:
UMTX_PROFILING should really analyze the distribution of locks as they
index entries in the umtxq_chains hash table.
However, the current implementation increments/decrements the length
counters for *every* thread insertion/removal, which measures userland
contention rather than the hash distribution.

Fix this by incrementing/decrementing the length counters only at the
points where it is really needed.

Please note that this bug led us in the past to question the quality
of the umtx hash table distribution.
To date, with all the benchmarks I could try, I was not able to reproduce
any issue with the hash distribution on umtx.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff, davide
MFC after:	2 weeks
2013-03-21 19:58:25 +00:00
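The distinction drawn in the UMTX_PROFILING commit above can be sketched with a small model of one hash chain. The names and the fixed-size arrays are hypothetical and are not the kernel's umtx structures. The point is that the chain length changes only when a distinct key appears on or disappears from the chain, not on every waiting thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_KEYS 8

/* Hypothetical model of one hash chain; not the kernel's structures. */
struct model_chain {
	unsigned long keys[MODEL_MAX_KEYS];	/* distinct keys chained here */
	int waiters[MODEL_MAX_KEYS];		/* sleeping threads per key */
	int length;				/* distinct keys on the chain */
	int max_length;				/* high-water mark of length */
};

/* Return the slot holding 'key', or -1 if it is not on the chain. */
int
model_find(const struct model_chain *c, unsigned long key)
{
	for (int i = 0; i < c->length; i++)
		if (c->keys[i] == key)
			return (i);
	return (-1);
}

/* A thread starts waiting on 'key'; only a new key lengthens the chain. */
int
model_insert(struct model_chain *c, unsigned long key)
{
	int i;

	assert(c != NULL);
	i = model_find(c, key);
	if (i < 0) {
		if (c->length == MODEL_MAX_KEYS)
			return (-1);
		i = c->length++;
		c->keys[i] = key;
		c->waiters[i] = 0;
		if (c->length > c->max_length)
			c->max_length = c->length;
	}
	c->waiters[i]++;
	return (0);
}

/* A thread stops waiting; the chain shrinks when the last waiter leaves. */
int
model_remove(struct model_chain *c, unsigned long key)
{
	int i;

	assert(c != NULL);
	i = model_find(c, key);
	if (i < 0)
		return (-1);
	if (--c->waiters[i] == 0) {
		c->length--;
		c->keys[i] = c->keys[c->length];
		c->waiters[i] = c->waiters[c->length];
	}
	return (0);
}
```

Three threads waiting on the same key give a chain length of 1 in this model. Counting per thread would report 3, which reflects contention on that one lock rather than how keys are spread across chains.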
mm
366f42737c Update libarchive's vendor dist to version 3.1.2 from release branch.
Git branch:	release
Git commit:	19f23e191f9d3e1dd2a518735046100419965804

Obtained from:	https://github.com/libarchive/libarchive.git
2013-03-21 18:59:02 +00:00
glebius
22bd645df5 Document some flags to the uma_zcreate(). Not all flags are documented,
only those that at least are used in the kernel, or that definitely
work.
2013-03-21 16:19:46 +00:00
glebius
04d26633fa Document uma_find_refcnt(). 2013-03-21 16:04:34 +00:00
mav
6f03afeee9 The minimal timer period of 100us introduced in r244758 is overkill. While
the original 2us is indeed not enough, 3us works quite well in my tests.
To be safer, set the minimal period to 5us, and to be safer still, replicate
the HPET mechanism of rereading the counter after programming the comparator.

This change allows handling 30K short nanosleep() calls per second on
Raspberry Pi instead of just 8K before.

Discussed with:	gonzo
2013-03-21 15:42:41 +00:00
jhb
1b6f4e466c Another NFS SIGSTOP related fix: Ignore thread suspend requests due to
SIGSTOP if stop signals are currently deferred.  This can occur if a
process is stopped via SIGSTOP while a thread is running or runnable
but before it has set TDF_SBDRY.

Tested by:	pho
Reviewed by:	kib
MFC after:	1 week
2013-03-21 14:06:27 +00:00
kib
0487ef2754 Fix twa(4) after r246713. The driver copies data around to
satisfy some alignment restrictions.  Do not set TW_OSLI_REQ_FLAGS_CCB
flag for mapped data, pass the csio->data_ptr in the req->data.

Do not put the ccb pointer into req->data ever, ccb is stored in
req->orig_req already.

Submitted by:	Shuichi KITAGUCHI <ki@hh.iij4u.or.jp>
PR:	kern/177020
2013-03-21 13:06:28 +00:00
glebius
128b1093e9 Document NGM_NAT_LIBALIAS_INFO.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov gmail.com>
2013-03-21 13:02:43 +00:00
kib
7225171d66 Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.

Reported by:	Ivan Klymenko <fidaj@ukr.net>
MFC after:	2 weeks
2013-03-21 12:59:24 +00:00
eadler
a43818500f Remove a reference to instant-server which has been removed from the
ports tree in r313427.

PR:		177012
Submitted by:	Kevin Zheng <kevinz5000@gmail.com>
Approved by:	bcr (mentor)
2013-03-21 12:42:25 +00:00
smh
8bbd746275 Add missing descriptions for ZFS sysctls
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:25:21 +00:00
joel
048fb92f58 Remove EOL whitespace. 2013-03-21 11:22:13 +00:00
smh
e419fea8b4 Optimisation of TRIM processing.
Previously TRIM processing was very bursty. This was made worse by the fact
that TRIM requests on SSDs are typically much slower than reads or writes.
This often resulted in stalls while large numbers of TRIMs were processed.

In addition, due to the way the TRIM thread was only woken by writes, deletes
could stall in the queue for extensive periods of time.

This patch adds a number of controls to how often the TRIM thread for each
SPA processes its outstanding delete requests.
vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds
vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32)
vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev
vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev
vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing
(seconds)

Given the most common TRIM implementation is ATA TRIM the current defaults
are targeted at that.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:02:08 +00:00
smh
976f4808aa Name the ZFS TRIM thread
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 10:41:30 +00:00