r156418:
Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems. This could cause deadlocks when creating snapshots.
(We can't do snapshots on ext2fs but it is useful to keep things in sync).
r183079:
- Only set i_offset in the parent directory's i-node during a lookup for
non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.
r187528:
Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.
MFC after: 5 days
- Reconnect with some minor modifications, in particular now selsocket()
internals are adapted to use sbintime units after recent'ish calloutng
switch.
The filedesc lock may not be dropped unconditionally before exporting
fd to sbuf: fd might go away during execution. While it is ok for
DTYPE_VNODE and DTYPE_FIFO because the export is from a vrefed vnode
here, for other types it is unsafe.
Instead, drop the lock in export_fd_to_sb(), after preparing data in
memory and before writing to sbuf.
Spotted by: mjg
Suggested by: kib
Review by: kib
MFC after: 1 week
some general word-smithing.
- Don't claim that adaptive mutexes have a timeout (they don't).
- Don't treat pool mutexes as a separate primitive in a few places.
- Describe sleepable read-mostly locks as a separate lock type and add
them to the various tables.
- Don't claim that sx locks are less efficient. That hasn't been true in
a few years now.
- Describe lockmanager locks next to sx locks since they are very similar
in terms of rules, etc., and so that all the lock primitives are
grouped together before the non-lock primitives.
- Similarly, move the section on Giant after the description of all the
non-lock primitives to preserve grouping.
- Condition variables work on several types of locks, not just mutexes.
- Add a bit of language to compare/contrast condition variables with
sleep/wakeup.
- Add a note about why pause(9) is unique.
- Add some language to define bounded vs unbounded sleeps and explain
why they are treated separately (bounded sleeps only need CPU time
to make forward progress).
- Don't state that using mtx_sleep() is a bad idea. It is in fact rather
necessary.
- Rework the interaction table a bit. First, it did not include really
include sleepable rmlocks and it left out lockmgr entirely. To get
things to fit, combine similar lock types into the same column / row,
and explicitly state what "sleep" means. The notes about recursion
and lock order were also a bit banal (lock order is always important,
not just in the few places annotated here), so remove them. In
particular, the lock order note would need to be on just about every
cell. If we want to document recursion I think a better approach
would be a separate table summarizing the recursion rules for each
lock as having too many notes clutters the table.
- Tweak the tables to use less indentation so everything still fits with
the added columns.
- Correct a few cells in the context mode table.
- Use mdoc markup instead of explicit markup in a few places.
Requested by: julian
MFC after: 2 weeks
deadlkres was using a reversed test to check whether ticks had rolled over.
This meant that deadlkres could only fire after ticks had rolled over.
This test was actually unnecessary as deadlkres only ever took the
difference of ticks values which is safe even in the presence of ticks
rollover. Remove the tests entirely. Now deadlkres will properly fire
after a lock has been held after the timeout period.
MFC after: 1 month
This could happen if a thread doing a page-in loses a ZFS range lock
race to a thread writing to the same range
This fixes "panic: vm_page_alloc: pindex already allocated" in
http://docs.FreeBSD.org/cgi/mid.cgi?1372165971.96049.42.camel
Submitted by: avg
MFC after: 1 week
bhyve process when an unhandled one is encountered.
Hide some additional capabilities from the guest (e.g. debug store).
This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on
AP spinup (where CPUID is used when sync'ing the TSCs) and the
issue with the Java build where CPUIDs are issued from a guest
userspace.
Submitted by: tycho nightingale at pluribusnetworks com
Reviewed by: neel
Reported by: many
originally inspired by the Solaris vmem detailed in the proceedings
of usenix 2001. The NetBSD version was heavily refactored for bugs
and simplicity.
- Use this resource allocator to allocate the buffer and transient maps.
Buffer cache defrags are reduced by 25% when used by filesystems with
mixed block sizes. Ultimately this may permit dynamic buffer cache
sizing on low KVA machines.
Discussed with: alc, kib, attilio
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
function name of its corresponding DTrace probes. These descriptions may
contain whitespace, but probe names cannot, so just replace any whitespace
with underscores when creating probes.
MFC after: 1 week
bug where a PCI device would be powered down if it failed to probe, but
not when its driver was detached (e.g. via kldunload).
- Add a new helper method resource_list_release_active() which forcefully
releases any active resources of a specified type from a resource list.
- Add a bus_child_detached method for the PCI bus driver which forces any
active resources to be released (and whines to the console if it finds
any) and then powers the device down.
- Call pci_child_detached() if we fail to probe a device when a driver
is kldloaded. This isn't perfect but can avoid leaking resources
from a probe() routine in the kldload case.
Reviewed by: imp, brooks
MFC after: 1 month
past a trapframe.
Use this macro in exception_exit as it is the function the unwinder enters
as the functions that store the frame setting lr to point to it.
device names "md" or "md[0-9]*" and a "file" option are specified in
/etc/fstab like this:
md none swap sw,file=/swap.bin 0 0
- Add GBDE/GELI encrypted swap space specification support, which
rc.d/encswap supported. The /etc/fstab lines are like the following:
/dev/ada1p1.bde none swap sw 0 0
/dev/ada1p2.eli none swap sw 0 0
.eli devices accepts aalgo, ealgo, keylen, and sectorsize as options.
swapctl(8) can understand an encrypted device in the command line
like this:
# swapctl -a /dev/ada2p1.bde
- "-L" flag is added to support "late" option to defer swapon until
rc.d/mountlate runs.
- rc.d script change:
rc.d/encswap -> removed
rc.d/addswap -> just display a warning message if $swapfile is defined
rc.d/swap1 -> renamed to rc.d/swap
rc.d/swaplate -> newly added to support "late" option
These changes alleviate a race condition between device creation/removal
and swapon/swapoff.
MFC after: 1 week
Reviewed by: wblock (manual page)
a new firmware command.
NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions. This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image. It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.
Newly activated firmware does not take effect until the new controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.
Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days
This includes pretty printers for all of the standard NVMe log pages
(Error, SMART/Health, Firmware), as well as hex output for non-standard
or vendor-specific log pages.
Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days
commands.
Also improve the checking of device node names, so that better error
messages are displayed when incorrect names are specified.
Sponsored by: Intel
MFC after: 3 days
max transfer size. This guards against rogue commands coming in from
userspace.
Also add KASSERTS for the virtual address and unmapped bio cases, if the
transfer size exceeds the controller's max transfer size.
Sponsored by: Intel
MFC after: 3 days
Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed. The larger I/O size is very
beneficial for upcoming firmware download support. This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.
Sponsored by: Intel
MFC after: 3 days