The only operation which is prevented by the hold is the kernel stack
swapout for the faulted thread, which should be fine to allow.
Remove useless checks for NULL curproc or curproc->p_vmspace from the
trap_pfault() wrappers on x86 and powerpc.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
illumos/illumos-gate@cf6106c8a0
https://www.illumos.org/issues/5987
The existing ZFS prefetch code (dmu_zfetch.c) has some problems:
1. It's nearly impossible to understand, e.g. there is an abundance of kstats
but it's hard to know what they mean (see below).
2. For some workloads, it detects patterns that aren't really there (e.g.
strided patterns, backwards scans), and generates needless i/os prefetching
blocks that will never be referenced.
3. It has lock contention issues. These are caused primarily by
dmu_zfetch_colinear() calling dmu_zfetch_dofetch() (which can block waiting for
i/o) with the zf_rwlock held for writer, thus blocking all other threads
accessing this file.
I suggest that we rewrite this code to detect only forward, sequential streams.
[... truncated ...]
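As a rough illustration of what "detect only forward, sequential streams"
can mean, the sketch below tracks a single stream and reports whether an
access continues it. It is not the dmu_zfetch.c code: the structure, the
function name, and the single-stream simplification are all assumptions
made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stream state; not the dmu_zfetch.c layout. */
struct seq_stream {
	uint64_t next_blkid;	/* block the next sequential read starts at */
	bool     active;	/* false until the first access is seen */
};

/*
 * Record an access to nblks blocks starting at blkid.  Returns true when
 * the access begins exactly where the previous one ended, i.e. a forward
 * sequential read that is worth prefetching for.  Any other pattern
 * (backwards, strided, random) simply restarts the stream.
 */
static bool
seq_stream_access(struct seq_stream *s, uint64_t blkid, uint64_t nblks)
{
	bool sequential = s->active && blkid == s->next_blkid;

	s->next_blkid = blkid + nblks;
	s->active = true;
	return (sequential);
}
```

A caller would issue prefetch i/o only when the function returns true, so
strided or backwards scans never trigger needless reads.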
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@1437283407
https://www.illumos.org/issues/5997
ZFS already supports storing the vdev FRU in a vdev property. There is code in
libzfs to work with this property, and there is code in the zfs-retire FMA
module that looks for that information. But there is no code actually setting
or updating the FRU.
To address this, ZFS is changed to send a handful of new events whenever a vdev
is added, attached, cleared, or onlined, as well as when a pool is created or
imported. The syseventd zfs module will handle these and update the FRU field
when necessary.
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Porting notes: only the kernel bits for the new events are imported
CTL HA functionality was originally implemented by Copan many years ago,
but a large part of the sources was never published. This change includes
a clean-room implementation of the missing code and fixes for many bugs.
This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transferring all
commands to the first node for execution through the interlink.
Unlike the original Copan implementation, which depended on specific
hardware, this code uses a simple custom TCP-based protocol for the
interlink. It has no authentication, so it should never be enabled on
public interfaces.
The code may still need some polishing, but generally it is functional.
Relnotes: yes
Sponsored by: iXsystems, Inc.
This silences a warning brought up by valgrind whenever if_nametoindex
is used. This was already discussed in PR 166483, but the code
committed in r234329 guards the initialization with #ifdef PURIFY.
Therefore, valgrind still complains. Since this code is not performance
critical, always zero out the local variable to silence valgrind.
PR: 166483
Discussed with: eadler@
MFC after: 4 weeks
Auto-tuning threshold discussions aside, it turns out that if you want
to lower this on say, rather memory-packed machines, you either set maxusers
or kern.maxfiles, or you set it in sysctl. The former is a non-exact
way to tune this; the latter doesn't actually affect anything in the
startup scripts.
This first occurred because I wondered why the hell screen would take upwards
of 10 seconds to spawn a new screen. I then found python doing the same
thing during fork/exec of child processes - it calls close() on each FD
up to the current openfiles limit. On a 1TB machine this is like, 26 million
FDs per process. Ugh.
So:
* This allows it to be set early in /boot/loader.conf;
* It can be used to work around the ridiculous situation of
screen, python, etc doing a close() on potentially millions of FDs
even though you only have four open.
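The pattern behind the slowdown looks roughly like the sketch below: read
the soft RLIMIT_NOFILE value and call close() on every descriptor number
up to it. This is an illustration of the behaviour described above, not
code taken from screen or python; both function names are hypothetical.

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Return the soft open-files limit, i.e. how many descriptor numbers a
 * "close everything" loop would walk.  Returns -1 if the limit cannot
 * be read or is unlimited.
 */
static long long
openfiles_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
		return (-1);
	return ((long long)rl.rlim_cur);
}

/*
 * Close every descriptor number in [lowfd, limit), open or not, and
 * return how many close() calls were made.  With a limit in the
 * millions this is millions of system calls per fork/exec.
 */
static long long
naive_close_loop(int lowfd, long long limit)
{
	long long calls = 0;

	for (long long fd = lowfd; fd < limit; fd++) {
		(void)close((int)fd);	/* EBADF for unused numbers is ignored */
		calls++;
	}
	return (calls);
}
```

Lowering the limit shrinks the loop directly, which is why a tunable that
can be set early helps even when only a handful of descriptors are open.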
Tested:
* 4GB, 32GB, 64GB, 128GB, 384GB, 1TB systems with autotune, ensuring
screen and python forking doesn't result in some pretty hilariously
bad behaviour.
TODO:
* Note that the default login.conf sets openfiles-cur to unlimited,
effectively obeying kern.maxfilesperproc. Perhaps we should fix
this.
* .. and even if we do, we need to also ensure that daemons get
a soft limit of something reasonable and capped - they can request
more FDs themselves.
MFC after: 1 week
Sponsored by: Norse Corp, Inc.
getppid() after a debugger process that is not the parent has attached.
Reviewed by: kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D3615
named node, open(2) cannot create directories. But do allow the flag
combination to succeed if the directory already exists.
Declare the open("name", O_DIRECTORY | O_CREAT | O_EXCL) always
invalid for the same reason, since open(2) cannot create a directory.
Note that there is an argument that O_DIRECTORY | O_CREAT should be
invalid always, regardless of the target directory existence or
O_EXCL. The current fix is conservative and allows the call to
succeed in the situation where it succeeded before the patch.
Reported by: Tom Ridge <freebsd@tom-ridge.com>
Reviewed by: rwatson
PR: 202892
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
* Fail when the length passed in is 0
* Remove an unneeded increment of the count on success
* Return ENAMETOOLONG when the input string is too long
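The three points can be seen together in a userland sketch of a bounded
string copy. The routine being fixed is not named in the text above, so
bounded_strcopy and its signature are hypothetical, and returning
ENAMETOOLONG for the zero-length case is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Copy a NUL-terminated string from src into dst, which holds len bytes.
 * On success *copied (if non-NULL) is the number of bytes written,
 * including the terminating NUL.
 */
static int
bounded_strcopy(const char *src, char *dst, size_t len, size_t *copied)
{
	size_t i;

	if (copied != NULL)
		*copied = 0;
	if (len == 0)
		return (ENAMETOOLONG);	/* assumption: zero length fails this way */
	for (i = 0; i < len; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			if (copied != NULL)
				*copied = i + 1;	/* NUL counted once, no extra bump */
			return (0);
		}
	}
	if (copied != NULL)
		*copied = len;
	return (ENAMETOOLONG);		/* ran out of room before the NUL */
}
```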
Sponsored by: ABT Systems Ltd
mapped address without valid pte installed, when parallel wiring of
the entry happens. The entry must be copy-on-write. If the entry is COW
but was already copied, and parallel wiring set
MAP_ENTRY_IN_TRANSITION, vm_fault() would sleep waiting for the
MAP_ENTRY_IN_TRANSITION flag to clear. After that, the fault handler
is restarted and vm_map_lookup() or vm_map_lookup_locked() trip over
the check. Note that this is a race: if the address is accessed after
the wiring is done, the entry does not fault at all.
There is no reason in the current kernel to disallow write access to
the COW wired entry if the entry permissions allow it. Initially this
was done in r24666, since that kernel did not support proper
copy-on-write for wired text, which was fixed in r199869. The r251901
revision re-introduced the r24666 fix for the current VM.
Note that write access must clear MAP_ENTRY_NEEDS_COPY entry flag by
performing COW. In reverse, when MAP_ENTRY_NEEDS_COPY is set in
vmspace_fork(), the MAP_ENTRY_USER_WIRED flag is cleared. Put the
assert stating the invariant, instead of returning the error.
Reported and debugging help by: peter
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Note that, so as not to interfere with fingerprints, it expects a signature
on pkg itself, which is named pkg.txz.pubkeysig
To generate it:
echo -n "$(sha256 -q pkg.txz)" | openssl dgst -sha256 -sign /thekey \
-binary -out ./pkg.txz.pubkeysig
Note the "echo -n", which prevents signing the '\n' one would get otherwise
PR: 202622
MFC after: 1 week
locking and doesn't sleep. Flag the consumer we create as such. In
addition, decrement the in flight index when we have an out of memory
error after having incremented it previously. This would have
prevented swapoff from working if the swap pager ever hit a resource
shortage trying to swap out something (the swap in path always waits
for a bio, so won't have this issue). Simplify the close logic by
abandoning the use of private and initializing the index to 1 and
dropping that reference when we previously set private.
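The counting scheme described above can be sketched in userland as
follows. The structure and function names are hypothetical stand-ins for
the consumer's in-flight index, and alloc_ok simulates whether the
request allocation succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the consumer's in-flight index. */
struct inflight {
	int  refs;	/* 1 for the open device plus 1 per pending i/o */
	bool closed;	/* set once the last reference is dropped */
};

static void
inflight_init(struct inflight *f)
{
	f->refs = 1;		/* initial reference, dropped at close time */
	f->closed = false;
}

static void
inflight_drop(struct inflight *f)
{
	assert(f->refs > 0);
	if (--f->refs == 0)
		f->closed = true;	/* last reference: finish the close */
}

/* Start one i/o; on allocation failure give the reference back. */
static int
inflight_start_io(struct inflight *f, bool alloc_ok)
{
	f->refs++;
	if (!alloc_ok) {
		inflight_drop(f);	/* without this, close would never finish */
		return (ENOMEM);
	}
	return (0);
}
```

Because the count starts at 1, the close path only needs to drop that
initial reference; whichever of the close or the last i/o completion
brings the count to zero finishes the teardown.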
Also, set sw_id only while sw_dev_mtx is held. This should only affect
swapping to a vnode, as opposed to a geom whose close always sets it to
NULL with sw_dev_mtx held.
Differential Revision: https://reviews.freebsd.org/D3547
vendor supplied device trees contain the needed properties for us to select
the correct uart to use as the kernel console.
An example of this would be to add the following to loader.conf.
hw.fdt.console="/smb/uart@f7113000"
The intention of this is slightly different than the existing
hw.uart.console option. The new option will mean the boot serial
configuration will be derived from the device node, while the existing
option expects the user to configure all this themselves.
Further work is planned to allow the uart configuration to be set based on
the stdout-path property devicetree bindings.
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D3559