When coding the pNFS server, I added several vn_start_write() calls done
while the vnode was locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes this by removing the added vn_start_write() calls and
modifying the code so that the extant vn_start_write() call before the
NFS RPC/operation is done when needed by the pNFS server.
Flags are changed so that LayoutCommit and LayoutReturn now get a
vn_start_write() done for them.
When the pNFS server is enabled, the code now also changes the flags for
Getattr, so that the vn_start_write() is done for Getattr, since it may
need to do a vn_set_extattr(). The nfs_writerpc flag array was made global
to the NFS server and renamed nfsrv_writerpc, which is consistent naming
for globals in the NFS server.
Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is
locked results in a LOR.
This patch only affects the behaviour of the pNFS server.
Some internal KASSERTs access the v_iflag field without the vnode
interlock held after such a refcount update. The fences are needed for
the assertions to be correct in the face of store reordering.
Reported and tested by: jhibbits
Reviewed by: kib, mjg
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16756
Relax allocation throttling for ditto blocks. Due to random imbalances
in allocation it tends to push block copies to one vdev, that looks
slightly better at the moment. Slightly less strict policy allows both
improve data security and surprisingly write performance, since we don't
need to touch extra metaslabs on each vdev to respect the min distance.
Sponsored by: iXsystems, Inc.
Use METASLAB_WEIGHT_CLAIM weight to allocate tertiary blocks.
Previous use of METASLAB_WEIGHT_SECONDARY for that caused errors
later on metaslab_activate_allocator() call, leading to massive
load of unneeded metaslabs and write freezes.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Similar to the network stack issue fixed in r337782 pf did not limit the number
of fragments per packet, which could be exploited to generate high CPU loads
with a crafted series of packets.
Limit each packet to no more than 64 fragments. This should be sufficient on
typical networks to allow maximum-sized IP frames.
This addresses the issue for both IPv4 and IPv6.
MFC after: 3 days
Security: CVE-2018-5391
Sponsored by: Klara Systems
Capsicum in past allowed to change the process title.
This was broken with r335939.
PR: 230584
Submitted by: Yuichiro NAITO <naito.yuichiro@gmail.com>
Reported by: ian@niw.com.au
MFC after: 1 week
When a pNFS service is running, the size of the files created on the MDS
are normally 0, since the data is written to the data files on the DS(s).
However, without this patch, if a Setattr with a non-zero size was done by
a client, the MDS file was set to that size. This was thought to be benign,
but it turns out that files with a non-zero size plus extended attributes
can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this
panic() has not been isolated, this patch avoids the panic() and leaves
the MDS files in a consistent state of always having a size == 0.
Note that these MDS files never store data. The patch also includes an
unnecessary initialization of savsize in case some compiler or static
analyser complains it might not be initialized.
This patch only affects the NFS server when pNFS is enabled via the "-p"
command line option on nfsd.
Fix a regression introduced in r336439.
Rather than allowing any linked list of algorithms, allow at most two
(typically, some combination of encrypt and/or MAC). Removes a WAITOK
malloc in an unsleepable context (classic LOR) by placing both software
algorithm contexts within the OCF-managed session object.
Tested with 'cryptocheck -a all -d cryptosoft0', which includes some
encrypt-and-MAC modes.
PR: 230304
Reported by: sef@
Summary:
PowerISA 3.0 adds a 'darn' instruction to "deliver a random number". This
driver was modeled after (rather, copied and gutted of) the Ivy Bridge
rdrand driver.
This uses the "Conditional Random Number" behavior to remove input bias.
From the ISA reference the 'darn' instruction, and the random number
generator backing it, conforms to the NIST SP800-90B and SP800-90C
standards, compliant to the extent possible at the time the hardware was
designed, and guarantees a minimum 0.5 bits of entropy per bit returned.
Reviewed By: markm, secteam (delphij)
Approved by: secteam (delphij)
Differential Revision: https://reviews.freebsd.org/D16552
This change allows one to set kern.boot_tag="" and not get a blank line
preceding other boot messages. While this isn't super critical- blank lines
are easy to filter out both mentally and in processing dmesg later- it
allows for a mode of operation that matches previous behavior.
I intend to MFC this whole series to stable/11 by the end of the month with
boot_tag empty by default to make this effectively a nop in the stable
branch.
Missed in r337940.
(It's not like there are any crypto files IPsec doesn't pull in, so it is
unclear what not defining the crypto option was supposed to achieve.)
Reported by: np@
The wrapper is a thin shim around libsodium's Poly-1305 implementation. For
now, we just use the C algorithm and do not attempt to build the
SSE-optimized variant for x86 processors.
The algorithm support has not yet been plumbed through cryptodev, or added
to cryptosoft.
The idea is untouched upstream sources live in sys/contrib/libsodium.
sys/crypto/libsodium are support routines or compatibility headers to allow
building unmodified upstream code.
This is not yet integrated into the build system, so no functional change.
Bring in https://github.com/jedisct1/libsodium at
461ac93b260b91db8ad957f5a576860e3e9c88a1 (August 7, 2018), unmodified.
libsodium is derived from Daniel J. Bernstein et al.'s 2011 NaCl
("Networking and Cryptography Library," pronounced "salt") software library.
At the risk of oversimplifying, libsodium primarily exists to make it easier
to use NaCl. NaCl and libsodium provide high quality implementations of a
number of useful cryptographic concepts (as well as the underlying
primitics) seeing some adoption in newer network protocols.
I considered but dismissed cleaning up the directory hierarchy and
discarding artifacts of other build systems in favor of remaining close to
upstream (and easing future updates).
Nothing is integrated into the build system yet, so in that sense, no
functional change.
toe_l2_resolve to fill up the complete vtag and not just the vid.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D16752
jails since FreeBSD 7.
Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES). These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do. The first use
is obsolete along with jail(2), but keep them for the second-read-only use.
Differential Revision: D14791
ZoL did the same mistake, and fixed it with separate commit 863522b1f9:
dsl_scan_scrub_cb: don't double-account non-embedded blocks
We were doing count_block() twice inside this function, once
unconditionally at the beginning (intended to catch the embedded block
case) and once near the end after processing the block.
The double-accounting caused the "zpool scrub" progress statistics in
"zpool status" to climb from 0% to 200% instead of 0% to 100%, and
showed double the I/O rate it was actually seeing.
This was apparently a regression introduced in commit 00c405b4b5e8,
which was an incorrect port of this OpenZFS commit:
https://github.com/openzfs/openzfs/commit/d8a447a7
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Closes#7720Closes#7738
Reported by: sef
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
door wider still to races
- call inp_gcmoptions synchronously after dropping the the inpcb lock
Fundamentally multicast needs a rewrite - but keep applying band-aids for now.
Tested by: kp
Reported by: novel, kp, lwhsu
created and before exec.start is called. [1]
- Bump __FreeBSD_version.
This allows to attach ZFS datasets and various other things to be
done before any command/service/rc-script is started in the new
jail.
PR: 228066 [1]
Reviewed by: jamie [1]
Submitted by: Stefan Grönke <stefan@gronke.net> [1]
Differential Revision: https://reviews.freebsd.org/D15330 [1]
So that I don't have to keep grepping around the codebase to remember what each
one does. And maybe it saves someone else some time.
Fix a trivial whitespace issue while here.
No functional change.
Sponsored by: Dell EMC Isilon