This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount. Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.
Introduce two new functions: vfs_pin_from_vp() and vfs_unpin().
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.
vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.
Adopt these new functions in both nullfs and unionfs.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
Commit d224f05fcf pre-parsed the next operation number for
the put file handle operations. This patch uses this next
operation number, plus the type of the file handle being set by
the put file handle operation, to implement the rules in RFC5661
Sec. 2.6 with respect to replying NFSERR_WRONGSEC.
This patch also adds a check to see if NFSERR_WRONGSEC should be
replied when about to perform Lookup, Lookupp or Open with a file
name component, so that the NFSERR_WRONGSEC reply is done for
these operations, as required by RFC5661 Sec. 2.6.
This patch does not have any practical effect for the FreeBSD NFSv4
client and I believe that the same is true for the Linux client,
since NFSERR_WRONGSEC is considered a fatal error at this time.
MFC after: 2 weeks
When ntpd is synchronizing the system time, it also periodically (30m)
syncs the the RTC time. Remove printf in rk805_settime which triggers
every 30m, as settime_task_func() will log errors under bootverbose.
We leave the RTC Read logging, which should happen only once at boot.
Commit message by: imp
Reviewed by: manu, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30361
Add more raditap definitions based on "names" found in actual drivers
and based on documentation from radiotap.org (where avail).
Leave one specific "duplicate" in the LinuxKPI implementation but
otherwise manage it all in net80211.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky, adrian, sam
Differential Revision: https://reviews.freebsd.org/D30641
Nanopi4 based SoCs (NanoPC-T4, NanoPi M4*, and NanoPi Neo4) have
assigned-clock* in the pcie_phy node. Handle them but only fail
in case clk_set_assigned() returns an error other than
"no assigned-clock*" (as it would for all other SoCs).
Reviewed by: manu
MFC After: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30363
Normally raw interrupt handler is provided by the kernel text. But
vmbus module registers its own handler that needs to be mapped into
userspace mapping on PTI kernels.
Reported and reviewed by: whu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30310
Commit 947bd2479b added support for the Secinfo_no_name operation.
When a non-exported file system is being traversed, the list of
security flavors is empty. It turns out that the Linux client
mount attempt fails when the security flavors list in the
Secinfo_no_name reply is empty.
This patch modifies Secinfo/Secinfo_no_name so that it replies
with all four security flavors when the list is empty.
This fixes Linux NFSv4.1/4.2 mounts when the file system at
the NFSv4 root (as specified on a V4: exports(5) line) is
not exported.
MFC after: 2 weeks
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which
is only safe in the net epoch. Ensure that callers are in the net
epoch before calling this function.
Sponsored by: Dell EMC Isilon
MFC after: 4 weeks
Reviewed by: donner, kp
Differential Revision: https://reviews.freebsd.org/D30630
So it turns out that my fix before was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out the root of the problem is that when
we receive a SYN|FIN the reassembly code made it so we create a segq entry
to hold the FIN. In the established state where we were not in order this
would be correct i.e. a 0 len with a FIN would need to be accepted. But
if you are in a front state we need to strip the FIN so we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.
I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place however.
Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30627
Detailed analysis in https://github.com/genneko/freebsd-vimage-jails/issues/2
brought the problem down to a double call of ng_node_name() before and
after a vnet move. Because the name of the node is already known
(occupied by itself), the second call fails.
PR: 241954
Reported by: Paul Armstrong
MFC: 1 week
Differential Revision: https://reviews.freebsd.org/D30110
RFC5661 Sec. 2.6 specifies when a NFSERR_WRONGSEC error reply can be done.
For the four operations PutFH, PutrootFH, PutpublicFH and RestoreFH,
NFSERR_WRONGSEC can or cannot be replied, depending upon what operation
follows one of these operations in the compound.
This patch modifies nfsrvd_compound() so that it parses the next operation
number before executing any of the above four operations, storing it in
"nextop".
A future commit will implement use of "nextop" to decide if NFSERR_WRONGSEC
can be replied for the above four operations.
This commit should not change the semantics of performing the compound RPC.
MFC after: 2 weeks
The 'nodup' option forces fdescfs to return real vnode behind file
descriptor instead of the fdescfs fd vnode, on lookup. The end result
is that e.g. stat("/dev/fd/3") returns the stat data for the underlying
vnode, if any. Similarly, fchdir(2) works in the expected way.
For open(2), if applied over file descriptor opened with O_PATH, it
effectively re-open that vnode into normal file descriptor which has the
specified access mode, assuming the current vnode permissions allow it.
If the file descriptor does not reference vnode, the behavior is unchanged.
This is done by a mount option, because permission check on open(2) breaks
established fdescfs open semantic of dup(2)-ing the descriptor. So it
is not suitable for /dev/fd mount.
Tested by: Andrew Walker <awalker@ixsystems.com>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30140
pqisrc_is_firmware_feature_enabled shouldn't be declared inline in a
header, and then static inline in the .c function. Remove this stray
declartion from the header. gcc6 complains, but clang does not.
Sponsored by: Netflix
From the PR:
Almost always, my Samsung RF511 laptop could not boot with
x2APIC enabled in the kernel. It froze during SMP initialization,
shortly after "ACPI APIC Table: <SECCSD LH43STAR>" was printed
to the console. When the kernel is instructed not to use x2APIC,
the system boots correctly.
PR: 256389
Submitted by: David Sebek <dasebek@gmail.com>
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30624
to prepare for one more addition
Reviewed by: markj
Tested by: David Sebek <dasebek@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30624
When a DMA chain can't be loaded, set the state to STATE_INQUEUE so that
the mp[rs]_complete_command can properly fail the command.
Sponsored by: Netflix
When the mpr(4) and mps(4) drivers probe a SATA device, they issue an
ATA Identify command (via mp{s,r}sas_get_sata_identify()) before the
target is fully setup in the driver. The drivers wait for completion of
the identify command, and have a 5 second timeout. If the timeout
fires, the command is marked with the SATA_ID_TIMEOUT flag so it can be
freed later.
That is where the use-after-free problem comes in. Once the ATA
Identify times out, the driver sends a target reset, and then frees any
identify commands that have timed out. But, once the target reset
completes, commands that were queued to the drive are returned to the
driver by the controller.
At that point, the driver (in mp{s,r}_intr_locked()) looks up the
command descriptor for that particular SMID, marks it CM_STATE_BUSY and
sends it on for completion handling.
The problem at this stage is that the command has already been freed,
and put on the free queue, so its state is CM_STATE_FREE. If INVARIANTS
are turned on, we get a panic as soon as this command is allocated,
because its state is no longer CM_STATE_FREE, but rather CM_STATE_BUSY.
So, the solution is to not free ATA Identify commands that get stuck
until they actually return from the controller. Hopefully this works
correctly on older firmware versions. If not, it could result in
commands hanging around indefinitely. But, the alternative is a
use-after-free panic or assertion (in the INVARIANTS case).
This also tightens up the state transitions between CM_STATE_FREE,
CM_STATE_BUSY and CM_STATE_INQUEUE, so that the state transitions happen
once, and we have assertions to make sure that commands are in the
correct state before transitioning to the next state. Also, for each
state assertion, we print out the current state of the command if it is
incorrect.
mp{s,r}.c: Add a new sysctl variable, dump_reqs_alltypes,
that controls the behavior of the dump_reqs sysctl.
If dump_reqs_alltypes is non-zero, it will dump
all commands, not just the commands that are in the
CM_STATE_INQUEUE state. (You can see the commands
that are in the queue by using mp{s,r}util debug
dumpreqs.)
Make sure that the INQUEUE -> BUSY state transition
happens in one place, the mp{s,r}_complete_command
routine.
mp{s,r}_sas.c: Make sure we print the current command type in
command state assertions.
mp{s,r}_sas_lsi.c:
Add a new completion handler,
mp{s,r}sas_ata_id_complete. This completion
handler will free data allocated for an ATA
Identify command and free the command structure.
In mp{s,r}_ata_id_timeout, do not set the command
state to CM_STATE_BUSY. The command is still in
queue in the controller. Since we were blocking
waiting for this command to complete, there was
no completion handler previously. Set the
completion handler, so that whenever the command
does come back, it will get freed properly.
Do not free ATA Identify commands that have timed
out in mp{s,r}sas_add_device(). Wait for them
to actually come back from the controller.
mp{s,r}var.h: Add a dump_reqs_alltypes variable for the new
dump_reqs_alltypes sysctl.
Make sure we print the current state for state
transition asserts.
This was tested in the Spectra Logic test bed (as described in the
review), as well Netflix's Open Connect fleet (where panics dropped from
a dozen or two a month to zero).
Reviewed by: imp@ (who is handling the commit with ken's OK)
Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D25476
Use the accessor function to get the softc for this sim. This also drops
an unneeded cast.
Sponsored by: Netflix
Reviewed by: mav@, hselasky@
Differential Revision: https://reviews.freebsd.org/D30360
if (sb == NULL) { ... sb->s_error } is going to be a bad time. Return
ENOMEM when we cannot allocate an sbuf for the sysctl rather than
dereferencing the NULL pointer just returned.
Reviewed by: manu@, allanjude@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30373
The need for !! over (bool) pre-dates gcc 4.2, so go with the patch
as-submitted because the kernel tends to prefer that.
Suggested by: emaste@
Sponsored by: Netflix
There's no need to check pointers for NULL before free()ing them.
No functional change.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30382
These are global (i.e. shared across vnets) structures, so we need
global lock to protect them. However, we look up entries in these lists
(find_aqm_type(), find_sched_type()) and return them. We must ensure
that the returned structures cannot go away while we are using them.
Resolve this by using NET_EPOCH(). The structures can be safely accessed
under it, and we postpone their cleanup until we're sure they're no
longer used.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30381
This moves dn_cfg and other parameters into per VNET variables.
The taskqueue and control state remains global.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29274
Currently, mmc_fdt_gpio_get_{present,readonly} return all time true.
true ^ 100b = true
false ^ 100b = true
since that's done after promotion to integers. Use !! to convert
the bit to a bool before xor.
Reviewed by: imp@ (converted to (bool) to !! for portability)
Pull Request: https://github.com/freebsd/freebsd-src/pull/461
Update mmc_switch_status to ignore a few CRC errrors when asking for the
card status after setting the new rate with CMD6. Since the card may
take a little while to make the switch, it's possible we'll get a
communications error if we sent the command at the wrong time. Several
low end laptops needs this workaround as they have a window that seems
longer than other systems. This is known to fix at least the Acer Aspire
A114-32-P7E5.
Reviewed by: imp@, manu@
Differential Revision: https://reviews.freebsd.org/D24740
Without this patch, nfsd_checkrootexp() returns failure
and then the NFSv4 operation would reply NFSERR_WRONGSEC.
RFC5661 Sec. 2.6 only allows a few NFSv4 operations, none
of which call nfsv4_checktootexp(), to return NFSERR_WRONGSEC.
This patch modifies nfsd_checkrootexp() to return the
error instead of a boolean and sets the returned error to an RPC
layer AUTH_ERR, as discussed on nfsv4@ietf.org.
The patch also fixes nfsd_errmap() so that the pseudo
error NFSERR_AUTHERR is handled correctly such that an RPC layer
AUTH_ERR is replied to the NFSv4 client.
The two new "enum auth_stat" values have not yet been assigned
by IANA, but are the expected next two values.
The effect on extant NFSv4 clients of this change appears
limited to reporting a different failure error when a
mount that does not use adequate security is attempted.
MFC after: 2 weeks
Commit 49c894ddce introduced an issue that prevented pseries boot,
when hugepages were not available to the guest. Now large page
info must be available before moea64_install is called, so this change
moves the code that scans large page sizes before the call.
Reviewed by: jhibbits (IRC)
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
OBJS are automatically added to CLEANFILES. For pre-built objects, this
is not desirable since it will delete the object from the source
tree. Introduce EXTRA_OBJS which list these object files, but aren't
added to clean files.
Sponsored by: Netflix
Reviewed by: emaste@
Differential Revision: https://reviews.freebsd.org/D30615
ca1ce50b2b ("vfs: add more safety against concurrent forced
unmount to vn_write") has a side effect of only checking MNT_SYNCHRONOUS
if O_FSYNC is set.
Reviewed By: mjg
Differential Revision: https://reviews.freebsd.org/D30610
The KCSAN_ENABLED variable is non-empty when the kernel is being built
with KCSAN. This allows us to disable modules that are known to be
broken.
There was a bug where we would check if it was defined. As this is
always the case the KCSAN_ENABLED variable would be set when building
modules so we would never build such a module. Fix this by checking
if the value is empty before passing it on to the module stage.
This doesn't affect how modules are built as the CFLAGS passed to
modules has the correct check.
Reported by: rstone
Sponsored by: Innovate UK