When a vlan interface is created, its if_output is set directly to the
parent interface's if_output. This is fine in the normal case but has an
unfortunate consequence if you end up with a certain combination of vlan
and lagg interfaces.
Consider you have a lagg interface with a single laggport member. When
an interface is added to a lagg its if_output is set to
lagg_port_output, which blackholes traffic from the normal networking
stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now
create a vlan with the laggport member (not the lagg interface) as its
parent, its if_output is set to lagg_port_output as well. While this is
confusing conceptually and likely represents a misconfigured system, it
is not itself a problem. The problem arises when you then remove the
lagg interface. Doing this resets the if_output of the laggport member
back to its original state, but the vlan's if_output is left pointing to
lagg_port_output. This gives rise to the possibility that the system
will panic when e.g. bpf is used to send any frames on the vlan
interface.
Fix this by creating a new function, vlan_output, which simply wraps the
parent's current if_output. That way when the parent's if_output is
restored there is no stale usage of lagg_port_output.
Reviewed by: rstone
Differential Revision: D21209
- Fix warnings from igor and mandoc.
- Provide a brief description of the separation between zones and their
backend slab allocators.
- Document cache zones and secondary zones.
- Document the kernel config options added in r350659.
- Document the uma_zalloc_pcpu() and uma_zfree_pcpu() wrappers.
- Document uma_zone_reserve(), uma_zone_reserve_kva() and
uma_zone_prealloc().
- Document uma_zone_alloc() and uma_zone_freef().
- Add some missing MLINKs and Xrefs.
MFC after: 2 weeks
The returned error number may be EINTR or ERESTART depending on
whether or not the signal is supposed to interrupt the system call.
Reported and tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The bhyve virtual local APIC uses an instance-global flag to indicate
when an error LVT is being delivered to prevent infinite recursion.
Use a function argument instead to reduce the amount of instance-global
state.
This was inspired by reviewing the bhyve save/restore work, which
saves a copy of the instance-global state for each vlapic.
Smart OS bug: https://smartos.org/bugview/OS-7777
Submitted by: Patrick Mooney
Reviewed by: markj, rgrimes
Obtained from: SmartOS / Joyent
Differential Revision: https://reviews.freebsd.org/D20365
When the pwm.9 manual was removed, a symlink between pwmbus.9 and pwm.9 was
created, but there's an entry in ObsoleteFiles.inc to remove pwn.9, meaning
that on every installation pwm.9 is created, and make delete-old deletes it.
Remove the entry from ObsoleteFiles.inc, the symlink is clearly intentional
and shouldn't be removed.
Reviewed by: imp, ian
Approved by: imp (implicit, review OK)
Differential Revision: https://reviews.freebsd.org/D21198
XPT_DEV_ADVINFO call should be protected by the lock of the specific
device it is addressed to, not the lock of SES device. In some weird
case, probably with hardware violating standards, it sometimes caused
NULL dereference due to race.
To protect from it further, add lock assertion to *_dev_advinfo().
MFC after: 1 week
Sponsored by: iXsystems, Inc.
This fixes a "timed sleep before timers are working" panic seen
while attaching jedec_dimm(4) instances too early in the boot.
Submitted by: ian
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D21452
This makes the media check process asynchronous, so we no longer block
in cdstrategy() to check for media.
PR: 219857
Obtained from: ken
MFC after: 3 weeks
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode. Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.
Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM(). This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation. Remove code from
vnode_create_vobject() to handle races with the parallel destruction.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21412
In ffs_valloc(), force reclaim existing vnode on inode reuse, instead
of trying to re-initialize the same vnode for new purposes. This is
done in preparation of changes to the vp->v_object lifecycle handling.
A new FFSV_REPLACE flag to ffs_vgetf() directs the function to
vgone(9) the vnode if found in vfs hash, instead of returning it.
Reviewed by: markj, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21412
Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu. There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.
To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source. Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.
Reported and tested by: lwhsu, bcran
Reviewed by: imp, markj (previous version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21418
Previously, the permissions were checked on the pool which was obviously
incorrect.
After this change, zfs_check_userprops() only validates the properties
without any permission checks. The permissions are checked individually
for each snapshotted dataset.
This was also committed to ZoL: zfsonlinux/zfs@e6203d2
Reported by: CyberSecure
MFC after: 1 week
Sponsored by: CyberSecure
The libxo xml feature of adding an annotation with the "original"
address from the utmpx file if it is different than the final "from"
field was broken by r351379. This was pointed out by the gcc error
that save_p might be used uninitialized. Save the original address
as needed in each entry, don't just use the last one from the previous
loop.
Reviewed by: marcel@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21390
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.
vinactive is very rarely needed and can be tested for without the vnode lock held.
This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:
before: 557563641706 (lockmgr:tmpfs)
after: 46309603301 (lockmgr:tmpfs)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21371
- LK macro (conditional on SMP for the lock prefix) is unused
- SETLK unnecessarily performs xchg. obtained value is never used and the
implicit lock prefix adds avoidable cost. Barrier provided by it does
not appear to be of any use.
- the lock waited for is almost never blocked, yet the loop starts with
a pause. Move it out of the common case.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19563
In compare(), all error cases set the error code to EPIPE, so when an
error is set, the correct assertion to make is that the error is EPIPE,
not EINVAL.
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@freqlabs.com>
Closes#8743zfsonlinux/zfs@9dc41a769d
Submitted by: Ryan Moeller <ryan@freqlabs.com>
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D20118
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21447
uiomove_object_page() and exec_map_first_page() would previously wire a
page after having grabbed it. Ask vm_page_grab() to perform the wiring
instead: this removes some redundant code, and is cheaper in the case
where the requested page is not resident since the page allocator can be
asked to initialize the page as wired, whereas a separate vm_page_wire()
call requires the page lock.
In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of
vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued
before returning, but this is unnecessary since vm_page_free() will
trigger a batched dequeue of the page.
Reviewed by: alc, kib
Tested by: pho (part of a larger patch)
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21440