after the underlying device went away.
The problem was that callers who queue the GEOM resize provider
event didn't check to make sure that the provider had not been
withered. For the other equivalent case, g_new_provider_event(),
the code checks to see whether the provider has been withered
before queueing a g_new_provider_event() to the event thread.
In some cases, a resize provider event would come through after
the provider had been withered and all of the existing consumers
had been orphaned. When the resize event triggered a taste of
the provider, that would attach a new consumer to the now
withered provider. The wither washer (g_wither_washer() would
never be able to completely tear down the GEOM because of the
consumers that were hanging around.
The solution was to check the G_PF_WITHER provider flag before
queueing the g_resize_provider_event(), and add an assert to
g_resize_provider_event() to insure that it isn't called on a
withered provider.
sys/geom/geom_subr.c:
In g_resize_provider(), don't try to continue if the
G_PF_WITHER flag is set.
In g_resize_provider_event(), add an assert that the
G_PF_WITHER flag is not set.
In g_access(), if a provider has an error, print out the
name of the provider with the error.
Sponsored by: Spectra Logic
Approved by: re (marius)
MFC after: 3 days
some fields to match the order in the struct. Especially needed
if_pf_kif to do pf(4) VNET debugging.
Approved by: re (marius)
Obtained from: projects/vnet
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
uncategorised reason. We need to read the fault address register before
enabling interrupts as the interrupt handler may cause this register to
change.
Approved by: re (marius, kib)
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
without VIMAGE support would dereference a NULL point unconditionally
leading to a panic. Wrap the entire VIMAGE related code with #ifdefs
rather than just the decision making part to save an extra bit of
resources.
Reported by: np
Sponsored by: The FreeBSD Foundation
MFC After: 13 days
Approved by: re (marius)
version of the XHCI specification. Make sure the code can handle the
maximum number of allowed scratch pages.
Submitted by: Shichun_Ma@Dell.com
Approved by: re (hrs)
MFC after: 1 week
File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thread that
completes an AIO request via aio_return() or aio_waitcomplete(). This
change extends AIO jobs to store counts of received/sent messages and
updates socket backends to set these counts accordingly. Note that
the socket backends are careful to only charge a single messages for
each AIO request even though a single request on a blocking socket might
invoke sosend or soreceive multiple times. This is to mimic the
resource accounting of synchronous read/write.
Adjust the UNIX socketpair AIO test to verify that the message resource
usage counts update accordingly for aio_read and aio_write.
Approved by: re (hrs)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6911
The DPADD data in .depend will be redundant with what is in the .meta file.
Also extend NO_EXTRADEPEND support to bsd.prog.mk.
Approved by: re (blanket, META_MODE)
Sponsored by: EMC / Isilon Storage Division
device is gone.
The problem was that when disk_gone() is called, if the GEOM disk
creation process has not yet happened, the withering process
couldn't start.
We didn't record any state in the GEOM disk code, and so the d_gone()
callback to the da(4) driver never happened.
The solution is to track the state of the creation process, and
initiate the withering process from g_disk_create() if the disk is
being created.
This change does add fields to struct disk, and so I have bumped
DISK_VERSION.
geom_disk.c: Track where we are in the disk creation process,
and check to see whether our underlying disk has
gone away or not.
In disk_gone(), set a new d_goneflag variable that
g_disk_create() can check to see if it needs to
clean up the disk instance.
geom_disk.h: Add a mutex to struct disk (for internal use) disk
init level, and a gone flag.
Bump DISK_VERSION because the size of struct disk has
changed and fields have been added at the beginning.
Sponsored by: Spectra Logic
Approved by: re (marius)
fully-pessimized implementation that requires a type to be aligned to
its natural size.
On armv6+ the compiler might generate load-/store-multiple instructions
which require 4-byte alignment even though the source code is only
accessing individual uint32_t values in a way that doesn't require any
particular alignment at all. The compiler apparently feels free to
combine multiple accesses into a single instruction that requires a
more-strict alignment, and no set of compiler flags seems to disable
this behavior (at least in clang 3.8).
This fixes alignment faults on arm systems using wifi adapters. The
wifi code uses ALIGNED_POINTER(p, uint32_t) to decide whether it needs
to copy-align tcp headers. Because clang is combining several uint32_t
accesses into a single ldm instruction, we need to say that accessing a
uint32_t requires 4-byte alignment.
Approved by: re(gjb)
statistics. Marking is done by setting the OBJ_ACTIVE flag. The
flags change is locked, but the problem is that many parts of system
assume that vm object initialization ensures that no other code could
change the object, and thus performed lockless. The end result is
corrupted flags in vm objects, most visible is spurious OBJ_DEAD flag,
causing random hangs.
Avoid the active object marking, instead provide equally inexact but
immutable is_object_alive() definition for the object mapped state.
Avoid iterating over the processes mappings altogether by using
arguably improved definition of the paging thread as one which sleeps
on the v_free_count.
PR: 204764
Diagnosed by: pho
Tested by: pho (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
It turns out that getting decent performance requires stacking the TX
FIFO a little more aggressively.
* Ensure that when we complete a frame, we attempt to push a new frame
into the FIFO so TX is kept as active as it needs to be
* Be more aggressive about batching non-aggregate frames into a single
TX FIFO slot. This "fixes" TDMA performance (since we only get one
TX FIFO slot ungated per DMA beacon alert) but it does this by pushing
a whole lot of work into the TX FIFO slot.
I'm not /entirely/ pleased by this solution, but it does fix a whole bunch
of corner case issues in the transmit side and fix TDMA whilst I'm at it.
I'll go revisit transmit packet scheduling in ath(4) post 11.
Tested:
* AR9380, STA mode
* AR9580, hostap mode
* AR9380, TDMA client mode
Approved by: re (hrs)
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
1) Unload mbuf instead of descriptor in rtwn_tx_done().
2) Add more synchronization for device visible mappings before
touching the memory.
3) Improve watchdog timer logic.
Reported and tested by: mva
Approved by: re (gjb)
Remove frames from active/pending Tx queues and free related node
references when vap is destroyed to prevent various use-after-free
scenarios.
Reported and tested by: Aleksander Alekseev <afiskon@devzen.ru>
PR: 208632
Approved by: re (gjb)
Use MPI2_IOCSTATUS_MASK when checking IOCStatus to mask off the log bit, and
make a few more things endian-safe.
- Fix possible use of invalid pointer.
It was possible to use an invalid pointer to get the target ID value. To fix
this, initialize a local Target ID variable to an invalid value and change that
variable to a valid value only if the pointer to the Target ID is not NULL.
- No need to set the MPSSAS_SHUTDOWN flag because it's never used.
- done_ccb pointer can be used if it is NULL.
To prevent this, move check for done_ccb == NULL to before done_ccb is used in
mpssas_stop_unit_done().
- Disks can go missing until a reboot is done in some cases.
This is due to the DevHandle not being released, which causes the Firmware to
not allow that disk to be re-added.
Reviewed by: ken
Approved by: re (gjb), ken, scottl, ambrisko (mentors)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6872
Among other things, this introduces the idea of DBA-gated queues that
aren't the CABQ. The TDMA support requires this.
Tested:
* AR9580 (hostap mode)
* AR9380 (sta mode)
Approved by: re (gjb)
This started showing up when doing lots of aggregate traffic. For TDMA it's
always no-ACK traffic and I didn't notice this, and I didn't notice it
when doing 11abg traffic as it didn't fail enough in a bad way to trigger
this.
This showed up as the fifo depth being < 0.
Eg:
Jun 19 09:23:07 gertrude kernel: ath0: ath_tx_edma_push_staging_list: queued 2 packets; depth=2, fifo depth=1
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1, bf=0xfffffe000385f068, start=1, end=1
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1: FIFO depth is now 0 (1)
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1, bf=0xfffffe0003866fe8, start=0, end=1
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1: FIFO depth is now -1 (0)
So, clear the flags before adding them to a TX queue, so if they're
re-added for the retransmit path it'll clear whatever they were and
not double-account the FIFOEND flag. Oops.
Tested:
* AR9380, STA mode, 11n iperf testing (~130mbit)
Approved by: re (delphij)
explains the plausible scenario), resulting in EDEADLK returned on the
local registration attempt. Handle this by re-trying the local op [1].
On unmount, local registration abort is indicated as EINTR, abort the nlm
call as well.
Reported and tested by: pho
Suggested and reviewed by: dfr (previous version, [1])
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)
Drop scan generation number and node table scan lock - the only place
where ni_scangen is checked is in ieee80211_timeout_stations() (and it
is used to prevent duplicate checking of the same node); node scan lock
protects only this variable + node table scan generation number.
This will fix (at least) next LOR (hostap mode):
lock order reversal:
1st 0xc175f84c urtwm0_scan_loc (urtwm0_scan_loc) @ /usr/src/sys/modules/wlan/../../net80211/ieee80211_node.c:2019
2nd 0xc175e018 urtwm0_com_lock (urtwm0_com_lock) @ /usr/src/sys/modules/wlan/../../net80211/ieee80211_node.c:2693
stack backtrace:
#0 0xa070d1c5 at witness_debugger+0x75
#1 0xa070d0f6 at witness_checkorder+0xd46
#2 0xa0694cce at __mtx_lock_flags+0x9e
#3 0xb03ad9ef at ieee80211_node_leave+0x12f
#4 0xb03afd13 at ieee80211_timeout_stations+0x483
#5 0xb03aa1c2 at ieee80211_node_timeout+0x42
#6 0xa06c6fa1 at softclock_call_cc+0x1e1
#7 0xa06c7518 at softclock+0xc8
#8 0xa06789ae at intr_event_execute_handlers+0x8e
#9 0xa0678fa0 at ithread_loop+0x90
#10 0xa0675fbe at fork_exit+0x7e
#11 0xa08af910 at fork_trampoline+0x8
In addition to the above:
* switch to ieee80211_iterate_nodes();
* do not assert that node table lock is held, while calling node_age();
that's not really needed (there are no resources, which can be protected
by this lock) + this fixes LOR/deadlock between ieee80211_timeout_stations()
and ieee80211_set_tim() (easy to reproduce in HOSTAP mode while
sending something to an STA with enabled power management).
Tested:
* (avos) urtwn0, hostap mode
* (adrian) AR9380, STA mode
* (adrian) AR9380, AR9331, AR9580, hostap mode
Notes:
* This changes the net80211 internals, so you have to recompile all of it
and the wifi drivers.
Submitted by: avos
Approved by: re (delphij)
Differential Revision: https://reviews.freebsd.org/D6833
It turns out the frame scheduling policies (eg DBA_GATED) operate on
a single TX FIFO entry. ASAP scheduling is fine; those frames always
go out.
DBA-gated sets the TX queue ready when the DBA timer fires, which triggers
a beacon transmit. Normally this is used for content-after-beacon queue
(CABQ) work, which needs to burst out immediately after a beacon.
(eg broadcast, multicast, etc frames.) This is a general policy that you
can use for any queue, and Sam's TDMA code uses it.
When DBA_GATED is used and something like say, an 11e TX burst window,
it only operates on a single TX FIFO entry. If you have a single frame
per TX FIFO entry and say, a 2.5ms long burst window (eg TDMA!) then it'll
only burst a single frame every 2.5ms. If there's no gating (eg ASAP) then
the burst window is fine, and multiple TX FIFO slots get used.
The CABQ code does pack in a list of frames (ie, the whole cabq) but
up until this commit, the normal TX queues didn't. It showed up when
I started to debug TDMA on the AR9380 and later.
This commit doesn't fix the TDMA case - that's still broken here, because
all I'm doing here is allowing 'some' frames to be bursting, but I'm
certainly not filling the whole TX FIFO slot entry with frames.
Doing that 'properly' kind of requires me to take into account how long
packets should take to transmit and say, doing 1.5 or something times that
per TX FIFO slot, as if you partially transmit a slot, when it's next
gated it'll just finish that TX FIFO slot, then not advance to the next
one.
Now, I /also/ think queuing a new packet restarts DMA, but you have to
push new frames into the TX FIFO. I need to experiment some more with
this because if it's really the case, I will be able to do TDMA support
without the egregious hacks I have in my local tree. Sam's TDMA code
for previous chips would just kick the TXE bit to push along DMA
again, but we can't do that for EDMA chips - we /have/ to push a new
frame into the TX FIFO to restart DMA. Ugh.
Tested:
* AR9380, STA mode
* AR9380, hostap mode
* AR9580, hostap mode
Approved by: re (gjb)
This allows IPv6 link local addresses (and other IPv6 functionality) to work.
PR: 210355
Submitted by: Steve Wahl and David Bright (both at Dell Inc.)
Reviewed by: cem, mav
Tested by: mav (on Intel hardware)
Approved by: re (kib)
MFC after: 5 days
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D6885
dropping the reference on mnt_cred. Prevent this by referencing the
temporal credentials before unlock.
Tested by: pho
Reviewed by: dfr
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.
Reviewed by: allanjude, araujo
Approved by: re (gjb)
Obtained from: OpenBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D6786
This apparently puts ARC back under the limits after the vnode pressure
rework in r291244, in particular due to the kmem exhaustion.
Based on patch by: mckusick
Reviewed by: avg, mckusick
Tested by: allanjude, madpilot
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
to mount points with the given filesystem type, specified by mount
vfs_ops pointer.
Based on patch by: mckusick
Reviewed by: avg, mckusick
Tested by: allanjude, madpilot
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
reported by EFI implementation. This address comment on r301714.
Approved by: re (gjb), andrew (mentor)
Differential Revision: https://reviews.freebsd.org/D6787
Maps Sonics/OCP per-core address spaces to bcma(4)-compatible port/region
identifiers.
This permits the use of common address map identifiers in bhnd device
drivers, independent of the underlying interconnect type.
Approved by: re (gjb), adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D6850
- Delete all chipc children on attachment failure.
- Added missing bhnd_nexus bhnd_bus_deactivate_resource implementation.
- Drop a CHIPC_UNLOCK() accidentally left behind after lifting
synchronization into the chipc region refcounting API.
- Fix re-allocation of chipc resources. Previously, the resource ID was
reset to -1 on release, preventing later re-allocation.
Approved by: re (gjb), adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D6849
supported, e.g. CPUID or MSR, return ENODEV from the ioctl which needs
that feature.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
threads, to make it less confusing and using modern kernel terms.
Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
cpu_set_upcall -> cpu_copy_thread (for forks)
cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
Differential revision: https://reviews.freebsd.org/D6731
reason for it in modern times. In the other case, expand the comment
stating instead of doubting.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
X-Differential revision: https://reviews.freebsd.org/D6731
There is no reason to return non-zero value from zfs_probe_partition()
as that causes following partitions to not be probed for ZFS vdevs.
A particular scenario that I encountered is a GPT partitioned disk
where several partitions have freebsd-zfs type. A partition with a lower
index is used as a cache (l2arc) vdev and in that case case zfs_probe()
returned a non-zero status. That status was returned to ptable_iterate()
and caused it to abort the iteration. Because of that the subsequent
partitions were not probed and a root pool was not discovered resulting
in a boot failure.
While there fix the style for nearby return statements.
Approved by: re (kib)
This is a follow-up to r300343.
This is important for the OBJS_DEPEND_GUESS usage in
gnu/usr.bin/cc/cc_tools.
See comments for more details.
Approved by: re (implicit)
Sponsored by: EMC / Isilon Storage Division
Inserting a full mbuf with an external cluster into the socket buffer
resulted in sbspace() returning -MLEN. However, since sb_hiwat is
unsigned, the -MLEN value was converted to unsigned in comparisons. As a
result, the socket buffer was never autosized. Note that sb_lowat is signed
to permit direct comparisons with sbspace(), but sb_hiwat is unsigned.
Follow suit with what tcp_output() does and compare the value of sbused()
with sb_hiwat instead.
Approved by: re (gjb)
Sponsored by: Chelsio Communications
This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.
Change the socket and Chelsio DDP backends to use 'backend3' instead of
abusing _aiocb_private.status directly. This confines the use of
_aiocb_private to the AIO internals in vfs_aio.c.
Reviewed by: kib (earlier version)
Approved by: re (gjb)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6547
we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which
wins race with us could dereference the covered vnode, and we are
left with the locked freed memory.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week