conflict with the opensolaris kernel module.
This patch solves a problem where the kernel linker will incorrectly
resolve opensolaris kmem_xxx() functions as linuxkpi ones, which leads
to a panic when these functions are used.
Submitted by: gallatin @
Sponsored by: Mellanox Technologies
MFC after: 1 week
On pre-WS2016 Hyper-V, if the only LUNs > 7 are used, then all disks
fails to attach. Mainly because those versions of Hyper-V do not set
SRB_STATUS properly and deliver junky INQUERY responses.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reported by: Hongxiong Xian <v-hoxian microsoft com>
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8724
Original commit "7090 zfs should improve allocation order" declares alloc
queue sorted by time and offset. But in practice io_offset is always zero,
so sorting happened only by time, while order of writes with equal time was
completely random. On Illumos this did not affected much thanks to using
high resolution timestamps. On FreeBSD due to using much faster but low
resolution timestamps it caused bad data placement on disks, affecting
further read performance.
This change switches zio_timestamp_compare() from comparing uninitialized
io_offset to really populated io_bookmark values. I haven't decided yet
what to do with timestampts, but on simple tests this change gives the
same peformance results by just making code to work as declared.
MFC after: 1 week
In particular, the fault access type is accounted for when the
aperture page is moved to GTT domain. On the other hand, the current
pager structure is left intact, most important, only one page is
instantiated per populate call.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
It allows to provide configurable agressive prefaulting and useful
hints to page daemon about memory allocations, on faults for pages
managed by phys pager. In fact, this implementation is superior to
the MAP_SHARED_PHYS hack from my Postgresql paper, while giving
similar benefits of reducing the page faults numbers on SysV shared
memory mappings.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
with cdev_pg_populate() to provide device drivers access to it. It
gives drivers fine control of the pages ownership and allows drivers
to implement arbitrary prefault policies.
The populate method is called on a page fault and is supposed to
populate the vm object with the page at the fault location and some
amount of pages around it, at pager's discretion. VM provides the
pager with the hints about current range of the object mapping, to
avoid instantiation of immediately unused pages, if pager decides so.
Also, VM passes the fault type and map entry protection to the pager,
allowing it to force the optimal required ownership of the mapped
pages.
Installed pages must contiguously fill the returned region, be fully
valid and exclusively busied. Of course, the pages must be compatible
with the object' type.
After populate() successfully returned, VM fault handler installs as
many instantiated pages into the process page tables as it sees
reasonable, while still obeying the correct semantic for COW and vm
map locking.
The method is opt-in, pager sets OBJ_POPULATE flag to indicate that
the method can be called. If pager' vm objects can be shadowed, pager
must implement the traditional getpages() method in addition to the
populate(). Populate() might fall back to the getpages() on per-call
basis as well, by returning VM_PAGER_BAD error code.
For now for device pagers, the populate() method is only allowed to be
used by the managed device pagers, but the limitation is only made
because there is no unmanaged fault handlers which could use it right
now.
KPI designed together with, and reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
contain a vm_page_t at the specified index. However, with this
change, vm_radix_remove() no longer panics. Instead, it returns NULL
if there is no vm_page_t at the specified index. Otherwise, it
returns the vm_page_t. The motivation for this change is that it
simplifies the use of radix tries in the amd64, arm64, and i386 pmap
implementations. Instead of performing a lookup before every remove,
the pmap can simply perform the remove.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D8708
values. This more closely matches other wifi drivers in the tree.
The bitmap levels have been based closely on other drivers (primarily
[u]rtwn(4)) in the hope that one day these can be unified into a shared
wifi-debug framework.
This is the first step of several pieces of work I'm planning on doing
with the run(4) driver. I may well adjust and refine some of the debug
bitmaps at a later date.
Reviewed by: adrian, avos
Differential Revision: https://reviews.freebsd.org/D8704
This made a couple of bugs visible in handling SSN wrap-arounds
when using DATA chunks. Now bulk transfer seems to work fine...
This fixes the issue reported in
https://github.com/sctplab/usrsctp/issues/111
MFC after: 1 week
kinfo_proc::ki_tdname is three characters shorter than
thread::td_name. Add a ki_moretdname field for these three
extra characters. Add the new field to kinfo_proc32, as well.
Update all in-tree consumers to read the new field and assemble
the full name, except for lldb's HostThreadFreeBSD.cpp, which
I will handle separately. Bump __FreeBSD_version.
Reviewed by: kib
MFC after: 1 week
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D8722
Table to find the CPUs to find the CPUs to start. Currently we assume PSCI,
however this assumption is shared with the FDT code.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Table to find if the hardware supports PSCI, and if so what method the
kernel should use to interact with it.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
This makes booting on Hyper-V w/ small # of vCPUs work properly.
Reported by: Hongxiong Xian <v-hoxian microsoft com>, Hongjiang Zhang <honzhan microsoft com>
MFC after: 1 week
Sponsored by: Microsoft
On FreeBSD the sense of rw_write_held() and rw_iswriter() were reversed,
probably due to a cut and paste error. Using rw_iswriter() would cause
the kernel to panic.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D8718
I'm teaching my ath10k port to communicate up the per-rate / channel width
information I get from the firmware.
The HT40 flag field should just be retired and instead moved to use the
PHY bandwidth field.
For IPv4 similar function uses addresses and ports in host byte order,
but for IPv6 it used network byte order. This led to very bad hash
distribution for IPv6 flows. Now the result looks similar to IPv4.
Reported by: olivier
MFC after: 1 week
Sponsored by: Yandex LLC
Some utilities (notably top(1)) exit if any of their input sysctls don't
exist, and the removal of the above-mentioned PG_CACHE-related sysctls
makes it difficult to run such utilities on different versions of the
kernel without recompiling.
Requested by: bde
stack_machdep.c is compiled if either of the DDB or STACK options is
specified, but stack_save_td_running() isn't useable from DDB. Moreover,
stack_save_td_running() works by raising an NMI on the CPU running the
target thread, and the corresponding handler is compiled only if STACK is
configured.
Reported by: kib
MFC after: 1 week
will be used by the gicv2m and ITS ACPI drivers to only attach to the
correct parent.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
The tools using to generate the sources has been updated and produces
different whitespaces. Commit this seperately to avoid intermixing
these with real code changes.
MFC after: 3 days
If the bus number assigned to a Host-PCI bridge doesn't match the first
bus number in the associated producer range from _CRS, print a warning and
fail to attach rather than panicking due to an assertion failure.
At least one single-socket Dell machine leaves a "ghost" Host-PCI bridge
device in the ACPI namespace that seems to correspond to the I/O hub in
the second socket of a two-socket machine. However, the BIOS doesn't
configure the settings for this "ghost" bridge correctly, nor does it have
any PCI devices behind it.
Tested by: royger
MFC after: 2 weeks
This change includes firmware commands for key setup +
some additional checking via CAMREAD / CAMWRITE registers.
Nothing (except rsu_delete_key() for pairwise keys) is deferred;
to ensure that things are done in order rsu_set_key() will wait
until key deletion task will be finished.
Tested with Asus USB-N10 (all ciphers).
Differences from initial (reviewed) patch:
- Pause AC queues before disassociation - since CMD_DISCONNECT clears
crypto state all pending frames must be processed / dropped before it.
- Check sc_running flag before trying to set static keys.
- Clear key index from bitmap even when firmware command fails
(it will be invalidated via CAMWRITE anyway).
Reviewed by: adrian, kevlo
Tested by: kevlo
Differential Revision: https://reviews.freebsd.org/D8706
The NFSv4.1 server failed to update the nfs-stablerestart file for
a client when the client was issued its first Open. As such, recovery
of Opens after a server reboot failed with NFSERR_NOGRACE.
This patch fixes this.
It also changes the code so that it malloc()'s the 1024 byte array
instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1.
Note that this bug only affected NFSv4.1 and only when clients attempted
to reclaim Opens after a server reboot.
MFC after: 2 weeks
subrulenr is considered unset if it's set to -1, not if it's set to 1.
See contrib/tcpdump/print-pflog.c pflog_print() for a user.
This caused incorrect pflog output (tcpdump -n -e -ttt -i pflog0):
rule 0..16777216(match)
instead of the correct output of
rule 0/0(match)
PR: 214832
Submitted by: andywhite@gmail.com
- Append RCR_APP_PHYSTS bit after firmware loading - otherwise
firmware will reset the register and this modification will be lost.
(without it Rx PHY descriptor section will contain garbage).
- Check if R92S_RXDW0_PHYST bit is set (like it is done in rtwn(4)) -
even if infosz is non-zero the section may not contain anything useful.
- In case, if descriptor is absent (A-MPDU?) use last calibrated RSSI
(rtwn(4) uses RSSI from the previous (sub)frame; probably, this
approach should be used here too).
Tested with Asus USB-N10, STA mode.
wait(2).
- Do not acquire the process spinlock if neither WTRAPPED nor WUNTRACED
options were passed [1].
- Extract the code to report alive process into a new helper
report_alive_proc() and use it for trapped, stopped and continued
childrens.
Note that the process spinlock is required around the WTRAPPED and
WUNTRACED tests, because P_STOPPED_TRACE and P_STOPPED_SIG flags are
set before other threads are stopped at the suspension point, and that
threads increment p_suspcount while owning only the process spinlock,
the process lock is dropped by them. If the spinlock is not taken for
tests, the syscall thread might miss both p_suspcount increment and
wakeup in wakeup in thread_suspend_switch().
Based on the submission by: mjg [1]
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- EMC clock have standard peripheral clock block. Use it.
- Implement full frequency set method for PLLD2. This PLL
is used as HDMI pixel clock so we must be able to set it
to wide range of frequencies, within 5% tolerance allowed
by HDMI specification. Due to this, full state space search
(over m, n, p fields) is necessary.
MFC after: 3 weeks
This function is referenced, but never called from DRM2 code. Also,
real behavior of pmap_mapdev_attr() in ARM world is unclear as we don't
have any additional attribute for a device memory type.
MFC after: 2 weeks
Before this, it would cause the one consumer of this API in powerpc usage
(dev/dpaa) to set the PTE WIMG flags to empty instead of --M-, making the
cache-enabled buffer portals non-coherent.
For whatever reason, smapi, smbios, vpd are all under the "bios" directory.
smapi is only for i386, so the entire "bios" directory is only built for
i386. Break smapi out, and make only it i386-specific. Then, build the
"bios" directory for both amd64 and i386.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D8609
- Fill in Rx radiotap header correctly (for every packet in a chain;
not once per chain).
- Fix rate / flags fields in Rx radiotap.
- Add debug messages for discarded frames.
- Pass received control (< sizeof(struct ieee80211_frame)) frames
to net80211 (if allowed by device filter; cannot happen yet).
Tested with Asus USB-N10.
Differential Revision: https://reviews.freebsd.org/D5723
This is important in hostap, ibss, (11s at some magical future date, etc)
where different nodes may have smaller limits.
Oops!
MFC after: 1 week
Relnotes: Yes
* ic_freq is the centre of the primary channel, not the centre of the
HT40/HT80/etc channel. Add a method to access that.
* Add a method to access the centre of the primary channel, including
knowing the centre of the 5/10/20/40/80, versus the primary channel.
Ie, it's the centre of the 40, 80, 160MHz channel.
* Add a method to access the centre frequency of the secondary 80MHz
channel - we don't support VHT yet, but when we do.
* Add methods to access the current channel and the per-dev desired
channel. Ideally drivers that do full offload with a per-vap channel
configuration should use the vap channel, NOT ic_curchan.
Non-offload drivers that require net80211 to change the channel should
be accessing ic_curchan.
it.
Remove bogus wrappers and use the kernel defaults.
While here, use DEVMETHOD_END.
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
Instead of failing with ENAMETOOLONG, which is swallowed by
pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1
bytes. This is more likely what the user wants, and saves the
caller from truncating it before the call (which was the only
recourse).
Polish pthread_set_name_np(3) and add a .Xr to thr_set_name(2)
so the user might find the documentation for this behavior.
Reviewed by: jilles
MFC after: 3 days
Sponsored by: Dell EMC
Since the vnode is only expected to be shared locked, we can save a
little overhead by only pretending we are locking in the first place.
Reviewed by: kib
Tested by: pho
being a bootstrap tool. However, for reproducible build output,
FreeBSD added dd status=none because it was otherwise difficult to
suppress the status information, but retain any errors that might
happen. There's no real reason that dd has to be a build tool, other
than we use status=none unconditional. Remove dd from a bootstrap tool
entirely by only using status=none when available. This may also help
efforts to build the system on non-FreeBSD hosts as well.
Differential Revision: https://reviews.freebsd.org/D8605
callout_stop() recently started returning -1 when the callout is already
stopped, which is not handled by the netgraph code. Properly filter
the return value. Netgraph callers only want to know if the callout
was cancelled and not draining or already stopped.
Discussed with: julian, glebius
MFC after: 2 weeks
When handling a GPE ACPI interrupt object the EcSpaceHandler()
function can be called which checks the EC_EVENT_SCI bit and then
recurse on the EcGpeQueryHandler() function. If there are multiple GPE
events pending the EC_EVENT_SCI bit will be set at the next call to
EcSpaceHandler() causing it to recurse again via the
EcGpeQueryHandler() function. This leads to a slow never ending
recursion during boot which prevents proper system startup, because
the EC_EVENT_SCI bit never gets cleared in this scenario.
The behaviour is reproducible with the ALASKA AMI in combination with
a newer Skylake based mainboard in the following way:
Enter BIOS and adjust the clock one hour forward. Save and exit the
BIOS. System fails to boot due to the above mentioned bug in
EcGpeQueryHandler() which was observed recursing multiple times.
This patch adds a simple recursion guard to the EcGpeQueryHandler()
function and also also adds logic to detect if new GPE events occurred
during the execution of EcGpeQueryHandler() and then loop on this
function instead of recursing.
Reviewed by: jhb
MFC after: 2 weeks
When a TCP segment with the FIN bit set was received in the CLOSED state,
a TCP RST-ACK-segment is sent. When computing SEG.ACK for this, the
FIN counts as one byte. This accounting was missing and is fixed by this
patch.
Reviewed by: hiren
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://svn.freebsd.org/base/head
Use after free happens for state that is deleted. The reference
count is what prevents the state from being freed. When the
state is dequeued, the reference count is dropped and the memory
freed. We can't dereference the next pointer or re-queue the
state.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D8671
This adds support to camcontrol(8) and libcam(3) for getting and setting
the time on SCSI protocol drives. This is more commonly found on tape
drives, but is a SPC (SCSI Primary Commands) command, and may be found
on any device that speaks SCSI.
The new camcontrol timestamp subcommand allows getting the current device
time or setting the time to the current system time or any arbitrary time.
sbin/camcontrol/Makefile:
Add timestamp.c.
sbin/camcontrol/camcontrol.8:
Document the new timestamp subcommand.
sbin/camcontrol/camcontrol.c:
Add the timestamp subcommand to camcontrol.
sbin/camcontrol/camcontrol.h:
Add the timestamp() function prototype.
sbin/camcontrol/timestamp.c:
Timestamp setting and reporting functionality.
sys/cam/scsi/scsi_all.c:
Add two new CCB building functions, scsi_set_timestamp() and
scsi_report_timestamp(). Also, add a new helper function,
scsi_create_timestamp().
sys/cam/scsi/scsi_all.h:
Add CDB and parameter data for the the set and report timestamp
commands.
Add function declarations for the new CCB building and helper
functions.
Submitted by: Sam Klopsch
Sponsored by: Spectra Logic
MFC After: 2 weeks
buf_ring contains an assert that checks whether an item being
enqueued already exists on the ring. There is a subtle bug in
this assert. An item can be returned by a peek() function and
freed, and then the consumer thread can be preempted before
calling advance(). If this happens the item appears to still be
on the queue, but another thread may allocate the item from the
free pool and wind up trying to enqueue it again, causing the
assert to trigger incorrectly.
Fix this by skipping the head of the consumer's portion of the
ring, as this index is what will be returned by peek().
Sponsored by: Dell EMC Isilon
MFC After: 1 week
Differential Revision: https://reviews.freebsd.org/D8685
Reviewed by: hselasky
called to allocate a new page of radix trie nodes, there could be a call to
vm_radix_remove() on the same trie (of PG_CACHED pages) as the in-progress
vm_radix_insert(). With the removal of PG_CACHED pages, we can simplify
vm_radix_insert() and vm_radix_remove() by removing the flags on the root of
the trie that were used to detect this case and the code for restarting
vm_radix_insert() when it happened.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8664
actual numbers would help debugging (also, `MSR' and `ACPI' are standard
abbreviations and thus should be properly capitalized)
- Rephrase unsupported AMD CPUs message and wrap as an overly long line:
`sorry' 1) is wrongly spelled after period (starts with a small letter)
and 2) carries emotional "tinge" that is unnecessary and even bogus in
debug message; `implemented' is not the best word as `supported' suits
better in this context
- Improve readability when reporting resulted P-state transition (debug)
Approved by: jhb
Prior to this change the loader self relocation code interpreted amd64's
rela relocations as if they were rel relocations, discarding the addend.
This "works" because GNU ld 2.17.50 stores the addend value in both the
r_addend field of the relocation (as expected) and at the target of the
relocation.
Other linkers, and possibly other versions of GNU ld, won't have this
behaviour, so interpret the relocations correctly.
Reported by: George Rimar
Reviewed by: andrew
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8681
miibus_writereg.
Reduce the DELAY() between reads while waiting for MII access.
Spotted by: yongari
Sponsored by: Rubicon Communications, LLC (Netgate)
If bufring is used for per-TX ring descs, don't update "available"
counter, which is only used to help debugging.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8674
Writing the full queue size to it every time was makeing it overflow with a
lot of bogus values.
This fixes the interrupt storms on irq 40.
Sponsored by: Rubicon Communications, LLC (Netgate)
Fix ioat_release to only set is_completion_pending if DMAs were actually
queued. Otherwise, the spurious flag could trigger an assert in the
reset path on INVARIANTS kernels.
Reviewed by: bdrewery, Suraj Raju @ Isilon
Sponsored by: Dell EMC Isilon
This allows the driver to be built in a kernel with no FDT support, e.g.
on arm64 with just ACPI.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
It is quite specific mode of operation without storing on-disk metadata.
It can be useful in some cases in combination with some external control
tools handling mirror creation and disks hot-plug.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
It was an experimental tunable, and is now deemed to be road blocker
for further changes. Time to retire it.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8654
Bring in the most recent copy of NetBSD's db_disasm, to fix bugs and add more
instructions.
* Fix several bugs in the disassembler, most notably the disassembly of the
rlwi* instructions, the original reason for bringing in this change.
* Add more registers to the SPR list
* Add more instructions to the opcode table
Obtained from: NetBSD
MFC after: 2 weeks
Attempt to fix powerpc64 LINT kernel broken by r308000. Netmap's use of
a uint64_t wchan seems odd, but in the interest of minimizing this
change just cast through uintptr_t to silence the compiler warning.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8669
- On control request update all status pages, since they may also be
affected if user enables/disables enclosure slots.
- Periodically update element descriptors too, since there is some
hardware where they are changed dynamically.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
passing it to firmware for all Gen3 controllers.
For Thunderbolt controller, keep the legacy behavior i.e. return the SYNCHRONIZE_CACHE command with success status from driver itself.
There is Sysctl parameter 'block_sync_cache' is provided to enable customers either to block/unblock these commands to facilitate
legacy behavior if there is a compatibility issue. Default value for module parameter is to unblock this command.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
MFC after: 3 days
Sponsored by: Broadcom Limited/AVAGO Technologies
and return without processing event in AEN thread, if controller reset is in progress.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
MFC after: 3 days
Sponsored by: Broadcom Limited/AVAGO Technologies
If a SCSI IO times out, then before initiating OCR, now the driver will try to send a
target reset to the particular target for which the IO is timed out. If that also fails,
then the driver will initiate OCR.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
MFC after: 3 days
Sponsored by: Broadcom Limited/AVAGO Technologies
Did the same by setting sc->aen_cmd = NULL when aborting AEN is successful.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
MFC after: 3 days
Sponsored by: Broadcom Limited/AVAGO Technologies
MFI linked list in megaraid_sas driver is used for mfi-mpt pass-through commands.
This list can be corrupted due to many possible race conditions in driver and
eventually we may see kernel panic.
One example -
MFI frame is freed from calling process as driver send command via polling method and interrupt
for that command comes after driver free mfi frame (actually even after some other context reuse
the mfi frame). When driver receive MPT frame in ISR, driver will be using the index of MFI and
access that MFI frame and finally in-used MFI frames list will be corrupted.
High level description of new solution -
Free MFI and MPT command from same context.
Free both the command either from process (from where mfi-mpt pass-through was called) or from
ISR context. Do not split freeing of MFI and MPT, because it creates the race condition which
will do MFI/MPT list.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
MFC after: 3 days
Sponsored by: Broadcom Limited/AVAGO Technologies
Clear the interrupt state before reading the input char from the
input FIFO. In the current code there is a window between the read
to the data register and the write to the the ICR, during which an
input char will not cause an interrupt.
This fixes the issue by which the serial port input on QEMU freezes
when using the emulated pl011 serial port.
This adds a workaround to incorrectly behaving APs (ie, FreeBSD APs) which
don't beacon out exactly when they should (at TBTT multiples of beacon
intervals.)
It forces the hardware awake (but leaves it in network-sleep so self
generated frames still state that the hardware is asleep!) and will
remain awake until the next sleep transition driven by net80211.
That way if the beacons are just at the wrong interval, we get a much
better chance of hearing more consecutive beacons before we go to sleep,
thus not constantly disconnecting.
Tested:
* AR9485, STA mode, against a misbehaving FreeBSD AP.
The 802.11-2012 spec talks about this - section 10.1.3.2 - Beacon Generation
in Infrastructure Networks. So yes, we should be expecting beacons to be
going out in multiples of intval.
Silly adrian.
So:
* fix the FreeBSD APs that are sending beacons at incorrect TBTTs (target
beacon transmit time); and
* yes indeed we will have to wake up out of network sleep until we sync
a beacon.
ASMedia ASM1062 AHCI chips with some fancy firmware handling PMP inside
seems sometimes forgeting to set bits in PxIS, causing command timeouts.
Removal of this check fixes the issue by the theoretical cost of slightly
higher CPU usage in some odd cases, but this is what Linux does too.
MFC after: 1 month
Note: there was a merge conflict resolved by me.
illumos/illumos-gate@43297f973a43297f973ahttps://www.illumos.org/issues/3821
We recently had nodes with some of the latest zfs bits panic on us in a
rollback-heavy environment. The following is from my preliminary analysis:
Let's look at where we died:
> $C
ffffff01ea6b9a10 taskq_dispatch+0x3a(0, fffffffff7d20450, ffffff5551dea920, 1)
ffffff01ea6b9a60 zil_clean+0xce(ffffff4b7106c080, 7e0f1)
ffffff01ea6b9aa0 dsl_pool_sync_done+0x47(ffffff4313065680, 7e0f1)
ffffff01ea6b9b70 spa_sync+0x55f(ffffff4310c1d040, 7e0f1)
ffffff01ea6b9c20 txg_sync_thread+0x20f(ffffff4313065680)
ffffff01ea6b9c30 thread_start+8()
If we dig in we can find that this dataset corresponds to a zone:
> ffffff4b7106c080::print zilog_t zl_os->os_dsl_dataset->ds_dir->dd_myname
zl_os->os_dsl_dataset->ds_dir->dd_myname = [ "8ffce16a-13c2-4efa-a233-
9e378e89877b" ]
Okay so we have a null taskq pointer. That only happens during the calls to
zil_open and zil_close. If we poke around we can see that we're actually in
midst of a rollback:
> ::pgrep zfs | ::printf "0x%x %s\\n" proc_t . p_user.u_psargs
0xffffff43262800a0 zfs rollback zones/15714eb6-f5ea-469f-ac6d-
4b8ab06213c2@marlin_init
0xffffff54e22a1028 zfs rollback zones/8ffce16a-13c2-4efa-a233-
9e378e89877b@marlin_init
0xffffff4362f3a058 zfs rollback zones/0ddb8e49-ca7e-42e1-8fdc-
4ac4ba8fe9f8@marlin_init
0xffffff5748e8d020 zfs rollback zones/426357b5-832d-4430-953e-
10cd45ff8e9f@marlin_init
0xffffff436b867008 zfs rollback zones/8f36bf37-8a9c-4a44-995c-
6d1b2751e6f5@marlin_init
0xffffff4381ad4090 zfs rollback zones/6c8eca18-fbd6-46dd-ac24-
2ed45cd0da70@marlin_init
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
MFC after: 3 weeks
This was being done in the pre-AR9380 case, but not for AR9380 and later.
When powersave in STA mode is enabled, this may have lead to the transmit
completion code doing this:
* call the task, which doesn't wake up the hardware
* complete the frames, which doesn't touch the hardware
* schedule pending frames on the hardware queue, which DOES touch the
hardware, and this will be ignored
This would show up in the logs like this:
(with debugging enabled):
Nov 27 23:03:56 lovelace kernel: Q1[ 0] (nseg=1) (DS.V:0xfffffe011bd57300 DS.P:0x49b57300) I: 168cc117 L:00000000 F:0005
...
(in general, doesn't require debugging enabled):
Nov 27 23:03:56 lovelace kernel: ath_hal_reg_write: reg=0x00000804, val=0x49b57300, pm=2
That register is a EDMA TX FIFO register (queue 1), and the val is the descriptor
being written.
Whilst here, make sure the software queue gets kicked here.
Tested;
* AR9485, STA mode + powersave
Since hypervisor does not respond CHOPEN to a revoked channel.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8636
Just in case that no chimney sending buffer can be used.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8619
This bug has been bugging me for quite some time. I finally sat down
with enough coffee to figure it out.
The short of it - rounding up to the next intval multiple of the TSF value
only works if the AP is transmitting all its beacons on an interval of
the TSF. If it isn't - for example, doing staggered beacons on a multi-VAP
setup with a single hardware TSF - then weird things occur.
The long of it -
When powersave is enabled, the MAC and PHY are partially powered off.
They can't receive any packets (or transmit, for that matter.)
The target beacon timer programming will wake up the MAC/PHY just before
the beacon is supposed to be received (well, strictly speaking, at DTIM
so it can see the TIM - traffic information map - telling the STA whether
any traffic is there for it) and it happens automatically.
However, this relies on the target beacon time being programmed correctly.
If it isn't then the hardware will wake up and not hear any beacons -
and then it'll be asleep for said beacons. After enough of this, net80211
will give up and assume the AP went away.
This should fix both TSFOOR interrupts and disconnects from APs with powersave
enabled.
The annoying bit is that it only happens if APs stagger things or start
on a non-zero TSF. So, this would sometimes be fine and sometimes not be
fine.
What:
* I don't know (yet) why the code rounds up to the next intval.
For now, just disable rounding it and trust the value we get.
TODO:
* If we do see a beacon miss in STA mode then we should transition
out of sleep for a while so we can hear beacons to resync against.
I'd love a patch from someone to enable that particular behaviour.
Note - that doesn't require that net80211 brings the chip out of
sleep state - only that we wake the chip up through to full-on and
then let it go to sleep again when we've seen a beacon. The wifi
stack and AP can still completely just stay believing we're in sleep
mode.
Tested:
* AR9485, STA mode, powersave enabled
MFC after: 1 week
Relnotes: Yes
- Set IEEE80211_FEXT_SCAN_OFFLOAD flag; firmware can send null data
frames when associated.
- Check IEEE80211_SCAN_ACTIVE scan flag instead of IEEE80211_F_ASCAN
ic flag; the last is never set since r170530.
- Eliminate software scan (net80211) <-> site_survey (driver) race:
* override ic_scan_curchan and ic_scan_mindwell pointers so net80211
will not try to finish scanning automatically;
* inform net80211 about current status via ieee80211_cancel_scan()
and ieee80211_scan_done();
* remove corresponding workaround from rsu_join_bss().
Now the driver can associate to an AP with hidden SSID.
Tested with Asus USB-N10.
the vnode is inactivated. This contradicts with the nullfs caching
which keeps upper vnode around, as consequence keeping the use
reference to lower vnode.
Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.
Reported by: asomers
Reviewed by: rmacklem
Tested by: asomers, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
a new page of radix trie nodes to complete a vm_radix_insert() operation
that was requested by vm_page_cache(). Specifically, vm_page_cache()
already held the free page queue lock when UMA tried to acquire it through
a call to vm_page_alloc(). This code path no longer exists, so there is no
longer any reason to allow recursion on the free page queue mutex.
Improve nearby comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8628
- Defined an abstract NVRAM I/O API (bhnd_nvram_io), decoupling NVRAM/SPROM
parsing from the actual underlying NVRAM data provider (e.g. CFE firmware
devices).
- Defined an abstract NVRAM data API (bhnd_nvram_data), decoupling
higher-level NVRAM operations (indexed lookup, data conversion, etc) from
the underlying NVRAM file format parsing/serialization.
- Implemented a new high-level bhnd_nvram_store API, providing indexed
variable lookup, pending write tracking, etc on top of an arbitrary
bhnd_nvram_data instance.
- Migrated all bhnd(4) NVRAM device drivers to the common bhnd_nvram_store
API.
- Implemented a common bhnd_nvram_val API for parsing/encoding NVRAM
variable values, including applying format-specific behavior when
converting to/from the NVRAM string representations.
- Dropped the now unnecessary bhnd_nvram driver, and moved the
broadcom/mips-specific CFE NVRAM driver out into sys/mips/broadcom.
- Implemented a new nvram_map file format:
- Variable definitions are now defined separately from the SPROM
layout. This will also allow us to define CIS tuple NVRAM
mappings referencing the common NVRAM variable definitions.
- Variables can now be defined within arbitrary named groups.
- Textual descriptions and help information can be defined inline
for both variables and variable groups.
- Implemented a new, compact encoding of SPROM image layout
offsets.
- Source-level (but not build system) support for building the NVRAM file
format APIs (bhnd_nvram_io, bhnd_nvram_data, bhnd_nvram_store) as a
userspace library.
The new compact SPROM image layout encoding is loosely modeled on Apple
dyld compressed LINKEDIT symbol binding opcodes; it provides a compact
state-machine encoding of the mapping between NVRAM variables and the SPROM
image offset, mask, and shift instructions necessary to decode or encode
the SPROM variable data.
The compact encoding reduces the size of the generated SPROM layout data
from roughly 60KB to 3KB. The sequential nature SPROM layout opcode tables
also simplify iteration of the SPROM variables, as it's no longer
neccessary to iterate the full NVRAM variable definition table, but
instead simply scan the SPROM revision's layout opcode table.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D8645
As of r234483, vnode deactivation causes non-VPO_NOSYNC pages to be
laundered. This behaviour has two problems:
1. Dirty VPO_NOSYNC pages must be laundered before the vnode can be
reclaimed, and this work may be unfairly deferred to the vnlru process
or an unrelated application when the system is under vnode pressure.
2. Deactivation of a vnode with dirty VPO_NOSYNC pages requires a scan of
the corresponding VM object's memq for non-VPO_NOSYNC dirty pages; if
the laundry thread needs to launder pages from an unreferenced such
vnode, it will reactivate and deactivate the vnode with each laundering,
potentially resulting in a large number of expensive scans.
Therefore, ensure that all dirty pages are laundered upon deactivation,
i.e., when all maps of the vnode are removed and all references are
released.
Reviewed by: alc, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8641
can be selected for it. If the desired frequency is one of those two, use
this mode instead of the integer one.
When calculating the PLL3 freq for the dotclock, check if it is a multiple
of the fracional frequencies.
MFC after: 2 weeks
(APIC-Timer-always-running) is not implemented.
If machine has ncpus >= 8 and non-FSB interrupt routing from HPET,
default HPET eventtimer quality 450 is reduced by 100, i.e. it is
350. On the other hand, LAPIC default quality is 600 and it is reduced
by 200 if ARAT is not reported. We end up with HPET quality 350 <
LAPIC quality 400, despite ARAT is not set. Then, since deep Cx
states are active by default, eventtimer fail.
E.g., on Nehalem Core i7 CPU and X58 chipset, LAPIC only works in
C0/C1/C1E and HPET does not implement FSB mode, which otherwise
requires manual switch to HPET to get working system.
Set LAPIC eventtimer quality to 100 if no ARAT.
While there, do not ignore deadlint TSC mode for LAPIC timer if ARAT
is not implemented. If user manually selected LAPIC eventtimer on
such CPU, there is no reason to not use deadline if available and not
disabled administratively.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The "-z" option on nfsstats was erroneously zeroing out the counts
of NFSv4 state structures. These counts will normally go back down
to zero as state is released. When zeroed out by "-z", these counts
can go negative. This patch fixes this problem.
MFC after: 2 weeks
MPC750 User Manual Errata (rev 1) adds a note to C.4.2.2 noting that mtsr,
mtsrin, and mtmsr all require a isync after the instruction and before data
address translation uses any of the segment registers. This should make FreeBSD
run correctly on the G3 again.
Reported by: Mark Millard
MFC after: 1 week
MAXPAGESIZE is not well defined by the GNU ld documentation.
Different linkers, and different versions of the same linker, use
different MAXPAGESIZE values. Current versions of GNU gold and LLVM's
lld use 4K. When set to 4K the kernel panics at boot due to an issue
with x86bios.
Here we want the kernel physaddr to be the amd64 superpage size, so use
that value (2MB) explicitly. With this change GNU gold and LLVM lld can
link a working amd64 kernel.
PR: 214718 (x86bios)
Differential Revision: https://reviews.freebsd.org/D8610
The callout subsystem already handles early callouts and schedules
the first clock interrupt appropriately based on the currently pending
callouts. The one nit to fix was that callouts scheduled via C_HARDCLOCK
during early boot could fire too early once timers were enabled as the
per-CPU base time is always zero until timers are initialized. The change
in callout_when() handles this case by using the current uptime as the
base time of the callout during bootup if the per-CPU base time is zero.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Netflix
Since the previous algorithm, based on bit shifting, does not scale
with large replay windows, the algorithm used here is based on
RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting.
The replay window will be fast to be updated, but will cost as many bits
in RAM as its size.
The previous implementation did not provide a lock on the replay window,
which may lead to replay issues.
Reviewed by: ae
Obtained from: emeric.poupon@stormshield.eu
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D8468
we see a lot of contention on the arc4 lock, used to generate the IV
of the ESP output packets.
The idea of this patch is to split this mutex in order to reduce the
contention on this lock.
Reviewed by: delphij, markm, ache
Approved by: so
Obtained from: emeric.poupon@stormshield.eu
MFC after: 1 month
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D8130
So that the caller can know the channel close error and react accordingly.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8600
So that the callers of vmbus_chan_open_br() could handle the passed in
bufring memory properly.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8569