On load_one, we now cache our capabilities registers internally, similar
to QUERY_HCA_CAP. Capabilities can later be queried using macros
introduced in this patch.
Linux commit:
71862561f3a62015a11de16d1c306481e8415c08
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.
Linux commit:
c835ad64683bd3e2d1b31ed2cb1ff4366932edb1
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
PCAM: Ports capabilities mask register.
MCAM: Management capabilities mask register.
PCAM and MCAM registers will provide information regarding firmware
support for different features, in order to avoid cases where new driver
combined with old firmware results in syndromes (for ex. PCIe counters
before this patchset).
Linux commit:
cfdcbceaeffc669b70d904d80a2df9c86c232566
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACKs from being sent without being
scattered to memory
Linux commit:
fcd29ad17c6ff885dfae58f557e9323941e63ba2
MFC after: 3 days
Sponsored by: Mellanox Technologies
Else the SQs won't be properly released when closing rate-limited connections
leading to wrong state transitions on the SQ.
MFC after: 3 days
Sponsored by: Mellanox Technologies
When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.
While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
After doing performance measurements, it seems like CQE zipping doesn't
have any significant benefit.
Moreover, we know that this feature is disabled by default on other
operating systems (Linux for example).
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Split the function into the mlx5e_update_stats_locked() core and make
mlx5e_update_stats_work() call the _locked helper, similar to many other
places in the kernel. This improves the code structure, making the
locking clean.
Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Instead of waiting for all jobs to be cancelled, simply close the completion
queue to prevent more completion events and let mlx5e_destroy_rq() cleanup
the remaining mbufs.
MFC after: 3 days
Sponsored by: Mellanox Technologies
The number of priorities is always 8, while the number of traffic classes
supported can vary. While at it convert the sysctl node into an array.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Instead of reading Ethernet RFC 2819 pXtoYoctets counters from
hardware which counts RX octets, count tx_stat_pXtoYoctets from
Ethernet extended counters which counts TX octets.
TX jumbo counters should be accumulated only after the PPCNT
counters were fetched from hardware with their latest value.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Make sure the interrupt handlers don't race with the fast unload one
code in the shutdown handler.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Linux commit:
1865ea9adbfaf341c5cd5d8f7d384f19948b2fe9
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
While at it remove unused interface state bits. This also fixes an issue
during shutdown:
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
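The guard described above can be sketched roughly as follows. This is a
userland illustration only: struct dev_sketch, IFACE_STATE_UP and both
functions are hypothetical stand-ins, and the real driver's locking and
bit helpers are left out.

```c
#include <stdbool.h>

#define	IFACE_STATE_UP	(1UL << 0)	/* stand-in for MLX5_INTERFACE_STATE_UP */

/* Hypothetical, heavily reduced device state. */
struct dev_sketch {
	unsigned long	intf_state;
	int		resources;	/* non-zero while resources are held */
};

/* On failure nothing is left allocated and the UP bit is never set. */
static int
load_one_sketch(struct dev_sketch *dev, bool fw_ok)
{
	if (!fw_ok)
		return (-1);
	dev->resources = 1;
	dev->intf_state |= IFACE_STATE_UP;
	return (0);
}

/*
 * Tear down only if the interface is marked UP.  Returns true when a
 * teardown was actually performed, false when there was nothing to do.
 */
static bool
unload_one_sketch(struct dev_sketch *dev)
{
	if ((dev->intf_state & IFACE_STATE_UP) == 0)
		return (false);
	dev->resources = 0;
	dev->intf_state &= ~IFACE_STATE_UP;
	return (true);
}
```

With this shape, a health-care style caller that runs after a failed load
finds the UP bit clear and returns without touching freed resources.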
Linux commit:
10a8d00707082955b177164d4b4e758ffcbd4017
b3cb5388499c5e219324bfe7da2e46cbad82bfcf
MFC after: 3 days
Sponsored by: Mellanox Technologies
Add support for DIM based on Linux,
with some minor adaptations specific to FreeBSD.
Linux commit:
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33
MFC after: 3 days
Sponsored by: Mellanox Technologies
Checksum offloading for SCTP is not currently specified for virtio.
If the hypervisor announces checksum offloading support, it means TCP
and UDP checksum offload. If an SCTP packet is sent and the host announced
checksum offload support, the hypervisor inserts the IP checksum (16-bit)
at the correct offset, but this is not the right checksum, which is a CRC32c.
This results in all outgoing packets having the wrong checksum and therefore
breaking SCTP based communications.
This patch removes SCTP checksum offloading support from the virtio
network interface.
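To illustrate why the 16-bit checksum cannot stand in for the SCTP one,
the sketch below computes both over the same bytes. The function names
are illustrative and are not taken from the virtio driver; the CRC is a
plain bitwise implementation, not an optimized one.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum (RFC 1071), as used by TCP/UDP. */
static uint16_t
inet_cksum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Bitwise CRC32c (Castagnoli polynomial, reflected), as required by SCTP. */
static uint32_t
crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffU;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82f63b78U & (0U - (crc & 1)));
	}
	return (~crc);
}
```

The two functions differ in width and in algorithm, so a 16-bit sum
written into the SCTP checksum field cannot match what the receiver
computes.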
Thanks to Felix Weinrank for making me aware of the issue.
Reviewed by: bryanv@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20147
times - on every interrupt by using an own set of device methods for the
IGB class. This translates to introducing igb_if_intr_{disable,enable}()
and igb_if_{rx,tx}_queue_intr_enable() with that IGB-specific code moved
out of their EM counterparts and otherwise continuing to use the EM IFDI
methods also for IGB.
Note that igb_if_intr_{disable,enable}() also issue E1000_WRITE_FLUSH as
lost with the conversion of igb(4) to iflib(4).
Also note, that the em_if_{disable,enable}_intr() methods are renamed to
em_if_intr_{disable,enable}() for consistency with the names used in the
interface declaration.
o In em_intr():
- Don't bother to bail out if the interrupt type is "legacy", i. e. INTx
or MSI, as iflib(4) doesn't use ift_legacy_intr methods for MSI-X. All
other iflib(4)-based drivers avoid this check, too.
- Given that only the MSI-X interrupts have one-shot behavior (by taking
advantage of the EIAC register), explicitly disable interrupts. Hence,
em_intr() now matches what {em,igb}_irq_fast() previously did (in case
of igb(4) supposedly also to work around MSI message reordering errata
on certain systems).
o In em_if_intr_disable():
- Clear the EIAC register unconditionally for 82574 and not just in case
of MSI-X, matching em_if_intr_enable() and bringing back the last hunk
of r206437 lost with the iflib(4) conversion.
- Write to EM_EIAC for clearing said register instead of to the IGB-only
E1000_EIAC used ever since the iflib(4) conversion.
Reviewed by: shurd
Differential Revision: https://reviews.freebsd.org/D20176
Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails. E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.
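The fallback behaviour can be pictured with the following sketch. It is
not the kernel's dumper list; struct dumper, dump_with_fallback() and the
two stand-in callbacks are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical dumper entry; a lower array index means higher priority. */
struct dumper {
	const char	*name;
	int		(*dump)(void);	/* returns 0 on success */
};

/*
 * Try each configured dumper in priority order and stop at the first one
 * that succeeds.  Returns the index of the dumper used, or -1 if all fail.
 */
static int
dump_with_fallback(const struct dumper *list, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].dump != NULL && list[i].dump() == 0)
			return ((int)i);
	}
	return (-1);
}

/* Stand-ins: a netdump that is unavailable and a disk dump that works. */
static int netdump_unavailable(void) { return (1); }
static int diskdump_ok(void) { return (0); }

static const struct dumper example_list[] = {
	{ "netdump", netdump_unavailable },
	{ "disk", diskdump_ok },
};
```

Calling dump_with_fallback(example_list, 2) skips the failing netdump
entry and reports that the disk entry was used.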
This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now. It also does not implement
IPv6 netdump.
savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.
This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device. Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.
As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.
Reviewed by: markj, scottl (earlier version)
Relnotes: maybe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19996
Right now ath_rate_sample has a fixed rate schedule, rather than the minstrel_ht
style "best, good, most reliable" triplet. So, if higher rates are tried then
it'll not fall back to a lower MCS rate in that transmission schedule.
This means that in low SNR situations it'll not easily drop to MCS0 unless enough
transmissions occur to allow rate control to eventually decide to drop; and if
it's TCP traffic it'll get slowed down because of packet loss.
It's worse for 2-stream and 3-stream rates; it doesn't ever fall back to lower
stream rates, and these higher stream rates require higher SNR to work.
So instead let's (for now?) have each of the 11n transmit rates use MCS0 as
the last attempt. ath_rate_sample will quickly see that rate succeeds more
and will move to it much quicker.
Testing:
* AR9344 (Wasp) - 2G STA mode
These are some fun issues I've found with my upstairs wifi link at such a
ridiculously low signal level (like, < 5dB.)
* Add per-station tx/rx rssi statistics, in potential preparation to use that
in the RX rate control.
* Call the rate control on each received frame to let it potentially use
it as a hint for what rates to potentially use. It's a no-op right now.
* Do ANI calibration during scan as well. The ath_newstate() call was disabling the
ANI timer and only re-enabling it during transitions to _RUN. This has the
unfortunate side-effect that if ANI deafened the NIC because of interference
and it disassociated, it wouldn't be reset and the scan would never hear beacons.
The ANI configuration is stored at least globally on some HALs and per-channel
on others. Because of this a NIC reset wouldn't help; the ANI parameters would
simply be programmed back in.
Now, I have a feeling I also need to do this during AUTH/ASSOC too and maybe,
if I'm feeling clever, I need to reset the ANI parameters on a given channel
during a transition through INIT or if the VAP is destroyed/re-created.
However for now this gets me out of the immediate weeds with connectivity
upstairs (and thus I /can/ commit); I'll keep chipping away at tidying this
stuff up in subsequent commits.
Tested:
* AR9344 (Wasp), 2G STA mode
so it does not require a bounce buffer. The only need for this was
to align the buffer address. Implement unaligned access and we don't
need to copy data twice.
o Remove contigmalloc-based bounce buffer from xDMA code since it is
not suitable for arbitrary memory provided by platform, which is
sometimes a dedicated piece of memory that is not managed by OS at all.
Sponsored by: DARPA, AFRL
It cannot load it automatically at boot, because the root filesystem
is not there yet. An alternative would be adding ispfw(4) to GENERIC,
but it's an additional 1MB.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D19369
Drivers can now pass up numa domain information via the
mbuf numa domain field. This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.
Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.
Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20028
From 7d8dc6544c
"The mcbin (and likely others) have a nonstandard uart clock. This means
that the earlycon programming will incorrectly set the baud rate if it is
specified. The way around this is to tell the kernel to continue using the
preprogrammed baud rate. This is done by setting the baud to 0."
Our drivers (uart_dev_ns8250) do respect zero, but SPCR would error. Let's
not error.
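A minimal sketch of the intended handling, assuming the SPCR "Baud Rate"
field encoding from the published table description (0 = as is, 3 = 9600,
4 = 19200, 6 = 57600, 7 = 115200). The function name is hypothetical and
this is not the actual uart(4) code.

```c
#include <stdint.h>

/*
 * Map the SPCR baud field to a line speed.  A return value of 0 means
 * "keep whatever rate the firmware already programmed"; -1 means the
 * encoding is not one this sketch knows about.
 */
static int
spcr_baud_to_rate(uint8_t field)
{
	switch (field) {
	case 0:
		return (0);		/* as is: do not reprogram the divisor */
	case 3:
		return (9600);
	case 4:
		return (19200);
	case 6:
		return (57600);
	case 7:
		return (115200);
	default:
		return (-1);
	}
}
```

The point of the change is that field value 0 takes the "keep the current
rate" path rather than the error path.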
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, imp, bcran
Differential Revision: https://reviews.freebsd.org/D19914
This is fairly similar to the AES-GCM support in ccr(4) in that it will
fall back to software for certain cases (requests with only AAD and
requests that are too large).
Tested by: cryptocheck, cryptotest.py
MFC after: 1 month
Sponsored by: Chelsio Communications
To workaround limitations in the crypto engine, empty buffers are
handled by manually constructing the final length block as the payload
passed to the crypto engine and disabling the normal "final" handling.
For HMAC this length block should hold the length of a single block
since the hash is actually the hash of the IPAD digest, but for
"plain" SHA the length should be zero instead.
Reported by: NIST SHA1 test failure
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Add support for newer Thinkpad models with id LEN0268. Was tested on
Thinkpad T480 and ThinkPad X1 Yoga 2nd gen.
PR: 229120
Submitted by: Ali Abdallah <aliovx@gmail.com>
MFC after: 1 week
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets. When called with a domain on a NUMA machine,
if_alloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain()
which uses a driver supplied device_t to call if_alloc_domain() with
the appropriate domain.
Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.
Reviewed by: glebius, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19930
This fixes a bug that prevented the driver from auto-flashing the
firmware when it didn't see one on the card. This feature was
introduced in r321390 and this bug was introduced in r343269.
Reported by: gallatin@
MFC after: 1 week
Sponsored by: Chelsio Communications
One of the fun issues with scanning has been how the existing
ANI values were programmed into the hardware when channels were
changed. If you're on a really crappy channel and ANI has made
you deaf then when you scan you continue to be deaf on all channels.
This code passes in a flag to startpcureceive which in AR5416 and later
is also used to enable ANI. This allows it to know if it's a normal
operation or a scan operation.
This fixes my situation at home where a temporary spot of a device
going deaf due to interference starts scanning and .. can't hear
anything until I restart.
Now, this isn't the full fix - ideally:
(a) all the ANI config and per-channel information would be migrated
to the shared HAL stuff and enabled for all of the NICs;
(b) when a station reassociates and some other error conditions
(like missed beacons, NF calibration failures, etc) a knob
to reset ANI parameters would likely help recovery.
But hey, I'm committing bits of code again! woo!
Tested:
* AR9344 (2G), STA operation
This fixes a bug where, even when hw.psm.tap_enabled=0, touchpad taps
were processed.
tap_enabled has three states: unconfigured, disabled, and enabled (-1, 0, 1).
To respect PR kern/139272, taps are ignored only when explicitly disabled.
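The tri-state check can be sketched as below. The enum and function names
are illustrative, not the identifiers used in psm(4).

```c
#include <stdbool.h>

/* The three states named above. */
enum tap_state {
	TAP_UNCONFIGURED = -1,
	TAP_DISABLED = 0,
	TAP_ENABLED = 1
};

/* Taps are dropped only when the tunable was explicitly set to 0. */
static bool
tap_is_ignored(int tap_enabled)
{
	return (tap_enabled == TAP_DISABLED);
}
```

An unconfigured value (-1) therefore keeps taps working, which is the
behaviour the PR asked to preserve.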
Submitted by: Ben LeMasurier <ben@crypt.ly> (initial version)
MFC after: 2 weeks
Ignoring of gesture processing when the palm is detected helps to reduce
some of the erratic pointer behavior.
This fixes a regression introduced in r317814.
Reported by: Ben LeMasurier <ben@crypt.ly>
MFC after: 2 weeks
As discussed in that commit message, it is a dangerous default. But the
safe default causes enough pain on a variety of platforms that for now,
restore the prior default.
Some of this is self-induced pain we should/could do better about; for
example, programmatic CI systems and VM managers should introduce entropy
from the host for individual VM instances. This is considered a future work
item.
On modern x86 and Power9 systems, this may be wholly unnecessary after
D19928 lands (even in the non-ideal case where early /boot/entropy is
unavailable), because they have fast hardware random sources available early
in boot. But D19928 is not yet landed and we have a host of architectures
which do not provide fast random sources.
This change adds several tunables and diagnostic sysctls, documented
thoroughly in UPDATING and sys/dev/random/random_infra.c.
PR: 230875 (reopens)
Reported by: adrian, jhb, imp, and probably others
Reviewed by: delphij, imp (earlier version), markm (earlier version)
Discussed with: adrian
Approved by: secteam(delphij)
Relnotes: yeah
Security: related
Differential Revision: https://reviews.freebsd.org/D19944
The imagined use is for early boot consumers of random to be able to make
decisions based on whether random is available yet or not. One such
consumer seems to be __stack_chk_init(), which runs immediately after random
is initialized. A follow-up patch will attempt to address that.
Reported by: many
Reviewed by: delphij (except man page)
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D19926
Check caller thread id before allowing to read the buffer
to make sure that it can only be accessed by the thread that
did the associated write to the TPM.
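A rough userland sketch of the ownership check follows. The structure,
the function names, the use of a plain long as a thread id and the choice
of EPERM are all assumptions made for illustration; the driver's own
types and error codes may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	TPM_BUFSIZE	64	/* assumed size for this sketch */

struct tpm_sc_sketch {
	unsigned char	buf[TPM_BUFSIZE];
	size_t		pending;	/* response bytes waiting to be read */
	long		owner_tid;	/* thread that issued the last write */
};

/* Record the writer so that only it may collect the response. */
static int
tpm_write_sketch(struct tpm_sc_sketch *sc, long tid, const void *cmd,
    size_t len)
{
	if (len > TPM_BUFSIZE)
		return (E2BIG);
	memcpy(sc->buf, cmd, len);	/* stand-in for running the command */
	sc->pending = len;
	sc->owner_tid = tid;
	return (0);
}

/* Refuse the read unless the caller is the thread that did the write. */
static int
tpm_read_sketch(struct tpm_sc_sketch *sc, long tid, void *out, size_t len)
{
	if (sc->pending == 0 || tid != sc->owner_tid)
		return (EPERM);
	if (len > sc->pending)
		len = sc->pending;
	memcpy(out, sc->buf, len);
	sc->pending = 0;
	return (0);
}
```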
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: delphij
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19713
read_random() is/was used, mostly without error checking, in a lot of
very sensitive places in the kernel -- including seeding the widely used
arc4random(9).
Most uses, especially arc4random(9), should block until the device is seeded
rather than proceeding with a bogus or empty seed. I did not spy any
obvious kernel consumers where blocking would be inappropriate (in the
sense that lack of entropy would be ok -- I did not investigate locking
angle thoroughly). In many instances, arc4random_buf(9) or that family
of APIs would be more appropriate anyway; that work was done in r345865.
A minor cleanup was made to the implementation of the READ_RANDOM function:
instead of using a variable-length array on the stack to temporarily store
all full random blocks sufficient to satisfy the requested 'len', only store
a single block on the stack. This has some benefit in terms of reducing
stack usage, reducing memcpy overhead and reducing devrandom output leakage
via the stack. Additionally, the stack block is now safely zeroed if it was
used.
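One way to picture the single-block approach is the sketch below. It is
an interpretation, not the committed code: BLOCK_SIZE, gen_block() (a
deterministic counter, NOT a random source) and the helper names are
assumptions made so the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	BLOCK_SIZE	16	/* assumed generator block size in bytes */

/* Stand-in generator: fills one block with a counter pattern. */
static void
gen_block(uint8_t *block)
{
	static uint8_t counter;
	size_t i;

	for (i = 0; i < BLOCK_SIZE; i++)
		block[i] = counter++;
}

/* Zero memory through a volatile pointer so the stores are not elided. */
static void
secure_zero(void *p, size_t len)
{
	volatile uint8_t *vp = p;

	while (len-- > 0)
		*vp++ = 0;
}

/*
 * Generate whole blocks directly into the caller's buffer.  Only a
 * trailing partial block goes through a single stack block, which is
 * zeroed after use.
 */
static void
read_random_sketch(uint8_t *out, size_t len)
{
	uint8_t tail[BLOCK_SIZE];
	size_t whole = len - (len % BLOCK_SIZE);
	size_t off;

	for (off = 0; off < whole; off += BLOCK_SIZE)
		gen_block(out + off);
	if (whole < len) {
		gen_block(tail);
		memcpy(out + whole, tail, len - whole);
		secure_zero(tail, sizeof(tail));
	}
}
```

Stack usage is bounded by one block regardless of 'len', and generator
output does not linger on the stack after the call.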
One caveat of this change is that the kern.arandom sysctl no longer returns
zero bytes immediately if the random device is not seeded. This means that
FreeBSD-specific userspace applications which attempted to handle an
unseeded random device may be broken by this change. If such behavior is
needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK
option.
On any typical FreeBSD system, entropy is persisted on read/write media and
used to seed the random device very early in boot, and blocking is never a
problem.
This change primarily impacts the behavior of /dev/random on embedded
systems with read-only media that do not configure "nodevice random". We
toggle the default from 'charge on blindly with no entropy' to 'block
indefinitely.' This default is safer, but may cause frustration. Embedded
system designers using FreeBSD have several options. The most obvious is to
plan to have a small writable NVRAM or NAND to persist entropy, like larger
systems. Early entropy can be fed from any loader, or by writing directly
to /dev/random during boot. Some embedded SoCs now provide a fast hardware
entropy source; this would also work for quickly seeding Fortuna. A 3rd
option would be creating an embedded-specific, more simplistic random
module, like that designed by DJB in [1] (this design still requires a small
rewritable media for forward secrecy). Finally, the least preferred option
might be "nodevice random", although I plan to remove this in a subsequent
revision.
To help developers emulate the behavior of these embedded systems on
ordinary workstations, the tunable kern.random.block_seeded_status was
added. When set to 1, it blocks the random device.
I attempted to document this change in random.4 and random.9 and ran into a
bunch of out-of-date or irrelevant or inaccurate content and ended up
rototilling those documents more than I intended to. Sorry. I think
they're in a better state now.
PR: 230875
Reviewed by: delphij, markm (earlier version)
Approved by: secteam(delphij), devrandom(markm)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D19744
This allows efficient filtering at packet ingress on mlx5en.
Note that the packets are filtered (and potentially dropped) *before*
the driver has committed to (re)allocating an mbuf for the
packet. Dropped packets are treated essentially the same as an
error. Nothing is allocated, and the existing buffer is recycled. This
allows us to drop malicious packets at close to line rate with very
little CPU use.
Reviewed by: hselasky, slavash, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19063
The SPCR table on the Lenovo HR330A Ampere eMAG server indicates 8-bit
access, but 32-bit access is required for the PL011 to work.
PL011 on SBSA platforms always supports 32-bit access (and that was
hardcoded here before my EC2 fix), let's use 32-bit access for PL011
and 32BIT interface types.
Tested by emaste on Ampere eMAG and Cavium/Marvell ThunderX2.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: andrew, imp (earlier)
Differential Revision: https://reviews.freebsd.org/D19507
If a custom block size is requested, use it; otherwise revert to the previous logic
of using just the data size if it's less than MMC_BLOCK_SIZE, and MMC_BLOCK_SIZE otherwise.
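The selection rule can be written as a small helper. This is a sketch:
the function name is hypothetical, a custom size of 0 is assumed to mean
"not requested", and MMC_BLOCK_SIZE is assumed to be 512 here.

```c
#include <stddef.h>

#define	MMC_BLOCK_SIZE	512	/* value assumed for this sketch */

/* Pick the block length for a data transfer of data_len bytes. */
static size_t
mmc_pick_block_size(size_t custom_block_size, size_t data_len)
{
	if (custom_block_size != 0)
		return (custom_block_size);
	return (data_len < MMC_BLOCK_SIZE ? data_len : MMC_BLOCK_SIZE);
}
```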
Reviewed by: bz
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D19783
SDIO command CMD53 (IO_RW_EXTENDED) allows data transfers using blocks of 1-2048 bytes,
with a maximum of 511 blocks per request.
Extend mmc_data structure to properly describe such requests,
and initialize the new fields in kernel and userland consumers.
No actual driver changes happen yet, these will follow in the separate changes.
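For illustration, the CMD53 block-mode limits quoted above can be checked
like this. The helper and macro names are hypothetical and do not reflect
the new mmc_data fields; a block count of 0 is rejected here as a
simplifying assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define	SDIO_MAX_BLOCK_SIZE	2048	/* bytes per block, from the text */
#define	SDIO_MAX_BLOCK_COUNT	511	/* blocks per request, from the text */

/* Return true if a block-mode CMD53 request stays within the limits. */
static bool
sdio_cmd53_block_request_ok(size_t block_size, size_t block_count)
{
	return (block_size >= 1 && block_size <= SDIO_MAX_BLOCK_SIZE &&
	    block_count >= 1 && block_count <= SDIO_MAX_BLOCK_COUNT);
}
```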
Reviewed by: bz
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D19779
While it is true that the ACPI spec says that the flag is only valid
on Extended Address Space Descriptors, examples of other descriptors
in the spec use the ProducerConsumer flag explicitly, and real
hardware uses it as well. In fact, even in the ASL of the Thunder X2
for which r330113 was a workaround, some devices use this flag on
non-Extended Address Space Descriptors correctly. Instead, only
ignore the flag for resources associated with the UART devices on the
Thunder X2 using the "ARMH0011" HID to identify these devices.
This should fix regressions from ignoring this flag in other contexts
such as Hyper-V.
PR: 235876
Reported by: Wei Hu <weh@microsoft.com>
Tested by: emaste (Thunder X2)
MFC after: 2 weeks
CPUs can use shared (RF_SHAREABLE) resources for the I/O port used for
entering and exiting C states. If this I/O port is included in an ACPI
system resource device, then this happens to still work, but if the port
wasn't part of a system resource device, only the first CPU could allocate
the I/O port and use C states since resource_list_reserve() was always
allocating the resource from nexus0 without RF_SHAREABLE. By avoiding
the reservation, the flags from the bus_alloc_resource() in the CPU driver
(which include RF_SHAREABLE) are honored.
PR: 236513
Reported by: stockhausen@collogia.de
Sleuthing by: avg
Reviewed by: avg
MFC after: 2 weeks
RTL8152 (chip version URE_CHIP_VER_4C10) doesn't
have a hardwired MAC address; in other words, it reads as all zeros.
This commit fixes it by setting a random MAC address
when the MAC address is all zeros.
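A userland sketch of the fallback is shown below. The function names are
hypothetical, rand() merely stands in for whatever random source the
driver uses, and marking the address as locally administered unicast is
an assumption of this sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define	ETHER_ADDR_LEN	6

static bool
mac_is_zero(const uint8_t *mac)
{
	size_t i;

	for (i = 0; i < ETHER_ADDR_LEN; i++) {
		if (mac[i] != 0)
			return (false);
	}
	return (true);
}

/* Replace an all-zero address with a random one; leave others alone. */
static void
mac_fixup(uint8_t *mac)
{
	size_t i;

	if (!mac_is_zero(mac))
		return;
	for (i = 0; i < ETHER_ADDR_LEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	/* Clear the multicast bit, set the locally administered bit. */
	mac[0] = (uint8_t)((mac[0] & 0xfe) | 0x02);
}
```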
Reviewed by: kevlo
Differential Revision: https://reviews.freebsd.org/D19856
Both the Linux and U-Boot sources for the RTL8152 driver have this value.
RTL8152 USB Ethernet is used on the NanoPI R1 board as the second Ethernet port.
For me, this fixes the problem of the RTL8152 USB Ethernet not being
detected after a reboot on the NanoPI R1 board.
Both NetBSD and OpenBSD have a wrong value so far.
For a PCI device (i.e. a child of a PCI bus), reset tries FLR if it is
implemented, and falls back to a power reset if FLR is missing or fails.
For a PCIe bus (a child of a PCIe bridge or root port), reset
disables the PCIe link and then re-trains it, performing what is known as
a link-level reset.
Reviewed by: imp (previous version), jhb (previous version)
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19646
Add the ability to use interrupts for i2c messages.
We still use polling for early boot i2c transfers (for the PMIC,
for example) but as soon as interrupts are available we use them.
On Allwinner SoCs >A20 it seems that polling mode is broken for some
reason; this is now fixed by using interrupt mode.
For Allwinner also fix the frequency calculation: the one in the code
was for when the APB frequency is at 48 MHz while it is at 24 MHz on most
(all?) Allwinner SoCs. We now support both cases.
While here add more debug info when it's compiled in.
Tested On: A20, H3, A64
MFC after: 1 month
In r337703 DTS files were updated to Linux 4.18, including Linux commit
4d8b032d3c03f4e9788a18bbb51b10e6c9e8a56b which removed the `phy_id`
property from am335x-bone-common (as the property was deprecated).
Use `phy-handle` via fdt_get_phyaddr, keeping the existing code as a
fallback for old DTBs.
PR: 236624
Submitted by: manu, Gerald Aryeetey <aryeeteygerald_rogers.com>
Reported by: Gerald Aryeetey
Reviewed by: manu
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19814
Harvesting has to compete for the TPM chip with userspace.
Before this change the callout could hijack an unread buffer
causing a userspace call to the TPM to fail.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: delphij
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19712
interrupt enable are not fatal.
The firmware sets up all the interrupt enables based on run time
configuration, which means the information in the enables is more
accurate than what's compiled into the driver. This change also allows
the fatal bits to be updated without any changes in the driver in some
cases.
MFC after: 1 week
Sponsored by: Chelsio Communications
* Crank the OPAL state machine during the receive loop, to make sure the
pollers are executed
* Add a proper detach function, so the module can be unloaded and reloaded
at runtime.
It still doesn't reliably work 100% of the time on POWER9, and it appears
timing and/or cache related. It may work on POWER8 now.
MFC after: 2 weeks
Using DFLTPHYS/MAXPHYS is not always OK, instead make it possible for the
controller driver to provide maximum data size to MMCCAM, and use it there.
The old stack already does this.
Reviewed by: manu
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15892
I/O operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the device driver
needs to wait for the draining to happen. The waiting is done by
adding a g_md_providergone() function to wait for the I/O operations
to finish up.
It is likely that every GEOM provider that implements orphaning
attached GEOM consumers needs to use the "providergone" mechanism
for this same reason, but some of them do not do so. Apparently
Kenneth Merry (ken@) added the drain for just such races, but he
missed adding it to some of the device drivers that needed it.
Submitted by: Chuck Silvers
Reviewed by: imp
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
frame header and data.
This will fix 'Mysterious OLPC stuff' for received frames and wrong
CCMP / TKIP / data decoding for transmitted frames in net/wireshark
dissector.
While here, drop unneeded comment - net80211 handles padding requirements
for Tx & Rx without driver adjustment.
Tested with D-Link DWA-140 rev B3, STA mode.
MFC after: 1 week
The declaration in tcp_var.h is still around so t4_tom continued to
compile but wouldn't load. A separate commit will fix tcp_var.h
Reported By: Dustin Marquess (dmarquess at gmail)
Sponsored by: Chelsio Communications
It looks like some DIMMs claim to have a TSOD, but actually don't. Some
claim they weren't able to change the SPD page, but they did. Neither of
those should be fatal errors.
PR: 235944
Submitted by: Greg V <greg@unrelenting.technology>
Reported by: Greg V <greg@unrelenting.technology>
Reviewed by: cem
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D19681
CAM_RESRC_UNAVAIL instead of CAM_REQUEUE_REQ. This makes CAM delay a bit
before retrying, so that the retries actually get a chance to succeed.
Reviewed by: sbruno
MFC after: 2 weeks
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D19696
are successfully returned by the card (usually due to an abort being issued
as part of timeout recovery). Remove what amounts to an insufficient
KASSERT, and don't overwrite the state value. State should probably be
re-designed, and that will be done with a future commit.
Reported by: phk, bei.io
Reviewed by: imp, mav
Differential Revision: D19677
TPM has a built-in RNG, with its own entropy source.
The driver was extended to harvest 16 random bytes from TPM every 10 seconds.
A new build option "TPM_HARVEST" was introduced - for now, however, it
is not enabled by default in the GENERIC config.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: markm, delphij
Approved by: secteam
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19620
In the latest Linux kernel revisions the DSA (Distributed
Switch Architecture) device tree binding was changed.
Instead of the top level dsa@ node, the switch and its
ports are represented as a child node of the mdio bus.
Along with that, other modifications were added, such as
the relation to the Ethernet port of the SoC. Adjust the
e6000sw etherswitch and mvneta drivers to that.
Tested on Armada 3720 EspressoBin and Armada 388 Clearfog Pro boards.
Submitted by: Bert JW Regeer <xistence@0x58.com>
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D19036
PIIX4_SMBHSTSTAT_ERR can be set for several reasons that, unfortunately,
cannot be distinguished, but the most typical case is a missing or hung
slave (SMB_ENOACK).
PIIX4_SMBHSTSTAT_FAIL means a failed or killed / aborted transaction, so
its previous mapping to SMB_ENOACK was not ideal.
After this change an smb(4) access to a missing slave results in ENXIO
rather than EIO. To me, that seems to be more appropriate.
MFC after: 3 weeks
This value was being used uninitialized, resulting in predictable issues
on systems with memory-mapped UART registers.
A case could be made that memmap_bus should be declared in a header
rather than being declared in each .c file which needs to refer to it,
but that's a broader style question.
This commit unbreaks hw.uart.console="mm:..." on ARM64.
Submitted by: Greg V
The "access width" value was hard-coded as 2, indicating 32-bit accesses;
instead, use the value specified in the SPCR table.
This unbreaks the console on EC2 "A1" family instances.
Submitted by: Greg V
No functional changes. Replace whitespace by tabs, indent with 4 spaces,
coalesce multi-line shorter than 80 characters.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The resource is already being activated in the bus_alloc_resource(),
because the flag RF_ACTIVE is being passed.
Double activation on arm64 causes a kernel panic.
Version of the driver was upgraded to 0.8.4.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported-by: Greg V <greg@unrelenting.technology>
Tested-by: cperciva, Greg V <greg@unrelenting.technology>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Differential revision: https://reviews.freebsd.org/D19655
as an NS8250 UART.
This is the same as the UART found in EC2 "bare metal" instances,
except that the card vendor shows up as 0x0000 rather than 0x1d0f.
This seems like a bug in the EC2 firmware; but we might as well support
it anyway.
Reported by: Greg V
Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.
MFC after: 1 week
Sponsored by: Chelsio Communications
From Krzysztof:
The driver built as KLD cannot be unloaded, if this flag is not set.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: shurd@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19402
From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers setup. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.
This can lead to a mismatch where the iflib mbuf size is different from
the expected size of the buffer as programmed by the hardware. This can
lead to unexpected results.
If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.
Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.
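The idea of computing the Rx mbuf size once and exposing it through an accessor can be sketched as follows. This is only an illustration: the struct, the function names and the two size constants (2048 and 4096, standing in for a standard cluster and a page-sized cluster) are assumptions, not the actual iflib code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed buffer sizes for the sketch; real values come from the kernel. */
#define SKETCH_CLUSTER_SIZE	2048u
#define SKETCH_PAGE_CLUSTER	4096u

/* Hypothetical stand-in for the framework's per-device context. */
struct ctx_sketch {
	uint32_t max_frame_size;	/* set by the driver */
	uint32_t rx_mbuf_sz;		/* computed once by the framework */
};

/* Framework side: pick the buffer size once, before driver init runs. */
static void
ctx_calc_rx_mbuf_sz(struct ctx_sketch *ctx)
{
	assert(ctx->max_frame_size > 0);
	if (ctx->max_frame_size <= SKETCH_CLUSTER_SIZE)
		ctx->rx_mbuf_sz = SKETCH_CLUSTER_SIZE;
	else
		ctx->rx_mbuf_sz = SKETCH_PAGE_CLUSTER;
}

/* Driver side: read the value instead of repeating the calculation. */
static uint32_t
ctx_get_rx_mbuf_sz(const struct ctx_sketch *ctx)
{
	return (ctx->rx_mbuf_sz);
}
```

A driver would then program its hardware with the value returned by the accessor, so the two sizes cannot drift apart.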
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19489
Minimalistic PSCI implementation in U-Boot doesn't implement get_version()
method for some SoC. In this case, use PSCI version declared by 'psci' node
in DT as fallback.
MFC after: 2 weeks
- older DT can use 'cpu0-supply' property for power supply binding.
- don't expect that actual CPU frequency is contained in CPU
operational point table, but read current CPU voltage directly from
regulator. Typically, u-boot can set starting CPU frequency to any
value.
MFC after: 2 weeks
Some applications forward from/to host rings most or all the
traffic received or sent on a physical interface. In these
cases it is desirable to have more than a pair of RX/TX host
rings, and use multiple threads to speed up forwarding.
This change adds support for multiple host rings. On registering
a netmap port, the user can specify the number of desired receive
and transmit host rings in the nr_host_tx_rings and nr_host_rx_rings
fields of the nmreq_register structure.
MFC after: 2 weeks
Without this dependency relationship, the linker doesn't find the
flash_register_slicer() function, so kldload fails to load fdt_slicer.ko.
Discussed with: ian@
The forthcoming microcode update will fix a TSX bug by clobbering PMC3
when TSX instructions are executed (even speculatively). There is an
alternate mode where CPU executes all TSX instructions by aborting
them, in which case PMC3 is still available to OS. Any code that
correctly uses TSX must be ready to handle abort anyway.
Since it is believed that FreeBSD population of hwpmc(4) users is
significantly larger than the population of TSX users, switch the
microcode into TSX abort mode whenever a pmc is allocated, and back to
bug avoidance mode when the last pmc is deallocated.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
isci(4) uses deferred loading. Typically on amd64 and i386 non-PAE
the tag does not create any restrictions, but on i386 PAE-tables but
non-PAE configs callbacks might be used.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
writes are running. Some of the cases which are not handled properly in driver are:
1. With R1 fastpath supported, single write from CAM layer can consume 2 MPT frames
at driver/firmware level for fastpath qualification(if fw_outstanding < controller Queue Depth).
Due to this driver has to throttle IOs coming from CAM layer as well as second fastpath
write(of R1 write) against Adapter Queue Depth.
If "fw_outstanding" reaches to adapter queue depth, driver should return IOs from CAM layer with
device busy status. While allocating the second MPT frame (corresponding to R1 FP write) also, driver
should ensure fw_outstanding should not exceed adapter QD.
2. For R1 fastpath writes completion, driver decrements "fw_outstanding" counter without
really returning MPT frame to free pool. It may cause IOs(with heavy IOs running, consuming whole
adapter Queue Depth) consuming MPT frames reserved for DCMDs(management commands) and
DCMDs(internal and sent by application) not getting MPT frame will start failing.
Below is one test case to hit the issue described above-
1. Run heavy IOs (outstanding IOs should hit adapter Queue Depth).
2. Run management tool (Broadcom's storcli tool) querying adapter in loop (run command- "storcli64 /c0 show" in loop).
3. Management tool's requests would start failing due to non-availability of free MPT frames as all frames would be consumed by IOs.
Fix: Increment/decrement of "fw_outstanding" counter should be in sync with MPT frame get/return.
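The rule stated in the fix can be sketched as a frame pool whose counter changes only inside the get and return routines. The pool layout, the function names and the queue depth of 4 are hypothetical and chosen for illustration; this is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_DEPTH	4	/* hypothetical adapter queue depth */

struct frame_pool {
	bool in_use[QUEUE_DEPTH];
	unsigned fw_outstanding;	/* frames currently taken from the pool */
};

/*
 * Take a free frame. Returns its index, or -1 when the counter has
 * reached the queue depth (the caller would report device busy).
 */
static int
frame_get(struct frame_pool *p)
{
	if (p->fw_outstanding >= QUEUE_DEPTH)
		return (-1);
	for (int i = 0; i < QUEUE_DEPTH; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			p->fw_outstanding++;	/* counted only when a frame is taken */
			return (i);
		}
	}
	return (-1);
}

/* Give a frame back; the counter drops only when the frame is really free. */
static void
frame_return(struct frame_pool *p, int idx)
{
	assert(idx >= 0 && idx < QUEUE_DEPTH && p->in_use[idx]);
	p->in_use[idx] = false;
	p->fw_outstanding--;
}
```

Because the counter and the in-use flags are updated together, a second frame taken for an R1 fastpath write is throttled by the same check as the first.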
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: Ken
MFC after: 3 days
Sponsored by: Broadcom Inc
Make sure the enter and leave polling routines can be called multiple times
with same setting. Ignore setting polling or event mode twice. This fixes a
deadlock during shutdown if polling mode was already selected.
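A minimal sketch of a mode switch that tolerates being requested twice is shown below. The enum, struct and function names are hypothetical, and the real routines do considerably more work (and locking) than this.

```c
#include <assert.h>
#include <stdbool.h>

enum cmd_mode { CMD_MODE_EVENTS, CMD_MODE_POLLING };

struct cmd_state {
	enum cmd_mode mode;
	int switches;	/* number of real mode transitions performed */
};

/*
 * Request a command mode. Returns true if a transition was performed,
 * false if the requested mode was already active and the call was ignored.
 */
static bool
cmd_set_mode(struct cmd_state *s, enum cmd_mode want)
{
	assert(want == CMD_MODE_EVENTS || want == CMD_MODE_POLLING);
	if (s->mode == want)
		return (false);	/* same setting twice: nothing to do */
	s->mode = want;
	s->switches++;
	return (true);
}
```

With this shape, a shutdown path that selects polling mode after it was already selected simply returns instead of repeating the transition.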
MFC after: 1 week
Sponsored by: Mellanox Technologies
The ESGL bit was left uninitialized when executing the REPORT LUNS
ioctl. This could allow a zeroed data buffer to be treated as a
scatter/gather list. The firmware would eventually walk past the end
of the data buffer, potentially find what looked like a valid
address/length pair, and write the result to semi-random memory.
Obtained from: Dell EMC Isilon
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19398
The Command Reference Number (CRN) is part of the FC-Tape features
that we enable when talking to tape drives. It starts at 1, and
goes to 255 and wraps around to 1. There are a number of reset
type conditions that result in the CRN getting reset to 1. These
are detailed in section 4.10 (table 8) of the FCP-4r02b specification.
One of the conditions is when a PRLI (Process Login) is sent by
the initiator, and the Establish Image Pair bit is set in Word 0
of the PRLI.
Previously, the isp(4) driver core sent a notification via
isp_async() that the target had changed or stayed in place, but
there was no indication of whether a PRLI was sent and whether the
Establish Image Pair bit was set.
The result of this was that in some situations, notably
switching back and forth between a direct connection and a switch
connection to a tape drive, the isp(4) driver would fail to reset
the CRN in situations that require it according to the spec. When
the CRN isn't reset in a situation that requires it, the tape drive
then rejects every subsequent command that is sent to the drive.
It is assuming that the commands are being sent out of order.
So, modify the isp(4) driver to include Word 0 of the PRLI command
when it sends isp_async() notifications of target changes. Look at
the Establish Image Pair bit, and reset the CRN if that bit is set.
With this change, I am able to switch a tape drive back and forth
between a direct connection and a switch connection, and the isp(4)
driver resets the CRN when it should.
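The CRN behaviour described above can be sketched in a few lines. The helper names are hypothetical, and the bit position used for Establish Image Pair (bit 13 of Word 0) is an assumption made for this sketch; check it against the specification and the driver headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the Establish Image Pair bit in PRLI Word 0. */
#define SKETCH_PRLI_WD0_EST_IMAGE_PAIR	(1u << 13)

/* Advance the CRN: it runs from 1 to 255 and then wraps back to 1. */
static uint8_t
crn_next(uint8_t crn)
{
	assert(crn >= 1);
	return (crn == 255 ? 1 : (uint8_t)(crn + 1));
}

/*
 * On a target change notification, reset the CRN to 1 if the PRLI
 * carried the Establish Image Pair bit; otherwise keep it unchanged.
 */
static uint8_t
crn_after_prli(uint8_t crn, uint32_t prli_word0)
{
	if (prli_word0 & SKETCH_PRLI_WD0_EST_IMAGE_PAIR)
		return (1);
	return (crn);
}
```

Note that the value 0 is never produced by the wrap, which matches the description of the counter starting at 1.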
sys/dev/isp_stds.h:
Add bit definitions for PRLI Word 0.
sys/dev/ispmbox.h:
Add PRLI Word 0 to the port database type, isp_pdb_t.
sys/dev/ispvar.h
Add PRLI Word 0 to fcportdb_t.
sys/dev/isp.c:
Populate the new prli_word0 parameter in the port database.
In isp_pdb_add_update(), add a check to see if the
Establish Image Pair bit is set in PRLI Word 0. If it is,
then that is an additional reason to create a change
notification.
sys/dev/isp_freebsd.c:
In isp_async(), if the device changed or stayed, look at
PRLI Word 0 to see if the Establish Image Pair bit is set.
If it is, reset the CRN if we haven't already.
MFC after: 1 week
Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D19472
Intel 3168 uses another EEPROM section to store channel flags;
port missing bits from iwlwifi to make it work.
PR: 230750, 236235
Tested by: Bert JW Regeer <xistence@0x58.com>
MFC after: 3 days
Also, pass control frames to the host while in MONITOR mode and / or
when promiscuous mode is enabled.
Tested with Netgear WG111 v3 (RTL8187B), STA / MONITOR modes.
MFC after: 2 weeks
- Alignment issues:
* Add missing __packed attributes + padding across all drivers; in
most places there was an assumption that padding will be always
minimally suitable; in few places - e.g., in urtw(4) / rtwn(4) -
padding was just missing.
* Add __aligned(8) attribute for all Rx radiotap headers since they can
contain 64-bit TSF timestamp; it cannot appear in Tx radiotap headers, so
just drop the attribute here. Refresh ieee80211_radiotap(9) man page
accordingly.
- Since net80211 automatically updates channel frequency / flags in
ieee80211_radiotap_chan_change() drop duplicate setup for these fields
in drivers.
Tested with Netgear WG111 v3 (urtw(4)), STA mode.
MFC after: 2 weeks
like this:
pqisrc_build_sgl() at pqisrc_build_sgl+0x8d/frame 0xfffffe009e8b7a00
pqisrc_build_raid_io() at pqisrc_build_raid_io+0x231/frame 0xfffffe009e8b7a40
pqisrc_build_send_io() at pqisrc_build_send_io+0x375/frame 0xfffffe009e8b7b00
pqi_request_map_helper() at pqi_request_map_helper+0x282/frame 0xfffffe009e8b7ba0
bus_dmamap_load_ccb() at bus_dmamap_load_ccb+0xd7/frame 0xfffffe009e8b7c00
pqi_map_request() at pqi_map_request+0x9b/frame 0xfffffe009e8b7c70
pqisrc_io_start() at pqisrc_io_start+0x55c/frame 0xfffffe009e8b7d50
smartpqi_cam_action() at smartpqi_cam_action+0xb8/frame 0xfffffe009e8b7de0
xpt_run_devq() at xpt_run_devq+0x30a/frame 0xfffffe009e8b7e40
xpt_action_default() at xpt_action_default+0x94b/frame 0xfffffe009e8b7e90
dastart() at dastart+0x33b/frame 0xfffffe009e8b7ee0
xpt_run_allocq() at xpt_run_allocq+0x1a2/frame 0xfffffe009e8b7f30
dastrategy() at dastrategy+0x71/frame 0xfffffe009e8b7f60
g_disk_start() at g_disk_start+0x351/frame 0xfffffe009e8b7fc0
g_io_request() at g_io_request+0x3cf/frame 0xfffffe009e8b8010
g_part_start() at g_part_start+0x120/frame 0xfffffe009e8b8090
g_io_request() at g_io_request+0x3cf/frame 0xfffffe009e8b80e0
zio_vdev_io_start() at zio_vdev_io_start+0x4b2/frame 0xfffffe009e8b8140
zio_execute() at zio_execute+0x17c/frame 0xfffffe009e8b8180
zio_nowait() at zio_nowait+0xc4/frame 0xfffffe009e8b81b0
vdev_queue_io_done() at vdev_queue_io_done+0x138/frame 0xfffffe009e8b81f0
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfffffe009e8b8220
zio_execute() at zio_execute+0x17c/frame 0xfffffe009e8b8260
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe009e8b82c0
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe009e8b82f0
fork_exit() at fork_exit+0x84/frame 0xfffffe009e8b8330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009e8b8330
Reviewed by: deepak.ukey_microsemi.com, sbruno
MFC after: 2 weeks
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D19470
trying to use disk_add_alias() to make spi* an alias for mx25l*. It turns
out disk_add_alias() works for partitions, but not slices, and that's hard
to fix.
This change is, in effect, a partial revert of r344526.
The mips world relies on the existence of flashmap names formatted as
/dev/flash/spi0s.name, whereas pretty much nothing relies on at45d devices
using the /dev/spi* names (because until recently the at45d driver didn't
even work reliably). So this change makes mx25l devices the sole owner of
the /dev/flash/spi* namespace, which actually makes some sense because it is
a SpiFlash(tm) device, so flash/spi isn't a horrible name.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>
retries.
When resetting the controller, we abort I/O. Prior to this fix, we
printed a ton of abort messages for I/O that we're going to
retry. This imparts no useful information. Stop printing them unless
our retry count is exhausted. Clarify code for when we don't retry,
and remove useless arg to a routine that's always called with it
as 'true'. All the other debug is still printed (including multiple
reset messages if we have multiple timeouts before the taskqueue
runs the actual reset) so that we know when we reset.
Reviewed by: jimharris@, chuck@
Differential Revision: https://reviews.freebsd.org/D19431
mlx4_en_stop_port() calls mlx4_en_put_qp() which can refer to the link level
address of the network interface, which in turn will be freed by the
network interface detach function. Make sure the port is stopped
before detaching the network interface.
MFC after: 1 week
Sponsored by: Mellanox Technologies
It can happen during shutdown that the lock will recurse when the mlx4en(4)
instance is part of a lagg interface. Call ether_ifdetach() unlocked.
Backtrace:
panic(): _sx_xlock_hard: recursed on non-recursive sx &mdev->state_lock
_sx_xlock_hard()
_sx_xlock()
mlx4_en_ioctl()
if_setlladdr()
lagg_port_destroy()
lagg_port_ifdetach()
if_detach()
mlx4_en_destroy_netdev()
mlx4_en_remove()
mlx4_remove_device()
mlx4_unregister_device()
mlx4_unload_one()
mlx4_shutdown()
linux_pci_shutdown()
bus_generic_shutdown()
MFC after: 1 week
Sponsored by: Mellanox Technologies
Chacha20 with a 256 bit key and 128 bit counter size is a good match for an
AES256-ICM replacement.
In userspace, Chacha20 is typically marginally slower than AES-ICM on
machines with AESNI intrinsics, but typically much faster than AES on
machines without special intrinsics. ChaCha20 does well on typical modern
architectures with SIMD instructions, which includes most types of machines
FreeBSD runs on.
In the kernel, we can't (or don't) make use of AESNI intrinsics for
random(4) anyway. So even on amd64, using Chacha provides a modest
performance improvement in random device throughput today.
This change makes the stream cipher used by random(4) configurable at boot
time with the 'kern.random.use_chacha20_cipher' tunable.
Very rough, non-scientific measurements at the /dev/random device, on a
GENERIC-NODEBUG amd64 VM with 'pv', show a factor of 2.2x higher throughput
for Chacha20 over the existing AES-ICM mode.
Reviewed by: delphij, markm
Approved by: secteam (delphij)
Differential Revision: https://reviews.freebsd.org/D19475
* The ani function bitmap was being badly used when determining if a command
could be used. In hostap modes only a couple of the ANI control parameters
are enabled.
* The ani function bitmap was not being reset to HAL_ANI_ALL if transitioning
from AP -> STA.
* Change mrcCckOff to mrcCck - 1 == on, rather than 1 == off. This matches
the API used to set the value from userland via the diagnostic API.
* Handle OFDM/CCK noise immunity level commands in ar9300_ani_control().
These will only come from userland and it will go and program the rest of
the ANI control parameters with the values in the ANI table.
* Ensure all of the ANI parameters can be tweaked at runtime, even if they're
disabled.
Tested:
* carambola2 (AR9331), STA/AP modes
From Jake:
"The iflib_fl_setup() function tries to pick various buffer sizes based
on the max_frame_size value defined by the parent driver. However, this
code was wrapped under CONTIGMALLOC_WORKS, which was never actually
defined anywhere.
This same code pattern was used in if_em.c, likely trying to match
what iflib uses.
Since CONTIGMALLOC_WORKS is not defined, remove this dead code from
iflib_fl_setup and if_em.c
Given that various iflib drivers appear to be using a similar
calculation, it might be worth making this buffer size a value that the
driver can peek at in the future."
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19199
I'm trying to debug why reception upstairs here is so terrible and it
turns out ANI is buggy. (Which is no surprise, ANI is always buggy.)
Tested:
* Carambola2 (AR9331), STA/AP modes
- Fix data frames transmission via POWER_STATUS register setup -
it seems to be set by MACID_CONFIG firmware command, which was broken*
in r290439 and later disabled in r307529.
We can re-enable it later if / when firmware rate adaptation will be
ready; however, this step will be required anyway - for firmware-less
builds.
- Force RTS / CTS protection frame rate to CCK1 (this rate works fine
without any additional setup; no better workaround is known yet).
The problem was not observed on the channel 1 or with CCK1 rate enforced
('ifconfig wlan0 ucastrate 1' for 11 b/g; not possible for 11n networks
due to ifconfig(8) bug).
* I'm not sure if it works before r290439 because - AFAIR - I have never seen
firmware rate adaptation working for 10-STABLE urtwn(4)
(It needs EN_BCN bit set and RSSI updates at least).
Tested with RTL8188CUS in STA mode
(in regular mode and with disabled MRR - DARFRC*8 is set to 0)
PR: 233949
MFC after: 2 weeks
FDT data. The sector size must be a multiple of the device's page size.
If not configured, use the historical default of the device page size.
Setting the disk sector size to 512 or 4096 allows a variety of standard
filesystems to be used on the device. Of course you wouldn't want to be
writing frequently to a SPI flash chip like it was a disk drive, but for
data that gets written once (or rarely) and read often, using a standard
filesystem is a nice convenient thing.
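The sector size rule (a multiple of the page size, defaulting to the page size when unset) can be sketched as below. The function name and the convention of returning 0 for an invalid setting are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Choose the disk sector size for a flash device.
 * configured == 0 means "not configured": fall back to the page size.
 * Returns 0 if the configured value is not a multiple of the page size.
 */
static uint32_t
pick_sector_size(uint32_t pagesize, uint32_t configured)
{
	assert(pagesize != 0);
	if (configured == 0)
		return (pagesize);	/* historical default */
	if (configured % pagesize != 0)
		return (0);		/* invalid: not a whole number of pages */
	return (configured);
}
```

For a 256-byte page, both 512 and 4096 pass the check, while a device with a 528-byte page could not use 512.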
some #define'd names to be more descriptive. When reporting a post-write
compare failure, report the page number, not the byte address of the page.
The latter is the only functional change, it makes the number match the
words of the error message.
This is especially important for writes. SPI is inherently a bidirectional
bus; you receive data (even if it's garbage) while writing. We should not
receive that data into the same buffer we're writing to the device.
When reading it doesn't matter what we send to the device, but using the
dummy buffer for that as well is pleasingly symmetrical.
As a step towards adding other potential streaming ciphers. As well as just
pushing the loop down into the rijndael APIs (basically 128-bit wide AES-ICM
mode) to eliminate some excess explicit_bzero().
No functional change intended.
Reviewed by: delphij, markm
Approved by: secteam (delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19411
SGE_QSETS is an upper bound -- fewer qsets may be allocated depending on
the number of CPUs.
Reviewed by: markj, np, vangyzen
X-MFC-With: r333288
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D17274
Specifically, ccr(4) devices are also children of cxgbe nexus devices.
Rather than making assumptions about the child device's softc, walk
the list of ports from the nexus' softc to determine if a child is a
port in t4_child_location_str(). This fixes a panic when detaching a
ccr device.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D19399
This was actually the known good configuration we used before.
Single MSI-X configuration doesn't even work there in my tests; due to
lack of documentation I am not sure whether that is by design or I am doing
something wrong.
PR: 233654
MFC after: 1 week
These are 4Gb/s and pretty old and slow now, so I see no reason to fight
for their performance over stability.
PR: 233654
MFC after: 1 week
Sponsored by: iXsystems, Inc.
There are some problem reports possibly related to the new driver use of
multiple interrupts on older cards. Hopefully this allows working around
them.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
r344162 exposed a bug in one of ixgbe's interrupt filters; they are never
supposed to return 0. Fix the interrupt filter to return the proper nonzero
return value.
Reported by: Oleg Ginzburg <olevole@olevole.ru>
MFC after: 1 week
Sponsored by: Intel Corporation
supporting older kernels. However, all supported versions of FreeBSD
have unmapped I/Os (as do several that have gone EOL), remove it. It's
unlikely the driver would work on the older kernels anyway at this
point.
supported in years. A number of changes have been made to the driver
that likely wouldn't work on those older versions that aren't properly
ifdef'd and it's project policy to GC such code once it is stale.
Marvell XHCI is in fact generic-xhci, so move the driver and
add the compatible string.
While here, get and enable the phy if the dtb provides one.
The xhci bindings state that phys should be in a 'phys' property but
Marvell DTS uses 'usb-phy', only add support for 'usb-phy' for now.
Sponsored-by: Rubicon Communications, LLC ("Netgate")
This is a "fake" phy that handles regulator, clocks and reset gpio.
Only clock and regulator are supported for now.
Sponsored-by: Rubicon Communications, LLC ("Netgate")
This is the common denominator for rockchip compatibles from RK3288 to RK3399.
The other compatibles are generally present in the DTS but the controllers
are the same.
MFC after: 1 week
devicetree/bindings/mtd/partition.txt.
In the old style, all the children of the device node which did not have a
compatible property were the partitions. In the new style, there is a child
node of the device which has a compatible string of "fixed-partitions", and
its children are the individual partitions.
Also, support the read-only property by setting the corresponding slice flag.
unwrapping multiple lines of code. Also, convert some short multiline
comments into single-line comments. Change old-school FALSE to false.
All in all, no functional changes, it's just more compact and readable.
Embedded lzma decompression library becomes a module usable by other
consumers, in addition to geom_uzip.
Most important code changes are
- removal of XZ_DEC_SINGLE define, we need the code to work
with XZ_DEC_DYNALLOC;
- xz_crc32_init() call is removed from geom_uzip, xz module handles
initialization on its own.
xz is no longer embedded into geom_uzip, instead the depend line for
the module is provided, and corresponding kernel option is added to
each MIPS kernel config file using geom_uzip.
The commit also carries unrelated cleanup by removing excess "device geom_uzip"
in places which were missed in r344479.
Reviewed by: cem, hselasky, ray, slavash (previous versions)
Sponsored by: Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D19266
MFC after: 3 weeks
Non-x86 arches use an inconsistently named header for the file containing
"pc" attributes, and the ifdef messes to include the right header were out
of date in the 2 files that I added to the MI files list.
Only amd64, arm, i386, mips, powerpc and sparc64 are supposed to support
syscons. Only arm and mips were out of date in the ifdef. Test
coverage of syscons in arm is broken (turned off) in NOTES, but
syscons is in some other arm config files which universe detects as broken.
arm64 and riscv remain broken due to the opposite bug of not turning off
sc in NOTES, the same as before r344458 (see r344443).
The header is MD to contain possibly-non-"pc" encodings of attributes, but
since the attributes are essentially virtual in graphics mode and non-x86
arches only support graphics mode, the header has always been the same on
all arches except for different style bugs, so there should be only 1 MI
copy of it for syscons' use. It was used in pcvt and still gives an
API and an ABI, so it should be public and MI near or in sys/consio.h.
Both SpiFlash (mx25l) and DataFlash (at45d) drivers create a disk device
with a name of /dev/flash/spiN where N is the driver's unit number. If
both types of devices are present in the same system, this creates a fatal
conflict that prevents attachment of whichever device attaches second
(because mx25l0 and at45d0 both try to create a spi0).
This gives each type of device a unique name (mx25lN or at45dN respectively)
and also adds an alias of spiN for compatibility. When both device types
appear in the same system, only the first to attach gets the spiN alias.
When the second device attaches there is a non-fatal warning that the alias
can't be created, but both devices are still accessible via their primary
names (and there is no need for the spiN name to work for backwards
compatibility on such a system, because it has never been possible to use
the spiN names when both devices exist).
jedec ID as its older cousin the AT45DB642D, but uses a different page size.
The only way to distinguish between the two chips is that the 2D chip has
0 bytes of extended ID info and the new 1E has 1 byte of extended ID. The
actual value of the extended ID byte is all zeroes. In other words, it's
the presence of the extended info that identifies this chip. (Presumably
a future upgrade might define non-zero values for the extended ID byte.)
- Do not use nvf = 4 as it is not really supported by the firmware.
Firmwares 1.23.3.0 and above will ignore it silently.
- Increase PF4's share of the VIs and let it use all of the RSS table.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
to match a chip to our table of metadata describing the chips. At least one
new DataFlash chip has a 3-byte jedec ID identical to its predecessors and
differs only in the extended info, and it has different metadata requiring a
unique entry in the table. This paves the way for supporting such chips.
The metadata table now includes two new fields, extmask and extid. The two
bytes of extended info obtained from the chip are ANDed with extmask then
compared to extid, so it's possible to use only a subset of the extended
info in the matching.
We now always read 6 bytes of jedec ID info. Most chips don't return any
extended info, and the values read back for those two bytes may be
indeterminate, but such chips have extmask and extid values of 0x0000 in the
table, so the extid effectively doesn't participate in the matching on those
chips and it doesn't matter what they return in the extended info bytes.
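The masked comparison can be sketched as a table lookup. All names and ID values below are made up for illustration and do not describe real chips. One design point the sketch makes visible: an entry with extmask 0x0000 matches any extended info, so a more specific entry sharing the same 3-byte ID has to come first in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chip_ident {
	const char *name;
	uint32_t jedec;		/* 3-byte JEDEC ID */
	uint16_t extid;		/* expected extended info after masking */
	uint16_t extmask;	/* which extended info bits participate */
};

/* Hypothetical entries; the two share a JEDEC ID and differ in ext info. */
static const struct chip_ident chip_table[] = {
	{ "example-new", 0x445566, 0x0100, 0xff00 },
	{ "example-old", 0x445566, 0x0000, 0x0000 },
};

/* Return the first entry whose JEDEC ID and masked ext info both match. */
static const struct chip_ident *
chip_match(uint32_t jedec, uint16_t ext)
{
	assert((jedec & 0xff000000u) == 0);
	for (size_t i = 0; i < sizeof(chip_table) / sizeof(chip_table[0]); i++) {
		const struct chip_ident *c = &chip_table[i];

		if (c->jedec == jedec && (ext & c->extmask) == c->extid)
			return (c);
	}
	return (NULL);
}
```

With a zero mask, `ext & 0x0000` is always 0 and equals the stored extid of 0x0000, which is why indeterminate bytes read from older chips do not affect the result.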
This fixes a panic during configuration if the tx channel of a port
isn't the same as its port id.
Reported by: Fabrice Bruel
MFC after: 1 week
Sponsored by: Chelsio Communications
in the delayed attach to use early returns, which allows reducing the level
of indentation. So all in all, what looks like a lot of changes is really
no change in behavior, mostly just moving whitespace around.
A big security advantage of Wayland is not allowing applications to read
input devices all the time. Having /dev/input/* accessible to the user
account subverts this advantage.
libudev-devd was opening the evdev devices to detect their types (mouse,
keyboard, touchpad, etc). This doesn't work if /dev/input/* is inaccessible.
With the kernel exposing this information as sysctls (kern.evdev.input.*),
we can work w/o /dev/input/* access, preserving the Wayland security model.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: wulf, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18694
Add support for simple NVDIMM v1.2 namespaces from the UEFI
version 2.7 specification. The combination of NVDIMM regions and
labels can lead to a wide variety of namespace layouts. Here we
support a simple subset of namespaces where each NVDIMM SPA range
is composed of a single region per member dimm.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Discussed with: kib
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18736
When attaching to NVDIMM devices, read and verify the namespace
labels from the special namespace label storage area. A later
change will expose NVDIMM namespaces derived from this label data.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Discussed with: kib
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18735
Separate code for exposing a device backed by a system physical
address range away from the NVDIMM spa code. This will allow a
future patch to add support for NVDIMM namespaces while using the
same device code.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed by: bwidawsk
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18736
A pointer is first tested for NULL. If non-NULL, another pointer is
set equal to the first. The second pointer is then checked for NULL
and an error path taken if so. This second test and the associated
path is dead code as the pointer value, having just been checked for
NULL, cannot be NULL at this point. Remove the dead code.
Reported by: Coverity
Reviewed by: daniel.william.ryan_gmail.com, vangyzen
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19165
First remove ifdefs of the unsupported option SC_DUMB_TERMINAL which
prevented building using both in the same kernel and broke regression
tests. This option will be replaced by per-emulator supported options.
The dumb emulator rotted with KSE in r83366, but usually compiled since
it is ifdefed to nothing unless SC_DUMB_TERMINAL is defined. The type
of an unused function parameter changed.
Both emulators rotted when 2 new methods were added while the emulators
were removed. Only null methods are needed, but null function pointers
give panics instead.
The wildcard in the default for the unsupported option SC_DFLT_TERM
never really worked. It tends to prefer the dumb emulator when multiple
emulators are configured. Change it to prefer scteken for compatibility.
- Do not explicitly count active descriptors. It allows hardware reset
to happen while device is still referenced, plus simplifies locking.
- Do not stop/start callout each time the queue becomes empty. Let it
run to completion and rearm if needed, that is much cheaper than touching
it every time, plus also simplifies locking.
- Decouple submit and cleanup locks, making driver reentrant.
- Avoid memory mapped status register read on every interrupt.
- Improve locking during device attach/detach.
- Remove some no longer used variables.
Reviewed by: cem
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D19231
It's a hack, we can't know/list all DMA engines, but this covers all
I/OAT of Xeon E5/E7 at least from Sandy Bridge till Skylake I saw.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
will be committed later.
The "sc" emulator has the advantages of full support for cons25 and running
about 8 times faster than teken (for writing to the frame buffer).
The "dumb" emulator has the advantage of being simple.
Runtime choice of the emulator is good, but compile time choice is bad.
Per discussions on arch@ and elsewhere, the maintenance of this code
has moved to the drm-kmod and drm-legacy-kmod ports. Remove the i915
and radeon drivers from the tree.
Approved by: graphics team
Reviewed by: manu@, mmel@
Differential Revision: https://reviews.freebsd.org/D19196
Retire the drm modules / drivers. These are now handled by the
drm-legacy-kmod port and/or the drm-kmod port. All future
development and maintenance will be handled there.
Approved by: graphics team
Reviewed by: manu@, mmel@
Differential Revision: https://reviews.freebsd.org/D19196
This change adds a counter (kqueue_users) to keep track of how many
kqueue users are referencing a given struct nm_selinfo.
In this way, nm_os_selwakeup() can schedule the kevent notification
task only when kqueue is actually being used.
This is important to avoid wasting CPU in the common case where
kqueue is not used.
Reviewed by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19177
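The counting idea above can be sketched as a small userland model. This is
not the netmap code: struct ex_selinfo, the attach/detach helpers and the
"scheduled" field are hypothetical stand-ins for struct nm_selinfo and the
kevent notification task.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct nm_selinfo; only the counter matters. */
struct ex_selinfo {
	int kqueue_users;	/* kqueue users currently attached */
	int scheduled;		/* times the notification task was queued */
};

/* A kqueue user starts referencing the selinfo. */
void
ex_kq_attach(struct ex_selinfo *si)
{
	si->kqueue_users++;
}

/* A kqueue user stops referencing the selinfo. */
void
ex_kq_detach(struct ex_selinfo *si)
{
	if (si->kqueue_users > 0)
		si->kqueue_users--;
}

/*
 * Model of the wakeup path: the notification task is scheduled only
 * when at least one kqueue user is attached. Returns true if the task
 * was scheduled.
 */
bool
ex_selwakeup(struct ex_selinfo *si)
{
	if (si->kqueue_users == 0)
		return (false);
	si->scheduled++;
	return (true);
}
```

In the common case with no kqueue users, ex_selwakeup() returns without
touching the (modelled) task at all.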
In out of order mode Rx buffers are accessed by req_id.
Accessing and validating the mbuf using ntc causes false errors.
Increase driver revision after latest RX OOO completion fixes.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
Requested ID should be validated when the packet is received and not
when the driver is repopulating the mbufs.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
Don't use a struct if_irq for IFLIB_INTR_IOV type interrupts since that results
in get_core_offset() being called on them, and get_core_offset() doesn't
handle IFLIB_INTR_IOV type interrupts, which results in an assert() being triggered
in iflib_irq_set_affinity().
PR: 235730
Reported by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 day
Sponsored by: Intel Corporation
When pci_realloc_bars was first added, the intention was to eventually
enable it by default, but it was left disabled to preserve existing
behavior. The setting is pretty conservative in that it does not
attempt to allocate resources for BARs that the BIOS/firmware leaves
disabled. It only attempts to reallocate resources for a BAR that the
firmware programmed during boot but that conflicts with another
resource during the kernel's device scan.
PR 221350 is an example of a machine that this knob fixes.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D18965
The hardcoded ident is exactly 20 bytes long but sprintf adds a terminating zero,
so one byte is written out of array bounds. As a fix, use strncpy: it
appends \0 only if space allows, and its behavior matches the virtio spec:
When VIRTIO_BLK_T_GET_ID is issued, the device identifier, up to 20 bytes, is
written to the buffer. The identifier should be interpreted as an ascii string.
It is terminated with \0, unless it is exactly 20 bytes long.
PR: 202298
Reviewed by: br
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18852
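A minimal sketch of the strncpy behavior relied on above. The helper name
blk_fill_ident and the macro BLK_ID_BYTES are made up for illustration;
only the 20-byte size comes from the quoted spec text.

```c
#include <string.h>

#define BLK_ID_BYTES	20	/* identifier size from the spec quote */

/*
 * Copy ident into a fixed 20-byte field. strncpy() writes exactly
 * BLK_ID_BYTES bytes: a shorter string is padded with NULs, while a
 * string of 20 or more characters fills the field without a
 * terminator, so nothing is written past the end of the field.
 */
void
blk_fill_ident(char out[BLK_ID_BYTES], const char *ident)
{
	strncpy(out, ident, BLK_ID_BYTES);
}
```

With sprintf() the same 20-character ident would need 21 bytes, which is
the one-byte overrun described above.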
r241119 that's performed globally by device_attach(9).
- As for the EM-class of devices, em(4) supports multiple queues
and MSI-X respectively only with 82574 devices. However, since
the conversion to iflib(4), em(4) relies on the interrupt type
fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to
figure out the interrupt type to use for the EM-class (as well
as the IGB-class) of MACs. Moreover, despite the datasheet for
82583V not mentioning any support of MSI-X, there actually are
82583V devices out there that report a varying number of MSI-X
messages as supported. The interrupt type fallback of iflib(4)
is causing two failure modes depending on the actual number of
MSI-X messages supported for such instances of 82583V:
1) With only one MSI-X message supported, none is left for the
RX/TX queues as that one message gets assigned to the admin
interrupt. Worse, later on - which will be addressed with a
separate fix - iflib(4) interprets that one message as MSI
or INTx to be set up, but fails to actually do so as it has
previously called pci_alloc_msix(9). [1, 2]
2) With more messages supported, their distribution is okay but
then em_if_msix_intr_assign() doesn't work for 82583V, with
the interface being left in a non-working state, too. [3]
Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X
with 82574 only, and at most MSI for the remainder of EM-class
devices.
While at it, remove "try_second_bar" as it's polarity inverted
and not actually needed.
- Remove code from em_if_timer() that effectively is a NOP since
the conversion to iflib(4) ("trigger" is no longer read).
While at it, let the comment for em_if_timer() reflect reality
after said conversion.
- Implement an ifdi_watchdog_reset method which only updates the
em(4) "watchdog_events" counter but doesn't perform any reset,
so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't
provide a counterpart) reflects reality and these timeouts add
to IFCOUNTER_OERRORS again after the iflib(4) conversion.
- Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since
the iflib(4) conversion, associated counters are disconnected,
but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed"
respectively as equivalents.
- Move the description preceding lem_smartspeed() to the correct
spot before em_reset() and bring back appropriate comments for
{igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in
the iflib(4) conversion.
- Adapt some other function descriptions and INIT_DEBUGOUT() use
to match reality after the iflib(4) conversion.
- Put the debugging message of em_enable_vectors_82574() (missed
in r343578) under bootverbose, too.
PR: 219428 [1], 235246 [2], 235147 [3]
Reviewed by: erj (previous version)
Differential Revision: https://reviews.freebsd.org/D19108
When configured with more tx queues than rx queues,
em_if_msix_intr_assign() was incorrectly routing the tx event
interrupts.
Reviewed by: erj, marius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19070
Use the information from IORT parsing to translate the PCI RID to
GIC ITS device ID. And similarly, use the information to find the
PIC XREF identifier to be used for PCI devices.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D18004
Add new file arm64/acpica/acpi_iort.c to support the "IO Remapping
Table" (IORT). The table is specified in ARM document "ARM DEN 0049D"
titled "IO Remapping Table Platform Design Document". The IORT table
has information on the associations between PCI root complexes, SMMU
blocks and GIC ITS blocks in the system.
The changes are to parse and save the information in the IORT table.
The API to use this information is added to sys/dev/acpica/acpivar.h.
The acpi_iort.c also has code to check the GIC ITS nodes seen in the
IORT table with corresponding entries in MADT table (for validity)
and with entries in SRAT table (for proximity information).
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D18002
There are a few places in the interrupt handler where the driver
lock is dropped; ensure that device is still running before
processing remaining ring entries.
PR: 192641
MFC after: 5 days
when TEKEN_CONS25 is configured. Fix this by adding a function to
set the flag that enables the fix and always calling this function
for syscons.
Expand the man page for teken_set_cons25(). This function is not
very useful since it can only set but not clear 1 flag. In practice,
it is only used when TEKEN_CONS25 is configured and all that does is
choose the default emulation for syscons at compile time.
Changelist:
- Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr
and nm_prlim, to avoid possible naming conflicts.
- Add netmap_krings_mode_commit() helper function and use that
to reduce code duplication.
- Refactor pipes control code to export some functions that
can be reused by the veth driver (on Linux) and epair(4).
- Add check to reject API requests with version less than 11.
- Small code refactoring for the null adapter.
MFC after: 1 week
Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.
Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)
Add SYNC_KLOOP_MODE option, and add support for direct mode, where application
executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback.
MFC after: 5 days
When in MSI mode, the device was only being configured with one
interrupt index, but it needs two - one for the actual interrupt and
one to park the tx queue at.
Also clarified comments relating to interrupt index assignment.
Reported by: Yuri Pankov <yuripv@yuripv.net>
MFC after: 1 day
"slow" interrupt handler:
- Expand the list of INT_CAUSE registers known to the driver.
- Add decode information for many more bits but decouple it from the
rest of intr_info so that it is entirely optional.
- Call t4_fatal_err exactly once, and from the top level PL intr handler.
t4_fatal_err:
- Use t4_shutdown_adapter from the common code to stop the adapter.
- Stop servicing slow interrupts after the first fatal one.
Driver/firmware interaction:
- CH_DUMP_MBOX: note whether the mailbox being dumped is a command or a
reply or something else.
- Log the raw value of pcie_fw for some errors.
- Use correct log levels (debug vs. error).
Sponsored by: Chelsio Communications
Not all child devices of the NVDIMM root device represent DIMM devices
which are present in the system. The spec says (ACPI 6.2, sec 9.20.2):
For each NVDIMM present or intended to be supported by platform,
platform firmware also exposes an NVDIMM device ... under the
NVDIMM root device.
Present NVDIMM devices are found by walking all of the NFIT table's
SPA ranges, then walking the NVDIMM regions mentioned by those SPA
ranges.
A set of NFIT walking helper functions are introduced to avoid the
need to splat the enumeration logic across several disparate
callbacks.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Sponsored by: Intel Corporation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18439
Move the enumeration of NVDIMM SPA ranges from the spa GEOM class
initializer into the NVDIMM root device. This will be necessary for a
later change where NVDIMM namespaces require NVDIMM device enumeration
to be reliably ordered before SPA enumeration.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Sponsored by: Intel Corporation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18734
This patch and commit message are based on r340256 created by Jacob Keller:
The iflib stack does not disable TSO automatically when TXCSUM is
disabled, instead assuming that the driver will correctly handle TSOs
even when CSUM_IP is not set.
This results in iflib calling ixgbe_isc_txd_encap with packets which have
CSUM_IP_TSO, but do not have CSUM_IP or CSUM_IP_TCP set. Because of
this, ixgbe_tx_ctx_setup will not setup the IPv4 checksum offloading.
This results in bad TSO packets being sent if a user disables TXCSUM
without disabling TSO.
Fix this by updating the ixgbe_tx_ctx_setup function to check both
CSUM_IP and CSUM_IP_TSO when deciding whether to enable checksums.
Once this is corrected, another issue for TSO packets is revealed. The
driver sets IFLIB_NEED_ZERO_CSUM in order to enable a work around that
causes the ip->sum field to be zero'd. This is necessary for ix
hardware to correctly perform TSOs.
However, if TXCSUM is disabled, then the work around is not enabled, as
CSUM_IP will not be set when the iflib stack checks to see if it should
clear the sum field.
Fix this by adding IFLIB_TSO_INIT_IP to the iflib flags for the ix and
ixv interface files.
Once both of these changes are made, the ix and ixv drivers should
correctly offload TSO packets when TSO offload is enabled, regardless
of whether TXCSUM is enabled or disabled.
Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by: IntelNetworking
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18470
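The flag check described above can be sketched as follows. The EX_ flag
values and both helper names are illustrative placeholders for this
sketch; the kernel's own CSUM_* definitions live in its mbuf headers and
the driver function is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bits only; placeholders for CSUM_IP and CSUM_IP_TSO. */
#define EX_CSUM_IP	0x01u	/* IPv4 header checksum offload requested */
#define EX_CSUM_IP_TSO	0x10u	/* TSO over IPv4 requested */

/* Before the fix: only CSUM_IP enabled IPv4 checksum setup. */
bool
ex_want_ip_csum_old(uint32_t csum_flags)
{
	return ((csum_flags & EX_CSUM_IP) != 0);
}

/* After the fix: either CSUM_IP or CSUM_IP_TSO enables it. */
bool
ex_want_ip_csum_new(uint32_t csum_flags)
{
	return ((csum_flags & (EX_CSUM_IP | EX_CSUM_IP_TSO)) != 0);
}
```

A packet carrying only the TSO flag (TXCSUM disabled, TSO enabled) is the
case where the two helpers disagree.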
From Piotr:
This patch introduces adapter->task_requests register responsible for
recording requests for mod_task, msf_task, mbx_task, fdir_task and
phy_task calls. Instead of enqueueing these tasks with
GROUPTASK_ENQUEUE, handlers will be called directly from
ixgbe_if_update_admin_status() while holding ctx lock.
SIOCGIFXMEDIA ioctl() call reads adapter->media list. The list is
deleted and rewritten in ixgbe_handle_msf() task without holding ctx
lock. This change is needed to maintain data coherency when sharing
adapter info via ioctl() calls.
Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>.
PR: 221317
Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by: sbruno@, IntelNetworking
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18468
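A rough single-threaded model of the request-recording scheme described
above. The bit names, struct ex_adapter and the run counters are
hypothetical; in the driver the handlers would be called with the ctx lock
held, which this sketch does not model.

```c
#include <stdint.h>

/* Hypothetical request bits for three of the tasks named above. */
#define EX_REQ_MOD	(1u << 0)
#define EX_REQ_MSF	(1u << 1)
#define EX_REQ_PHY	(1u << 2)

struct ex_adapter {
	uint32_t task_requests;	/* pending requests, one bit per task */
	int mod_runs;		/* counters standing in for the handlers */
	int msf_runs;
	int phy_runs;
};

/* Interrupt side: only record that a task is wanted. */
void
ex_request_task(struct ex_adapter *sc, uint32_t req)
{
	sc->task_requests |= req;
}

/* Admin-status side: consume the requests and run each handler once. */
void
ex_update_admin_status(struct ex_adapter *sc)
{
	uint32_t pending = sc->task_requests;

	sc->task_requests = 0;
	if (pending & EX_REQ_MOD)
		sc->mod_runs++;
	if (pending & EX_REQ_MSF)
		sc->msf_runs++;
	if (pending & EX_REQ_PHY)
		sc->phy_runs++;
}
```

Because every handler runs from one place, state such as the media list is
only rewritten while the caller holds whatever lock protects that place.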
iflib is already a module, but it is unconditionally compiled into the
kernel. There are drivers which do not need iflib(4), and there are
situations where somebody might not want iflib in kernel because of
using the corresponding driver as module.
Reviewed by: marius
Discussed with: erj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19041
copyright.
When all member nations of the Buenos Aires Convention adopted the Berne
Convention, the phrase "All rights reserved" became unnecessary to assert
copyright. Remove it from files under my or Panasas's copyright. The files
related to jedec_dimm(4) also bear avg@'s copyright; he has approved this
change.
Approved by: avg
Sponsored by: Panasas
When using poll(), select() or kevent() on netmap file descriptors,
netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands,
before collecting the events that are ready. In other words, the
poll/kevent callback has side effects. This is done to avoid the
overhead of two system calls per iteration (e.g., poll() + ioctl(NIOC*XSYNC)).
When the kqueue subsystem invokes the kqueue(9) f_event callback
(netmap_knrw), it holds the lock of the struct knlist object associated
to the netmap port (the lock is provided at initialization, by calling
knlist_init_mtx).
However, netmap_knrw() may need to wake up another netmap port (or even
the same one), which means that it may need to call knote().
Since knote() needs the lock of the struct knlist object associated to
the to-be-woken-up netmap port, it is possible to have a lock order reversal
problem (AB/BA deadlock).
This change prevents the deadlock by executing the knote() call in a
per-selinfo taskqueue, where it is possible to hold a mutex.
Reviewed by: aleksandr.fedorov_itglobal.com
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18956
bus_teardown_intr(9) before pci_release_msi(9).
- Ensure that iflib(4) and associated drivers pass correct RIDs to
bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9)
on the corresponding resources instead of using the RIDs initially
passed to bus_alloc_resource_any(9) as the latter function may
change those RIDs. Solely em(4) for the ioport resource (but not
others) and bnxt(4) were using the correct RIDs by caching the ones
returned by bus_alloc_resource_any(9).
- Change the logic of iflib_msix_init() around to only map the MSI-X
BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns
> 0. Otherwise the "Unable to map MSIX table " message triggers for
devices that simply don't support MSI-X and the user may think that
something is wrong while in fact everything works as expected.
- Put some (mostly redundant) debug messages emitted by iflib(4)
and em(4) during attachment under bootverbose. The non-verbose
output of em(4) seen during attachment now is close to the one
prior to the conversion to iflib(4).
- Replace various variants of spelling "MSI-X" (several in messages)
with "MSI-X" as used in the PCI specifications.
- Remove some trailing whitespace from messages emitted by iflib(4)
and change them to consistently start with uppercase.
- Remove some obsolete comments about releasing interrupts from
drivers and correct a few others.
Reviewed by: erj, Jacob Keller, shurd
Differential Revision: https://reviews.freebsd.org/D18980
Effectively all i386 kernels now have two pmaps compiled in: one
managing PAE pagetables, and another non-PAE. The implementation is
selected at cold time depending on the CPU features. The vm_paddr_t is
always 64bit now. As a result, the nx bit can be used on all capable CPUs.
Option PAE only affects the bus_addr_t: it is still 32bit for non-PAE
configs, for drivers compatibility. Kernel layout, esp. max kernel
address, low memory PDEs and max user address (same as trampoline
start) are now same for PAE and for non-PAE regardless of the type of
page tables used.
Non-PAE kernel (when using PAE pagetables) can handle physical memory
up to 24G now. Larger memory requires re-tuning the KVA consumers, so
for now the code caps the maximum at 24G. Unfortunately, a lot of
drivers do not use busdma(9) properly, so by default even the 4G barrier
is not easy. There are two tunables added: hw.above4g_allow and
hw.above24g_allow, the first one is kept enabled for now to evaluate
the status on HEAD, second is only for dev use.
i386 now creates three freelists if there is any memory above 4G, to
allow proper bounce pages allocation. Also, VM_KMEM_SIZE_SCALE changed
from 3 to 1.
The PAE_TABLES kernel config option is retired.
In collaboration with: pho
Discussed with: emaste
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D18894
This fixes BIO_ORDERED semantics while also improving performance by:
- sleeping also before BIO_ORDERED bio, as defined, not only after;
- not queueing BIO_ORDERED bio to taskqueue if no other bios running;
- waking up sleeping taskqueue explicitly rather than relying on polling.
On Samsung SSD 970 PRO this shows sync write latency, measured with
`diskinfo -wS`, reduction from ~2ms to ~1.1ms by not sleeping without
reason till next HZ tick.
On the same device ZFS pool with 8 ZVOLs synchronously writing 4KB blocks
shows ~950 IOPS instead of ~750 IOPS before. I suspect ZFS does not need
BIO_ORDERED on BIO_FLUSH at all, but that will be next question.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Because of a typo, the code was mistakenly resetting the
vtnrx_vq pointer rather than vtntx_tq.
Reviewed by: bryanv
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D19015
On sync-kloop stop, send a wake-up signal to the kloop, so that
waiting for the timeout is not needed.
Also, improve logging in netmap_freebsd.c.
MFC after: 3 days
Enforce net80211 rates for control / management / multicast / EAPOL frames
and allow overriding the rate for unicast frames via the ifconfig(8) 'ucastrate'
option; by default it still uses f/w rate adaptation for unicast frames.
MFC after: 1 week
* There's no reason to have a while() loop here, because:
- if msleep returns 0, that means we were woken up by the interrupt handler,
and we are going to exit immediately as sc_fw_chunk_done will now be 1
(there is nothing else that sleeps on sc_fw.)
- if msleep doesn't return 0 (i.e. it returned ETIMEDOUT) then we will
exit immediately because of the if-test.
So, just use a single msleep() and then check sc_fw_chunk_done as before.
* The comment said we were sleeping for 5 seconds, but the msleep was only
for 1. Before r314065, this was 1 second and so was the comment,
and in that commit the comment was changed and the function call wasn't.
Possibly fixes failures to initialize uCode on certain devices.
Submitted by: Augustin Cavalier (waddlesplash gmail.com)
Obtained from: Haiku 132990ecdcb072f2ce597b5d497ff3e5b1f09c20
MFC after: 10 days
Wrap ieee80211_add_channel_list_2ghz into another function
which supplies default (1-14) channel list to it and drop
its copies from drivers.
Checked with RTL8188EE, country US / JP / KR / UA.
MFC after: 2 weeks
- Drain offload transmit queues when RATELIMIT is enabled but
TCP_OFFLOAD is not.
- Expose the per-VI nofldtxq and first_ofld_txq sysctls when
RATELIMIT is enabled but TCP_OFFLOAD is not.
- Clear offload transmit queue stats as part of a 'cxgbetool clearstats'
request when RATELIMIT is enabled but TCP_OFFLOAD is not.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18966
Replace in-place implementation with system-wide one; since it
guarantees non-zero result drop all less-than-one checks from
drivers and net80211.
MFC after: 2 weeks
* This hopefully avoids some firmware panics I was occasionally seeing
when iwm disconnects upon losing signal to an access point.
* This synchronizes the if_iwm_time_event.c file a bit more with the
corresponding Linux iwlwifi/mvm/time-event.c.
Taken-From: Linux iwlwifi
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (e8cb71584a6a72232c13151d60e57f7f229220eb)
* This is a mix of the OpenBSD Git 7fd9664469d1b717a307eebd74aeececbd3c41cc
change, and syncing with the Linux iwlwifi code.
Taken-From: Linux iwlwifi, and OpenBSD
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (706a3044afd27c3fecfdf57bec1695310e53e228)
* This avoids firmware resets in all the cases in iwm_newstate(). Instead
iwm_bring_down_firmware() is called, which tears down all the STA
connection state, according to the sc->sc_firmware_state value.
* Improve the behaviour of the LED blinking a bit, so it only blinks when
there really is a wireless scan going on.
* Print the newstate arg in debug output of iwm_newstate(), to help in
debugging.
This is inspired by the firmware state maintaining change in OpenBSD's iwm,
by stsp@openbsd.org (OpenBSD Git 0ddb056fb7370664b1d4b84392697cb17d1a414a).
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (8a41b10ac639d0609878696808387a6799d39b57)
* While there remove unused IWM_UCODE_TLV_CAPA_LMAC_UPLOAD definition,
which isn't defined in iwlwifi.
Taken-From: Linux iwlwifi
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (fd4f9de8bc72ea961e50829b45b59d0549040b7d)
* Remove outdated notifications IWM_SCAN_ABORT_CMD,
IWM_SCAN_START_NOTIFICATION and IWM_SCAN_RESULTS_NOTIFICATION.
* Remove unused enum iwm_scan_complete_status.
* Use the updated FW Api version 3 of struct iwm_scan_results_notif.
* No functional change, since struct iwm_scan_results_notif is never
accessed in iwm at the moment.
Taken-From: Linux iwlwifi commits 1083fd7391e989be52022f0f338e9dadc048b063
and 75118fdb63496e4611ab50380499ddd62b9de69f.
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (c947b0b8dc96dabefd63f7b70d53695e36c7b64f)
* Rename some structs and struct members for firmware handling.
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (4b1006a6e4d0f61d48c67b46e1f791e30837db67)
* There is (almost) nothing to do in suspend/resume if if_iwm has failed
during initialization (e.g. because of firmware load failure) and was
already uninitialized by iwm_detach_local().
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (67b5e090efb225654815fed91020db6cfc16bb19)
* We should load the firmware exactly once before the driver really
initializes the hardware the first time, and unload it at detach time.
There is no need to retrieve the firmware during execution of
iwm_mvm_load_ucode_wait_alive(), we should make sure we already have the
firmware data at hand before that.
* The existing sc_preinit_hook code fails to deal with the case where
if_iwm is loaded by the loader (or is statically linked) and the
firmware needs to be loaded from disk. So we can just call
iwm_read_firmware() from iwm_attach() directly.
* A separate solution will have to be added to properly defer the firmware
loading during bootup, until the necessary filesystem is mounted.
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (0104ee1f4cb6a2313c00c2526c6ae98d42e5041d)
* Doing the iwm_prepare_card_hw() call in iwm_attach() only on Family 8000
hardware matches the code in Linux iwlwifi.
* While there remove DEFAULT_MAX_TX_POWER definition which is unused, and
has a value different from IWL_DEFAULT_MAX_TX_POWER in iwlwifi.
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (e8560f8dc58df12a7c79a6bb4e6ccb156e001085)
* Rather than providing a non-zero index into the firmware RS table,
we should always use index 0 and update the firmware RS table whenever
our chosen tx rate for data-frames changes.
* Send IWM_LQ_CMD updates when the tx rate gets updated by the net80211
rate control (which is after we tell the tx status to the net80211
rate-control in iwm_mvm_rx_tx_cmd_single()).
* Disregard frames transferred with a different tx rate than the currently
selected rate for the rate-control calculations. This way we avoid
counting management frames (which are sent at a slow, and fixed rate),
as well as frames we added to the tx queue just before a new IWM_LQ_CMD
update took effect.
Submitted by: Augustin Cavalier <waddlesplash@gmail.com> (Haiku)
Obtained from: DragonFlyBSD (5d6b465e288ac5b52d7115688d4e6516acbbea1c)
From Krzysztof:
Ensure that the entire data buffer passed from the NVM update tool is copied in
to kernel space and copied back out to user space using copyin() and copyout().
PR: 234104
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reported by: Finn <ixbug@riseup.net>
MFC after: 5 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18817
From Jake:
In r341156 ("Fix first-packet completion", 2018-11-28) a hack to work
around a delta calculation determining how many descriptors were used
was added to ixl_isc_tx_credits_update_dwb.
The same fix was also applied to the em and igb drivers in r340310, and
to ix in r341156.
The hack checked the case where prev and cur were equal, and then added
one. This works, because by the time we do the delta check, we already
know there is at least one packet available, so the delta should be at
least one.
However, it's not a complete fix, and as indicated by the comment is
really a hack to work around the real bug.
The real problem is that the first time that we transmit a packet,
tx_cidx_processed will be set to point to the start of the ring.
Ultimately, the credits_update function expects it to point to the
*last* descriptor that was processed. Since we haven't yet processed any
descriptors, pointing it to 0 results in this incorrect calculation.
Fix the initialization code to have it point to the end of the ring
instead. One way to think about this is that we are setting the value
to be one prior to the first available descriptor.
Doing so corrects the delta calculation in all cases. The original fix
only works if the first packet has exactly one descriptor. Otherwise, we
will report 1 less than the correct value.
As part of this fix, also update the MPASS assertions to match the real
expectations. First, ensure that prev is not equal to cur, since this
should never happen. Second, remove the assertion about prev==0 || delta
!= 0. It looks like that originated from when the em driver was
converted to iflib. It seems like it was supposed to ensure that delta
was non-zero. However, because we originally returned 0 delta for the
first calculation, the "prev == 0" was tacked on.
Instead, replace this with a check that delta is greater than zero,
after the correction necessary when the ring pointers wrap around.
This new solution should fix the same bug as r341156 did, but in a more
robust way.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18545
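The arithmetic above can be checked with a small model. This is a sketch,
not the driver code: both function names are made up, "prev" plays the
role of tx_cidx_processed, and "cur" is the index of the last descriptor
the hardware reports as done.

```c
/*
 * Descriptors completed between prev (last one already processed) and
 * cur (last one now done) on a ring of ntxd slots, with wrap-around.
 */
int
ex_ring_delta(int prev, int cur, int ntxd)
{
	int delta = cur - prev;

	if (delta < 0)
		delta += ntxd;
	return (delta);
}

/* The earlier workaround: add one when prev and cur are equal. */
int
ex_ring_delta_hack(int prev, int cur, int ntxd)
{
	int delta = ex_ring_delta(prev, cur, ntxd);

	if (prev == cur)
		delta++;
	return (delta);
}
```

With a 1024-entry ring and prev initialized to 1023 (the end of the ring),
a first packet of three descriptors (cur = 2) gives a delta of 3. With
prev initialized to 0, the workaround gives 1 for a one-descriptor packet
but only 2 for the three-descriptor one, which is the off-by-one described
above.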
Note that the affected interface is available only to root.
admbugs: 765
Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by: emaste, ram
MFC after: 1 day
Security: Kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18914
Changelist:
- Add the proper memory barriers in the kloop ring processing
functions.
- Fix memory barriers usage in the user helpers (nm_sync_kloop_appl_write,
nm_sync_kloop_appl_read).
- Fix nm_kr_txempty() helper to look at rhead rather than rcur. This
is important since the kloop can read a value of rcur which is ahead
of the value of rhead (see explanation in nm_sync_kloop_appl_write)
- Remove obsolete ptnetmap_guest_write_kring_csb() and
ptnet_guest_read_kring_csb(), and update if_ptnet(4) to use those.
- Prepare in advance the arguments for netmap_sync_kloop_[tr]x_ring(),
to make the kloop faster.
- Provide kernel and user implementation for nm_ldld_barrier() and
nm_ldst_barrier()
MFC after: 2 weeks
The nm_os_selwakeup function needs to call knote() to wake up kqueue(9)
users. However, this function can be called from different code paths,
with different lock requirements.
This patch fixes the knote() call argument to match the relevant lock state.
Also, comments have been updated to reflect current code.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846
Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18876
Also, expose IFLIB_MAX_RX_SEGS to iflib drivers and add
iflib_dma_alloc_align() to the iflib API.
Performance is generally better with the tunable/sysctl
dev.vmx.<index>.iflib.tx_abdicate=1.
Reviewed by: shurd
MFC after: 1 week
Relnotes: yes
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D18761
mean that the driver should taste the firmware in the KLD and use that
firmware's version for all its fw_install checks.
The driver gets firmware version information from compiled-in values by
default and this change allows custom (or older/newer) firmware modules
to be used with the stock driver.
There is no change in default behavior.
MFC after: 1 week
Sponsored by: Chelsio Communications
- Check if buffer can contain Rx descriptor before accessing it.
- Verify upper / lower bounds for frame length.
- Do not pass too short frames into ieee80211_find_rxnode().
While here:
- Move cleanup to the function end.
- Reuse IEEE80211_IS_DATA() macro.
MFC after: 1 week
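The order of checks listed above might look roughly like this. Everything
here is assumed for illustration: the descriptor layout, the EX_MIN_FRAME
and EX_MAX_FRAME limits and the function name are not taken from the
driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MIN_FRAME	10	/* hypothetical lower bound on frame length */
#define EX_MAX_FRAME	2048	/* hypothetical upper bound on frame length */

/* Hypothetical Rx descriptor placed in front of the frame. */
struct ex_rx_desc {
	uint16_t pktlen;	/* frame length reported by the hardware */
	uint16_t flags;
};

/*
 * Validate a received buffer before touching the frame:
 * 1. the buffer must be able to hold the descriptor at all;
 * 2. the reported length must lie within the lower / upper bounds;
 * 3. the frame must fit in what is left of the buffer.
 */
bool
ex_rx_frame_ok(const uint8_t *buf, size_t buflen, size_t *framelen)
{
	struct ex_rx_desc desc;

	if (buflen < sizeof(desc))
		return (false);
	memcpy(&desc, buf, sizeof(desc));
	if (desc.pktlen < EX_MIN_FRAME || desc.pktlen > EX_MAX_FRAME)
		return (false);
	if (desc.pktlen > buflen - sizeof(desc))
		return (false);
	*framelen = desc.pktlen;
	return (true);
}
```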
- Discard frames that are bigger than MCLBYTES (to prevent buffer overrun).
- Check buffer length before accessing its contents.
- Fix len <-> dmalen check - the last includes Rx Wireless information
structure size.
- Fix out-of-bounds read during Rx node search for ACK / CTS frames
(monitor mode only).
While here:
- Mark a few suspicious places with comments.
- Move common cleanup to the function end.
MFC after: 1 week
indicates an error. Also, do not remove it twice from the hf list in
this case.
Submitted by: Krishnamraju Eraparaju @ Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
Recent gcc versions (7 and 8 at least) can check switch case
statements for fall through (implicit-fallthrough). When fall through
is intentional, the default method for warning suppression is to place
a /* FALLTHROUGH */ comment immediately before the next case statement.
Differential Revision: https://reviews.freebsd.org/D18577
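A small illustration of the comment placement described above. The
function ex_score and its values are invented for the example.

```c
/*
 * Level 2 intentionally includes everything level 1 does, so case 2
 * falls through into case 1. The comment sits directly before the
 * next case label, where gcc's implicit-fallthrough check looks for it.
 */
int
ex_score(int level)
{
	int score = 0;

	switch (level) {
	case 2:
		score += 2;
		/* FALLTHROUGH */
	case 1:
		score += 1;
		break;
	default:
		break;
	}
	return (score);
}
```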
Without the HID_IGNORE quirk enabled these models appear in the system as uhid
devices, while NUT (Network UPS Tools) expects them to be ugen.
PR: 131521
Submitted by: Naoyuki Tai <ntai@smartfruit.com>, John Bayly <john.bayly@tipstrade.net>
MFC after: 1 week
The SPA ids are published numbers, so it's safe (if a bit
annoying) to copy them into a source file.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Sponsored by: Intel Corporation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18733
The NVDIMM root device is parent to the individual ACPI NVDIMM
devices. Add a driver for the NVDIMM root device that can own
enumeration of NVDIMM devices as well as NVDIMM SPA ranges that the
system has.
Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Sponsored by: Intel Corporation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18346
Even M_WAITOK callers must check for failure. For example, if the device is
quiescing, either due to automatic error-recovery induced reset, or due to
administrative detach, the routine will return ENXIO and the acquire
reference will not be held. So, there is no mode in which it is safe to
assume the routine succeeds without checking.
Sponsored by: Dell EMC Isilon
Generic Tx stats fixes:
- do not try to parse "aggregation status" for single frames; send them
to iwn_tx_done() instead;
- try to attach mbuf / node reference pair to reported BA events;
allows to fix reported status for ieee80211_tx_complete() and ifnet counters
(previously all A-MPDU frames were counted as failed - see PR 210211);
requires few more firmware bug workarounds;
- preserve short / long retry counters for wlan_amrr(4)
(disabled for now - causes significant performance degradation).
- Add new IWN_DEBUG_AMPDU debug category.
- Add one more check into iwn_tx_data() to prevent aggregation ring
overflow.
- Work around the 'seqno % 256' != 'current Tx slot' case (until D9195 is
in the tree).
- Improve watchdog timer updates (previously watchdog check was omitted
when at least one frame was transmitted).
- Stop Tx when memory leak in currently used ring was detected (unlikely
to happen).
- Few other minor fixes.
Was previously tested with:
- Intel 6205, STA mode (Tx aggregation behaves much better now).
- Intel 4965AGN, STA mode (still unstable).
PR: 192641, 210211
Reviewed by: adrian, dhw
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D10728
Rate tables have this bit set to indicate minimal set of basic rates;
however, it overlaps with the MCS bit, so rate2ridx() will treat them as
an 11n rate.
Due to the current rates setup the issue can be reproduced only
in 5GHz band with 11n / protection enabled.
Tested with RTL8821AU, HOSTAP mode.
MFC after: 5 days
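A small sketch of the bit overlap described above. The constants are
defined locally for illustration (they are assumptions, not taken from
any header), and masking the entry is shown as one possible way to avoid
the misclassification, not necessarily what the driver does.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: both markers share the top bit of a rate byte. */
#define RATE_BASIC	0x80	/* "basic rate" marker in a rate table */
#define RATE_MCS	0x80	/* "this is an 11n MCS index" marker */
#define RATE_VAL	0x7f	/* the rate value itself */

/* Returns 1 if the rate code would be interpreted as an 11n MCS index. */
static int
rate_is_mcs(uint8_t rate)
{
	return ((rate & RATE_MCS) != 0);
}

/* Strip the basic-rate marker from a legacy rate table entry. */
static uint8_t
legacy_rate_value(uint8_t table_entry)
{
	return (table_entry & RATE_VAL);
}
```

A legacy entry that still carries the basic-rate marker looks like an
MCS index; the masked value does not.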
In FreeBSD, it is a normal situation for the Tx ring to be full. In
that case, the packet is put back into drbr and the next attempt to send
it is made after the cleanup.
Too many logs like this can cause system instability and even cause a
device reset (because keep alive or cleanup could be missed).
To fix that, the log level of this message is changed to debug.
Upon this change upgrade the driver version to v0.8.2.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
upstream it seems).
The tlv variable was changed to a pointer but the advancement of the data pointer
was left as sizeof(tlv). While the sizeof the (now) pointer equals the
sizeof 2 x uint32_t (size of the struct) on 64bit platforms, on 32bit platforms
the size of the advancement of the data pointer was wrong leading to
firmware load issues.
Correctly advance the data pointer by the size of the structure and not by
the size of a pointer.
PR: 219683
Submitted by: waddlesplash gamil.com (Haiku) on irc
MFC after: 1 week
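A minimal sketch of the pointer-size pitfall described above. The
struct tlv_hdr layout (two uint32_t fields) follows the message; the
function name and the record-walking logic are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header: two 32-bit fields followed by the payload. */
struct tlv_hdr {
	uint32_t type;
	uint32_t length;	/* payload length in bytes */
};

/* Count the TLV records in data; returns -1 on a truncated record. */
static int
count_tlvs(const uint8_t *data, size_t len)
{
	struct tlv_hdr hdr;
	const struct tlv_hdr *tlv = &hdr;
	int n = 0;

	while (len >= sizeof(*tlv)) {
		memcpy(&hdr, data, sizeof(*tlv));
		/*
		 * Advance by the structure size.  sizeof(tlv) would be the
		 * pointer size: 8 on 64-bit platforms (same as the struct),
		 * but 4 on 32-bit platforms.
		 */
		data += sizeof(*tlv);
		len -= sizeof(*tlv);
		if (tlv->length > len)
			return (-1);
		data += tlv->length;
		len -= tlv->length;
		n++;
	}
	return (len == 0 ? n : -1);
}
```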
ccr reuses the control queue and first rx queue from the first port on
each adapter. The driver cannot send requests until those queues are
initialized. Refuse to create sessions for now if the queues aren't
ready. This is a workaround until cxgbe allocates one or more
dedicated queues for ccr.
PR: 233851
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18478
o In vm_pager_bufferinit() create pbuf_zone and start accounting for how many
pbufs we are going to have.
In various subsystems that are going to utilize pbufs create private zones
via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
and sets a limit on created zone. After startup preallocate pbufs according
to requirements of all pbuf zones.
Subsystems that used to have a private limit with old allocator now have
private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
swap, vnode pager.
The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
aio(4). They should have their private limits, but changing that is out of
scope of this commit.
o Fetch tunable value of kern.nswbuf from init_param2() and while here move
NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
this option.
Default values aren't touched by this commit, but they probably should be
reviewed wrt modern hardware.
This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.
Together with: gallatin
Tested by: pho
Do not lose error condition by always returning 0 from set_led.
None of the calls to set_led checks for return value at the moment so
none of API consumers in base is affected.
PR: 231567
Submitted by: Bertrand Petit <bsdpr@phoe.frmug.org>
MFC after: 1 week
Family 15h is a bit of an oddball. Early models used the same temperature
register and spec (mostly[1]) as earlier CPU families.
Models 60h-6Fh and 70h-7Fh use something more like Family 17h's Service
Management Network, communicating with it in a similar fashion. To support
them, add support for their version of SMU indirection to amdsmn(4) and use
it in amdtemp(4) on these models.
While here, clarify some of the deviceid macros in amdtemp(4) that were
added with arbitrary, incorrect family numbers, and remove ones that were
not used. Additionally, clarify intent and condition of heterogeneous
multi-socket system detection.
[1]: 15h adds the "adjust range by -49°C if a certain condition is met,"
which previous families did not have.
Reported by: D. C. <tjoard AT gmail.com>
PR: 234657
Tested by: D. C. <tjoard AT gmail.com>
Extend the vendor class USB audio quirk to cover devices without
the USB audio control descriptor.
PR: 234794
MFC after: 1 week
Sponsored by: Mellanox Technologies
Issue:
ocs_fc(4) driver panics. It's induced by setting the port_state
sysctl to offline, then online, then offline, then online, and so
forth and so on in rapid succession.
Reason:
When port_state is set to online, FC discovery starts and the OS
enumerates the target disks by calling ocs_action(); setting the
port state to "offline" at that point deletes the domain/sport/nodes.
In ocs_action()->XPT_GET_TRAN_SETTINGS we are accessing the remote
node which can be invalid to get the wwpn, wwnn and port.
Fix:
Removed accessing of remote node and domain in some ocs_action() cases.
Populated the required values from ocs_fcport.
This removes the dependency of node and domain structures while
processing XPT_PATH_INQ and XPT_GET_TRAN_SETTINGS.
We will invalidate the target entries after the device lost
timeout(30 seconds).
Approved by: ken, mav
MFC after: 3 weeks
The code is similar to the one for RTL8188E* and probably
should be shared with RTL8188CE (needs to be tested).
Checked with RTL8188CUS, STA mode.
MFC after: 5 days
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros are quite unsafe, e.g. will produce buggy code if the same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
Dell-branded Intel P4600 NVMe drives benefit from NVMe 1.3's NOIOB
feature. Unfortunately just like Intel DC P4500s, they don't advertise
themselves as benefiting from this...
This change adds P4600s to the existing list of old drives which
benefit from striping.
PR: 233969
Submitted by: David Fugate <dave.fugate@gmail.com>
Reviewed by: imp, mav
Approved by: imp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18772
setting the data prior to setting up the interrupt. Now we only set
the cookie afterwards, and that (a) cannot be helped and (b) isn't used
in the ISR.
PR: 147127
Submitted by: hps@
On a system with a Celeron 1.5GHz CPU, sometimes when a PCMCIA to Compact Flash
adapter containing a Compact Flash card is inserted in the cardbus slot the
system hangs. This problem has not been observed in systems with a 2.8GHz
XEON CPU or faster.
Analysis of the cbb driver shows functional interrupts are routed to PCI
BEFORE the interrupt handler for functional interrupts has been registered.
Fix applied as described in the bug.
PR: 128040
Submitted by: Arthur Hartwig
Use BUS_DMA_NOWAIT for loads at initialization time.
Report actual numeric error code if any problem occurs at the
initialization.
Reported and tested by: pho
Reviewed by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18741
tws_passthru() was doing a copyin of a user-specified request
without validating its length, so a malicious request could overrun
the buffer. By default, the tws(4) device file is only accessible
as root.
admbug: 825
Reported by: Anonymous of the Shellphish Grill Team
Reviewed by: delphij
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18536
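A sketch of the kind of length validation described above. The buffer
size, the struct and the function are hypothetical, and memcpy() stands
in for copyin() so that the example builds outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define REQ_BUF_SIZE	64	/* hypothetical fixed-size request buffer */

struct req_buf {
	unsigned char data[REQ_BUF_SIZE];
};

/*
 * Copy a caller-supplied request into a fixed-size buffer, rejecting
 * lengths that would overrun it.  Returns 0 or EINVAL.
 */
static int
copy_request(struct req_buf *dst, const void *src, size_t user_len)
{
	assert(dst != NULL && src != NULL);

	if (user_len > sizeof(dst->data))
		return (EINVAL);
	memcpy(dst->data, src, user_len);	/* copyin() in a real driver */
	return (0);
}
```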
out dead USB HUB devices by implementing an error counter, so that the USB
enumeration thread does not spend all its time reading from non-responding
devices, blocking user-space access in the end.
Tested by: Matthias Apitz <guru@unixarea.de>
MFC after: 1 week
Sponsored by: Mellanox Technologies
done on the old keyboard and then do the corresponding number of grabs
on the new keyboard.
This fixes a race that can leave the system with a non-functioning
keyboard. It goes like this...
- The bios claims there is an AT keyboard, atkbd attaches.
- SI_SUB_INT_CONFIG_HOOKS runs.
- USB probes devices. Devices begin attaching, including disks.
- GELI prompts for a password for a just-attached disk, which results
in a cngrab() while atkbd is the keyboard.
- A USB keyboard attaches.
- vt_upgrade() runs and switches the keyboard to the new USB keyboard,
but because cngrab was never called for it, it's not activated and
keystrokes are ignored.
- Now there is no functional keyboard and no way to get one; even
plugging in a different USB keyboard doesn't help, because the console
is still grabbed, still waiting for a GELI pw.
Discussed with: ray@
front-end doesn't support SDMA or the latter implements a platform-
specific transfer method instead. While at it, factor out allocation
and freeing of SDMA resources to sdhci_dma_{alloc,free}() in order to
keep the code more readable when adding support for ADMA variants.
o Base the size of the SDMA bounce buffer on MAXPHYS up to the maximum
of 512 KiB instead of using a fixed 4-KiB-buffer. With the default
MAXPHYS of 128 KiB and depending on the controller and medium, this
reduces the number of SDHCI interrupts by a factor of ~16 to ~32 on
sequential reads while an increase of throughput of up to ~84 % was
seen.
Front-ends for broken controllers that only support an SDMA buffer
boundary of a specific size may set SDHCI_QUIRK_BROKEN_SDMA_BOUNDARY
and supply a size via struct sdhci_slot. According to Linux, only
Qualcomm MSM-type SDHCI controllers are affected by this, though.
Requested by: Shreyank Amartya (unconditional bump to 512 KiB)
o Introduce a SDHCI_DEPEND macro for specifying the dependency of the
front-end modules on the sdhci(4) one and bump the module version
of sdhci(4) to 2 via an also newly introduced SDHCI_VERSION in order
to ensure that all components are in sync WRT struct sdhci_slot.
o In sdhci(4):
- Make pointers const where applicable,
- replace a few device_printf(9) calls with slot_printf() for
consistency, and
- sync some local functions with their prototypes WRT static.
Previous code typically crashed in case of NVMe device unplug or even clean
detach while some I/Os are still in flight. To fix this the new code calls
disk_gone() and waits for confirmation of all references gone before calling
disk_destroy(), freeing other resources and allowing controller detach.
While there, fix disk lists locking and reimplement unit numbers assignment.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Due to hardware errata in Aero controllers, reads to certain
fusion registers could intermittently return all zeroes.
This behavior is transient in nature and subsequent reads will return
a valid value.
Fix:
For Aero controllers, a read from certain registers is retried up to
three times if it returns zero.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
For Aero adapters:
1. Driver will use 32 bit atomic descriptor to fire IOs and DCMDs.
2. Driver will use 64 bit request descriptor to fire IOC INIT.
3. Driver will use the 32 bit atomic descriptor only if Aero firmware supports it;
otherwise driver will use 64 bit request descriptor.
For the rest of the adapters (Ventura, Invader and Thunderbolt), driver will use 64 bit request
descriptors only.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
Driver will throw a warning message when a Configurable secure type controller is
encountered.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
Due to HW Errata on Aero/Sea A0 chipset in secure boot mode & under heavy IO load,
sometimes a read operation on MPT Fusion registers will give a zero value.
So, as a workaround, the driver will retry the MPT Fusion register
read operation up to three times upon reading a zero value from these
registers.
Submitted by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
Enable atomic type descriptor support only for Sea & Aero cards,
due to HW errata this atomic descriptor support has to be disabled
on Ventura cards.
Submitted by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
Added device IDs for Sea and Aero to the mpr driver:
Aero:
0x00E0 Invalid
0x00E1 Configurable Secure
0x00E2 Hard Secure
0x00E3 Tampered
Sea:
0x00E4 Invalid
0x00E5 Configurable Secure
0x00E6 Hard Secure
0x00E7 Tampered
For Tampered & Invalid type cards, the driver will claim the device & quit the probe function with the error message below,
"HBA is in Non Secure mode"
For Configurable Secure type cards, the driver will display the message below in the .probe() callback function,
"HBA is in Configurable Secure mode"
Submitted by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
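The device ID table above can be sketched as a small classifier. The
IDs and the two messages come from the text; the enum, the function
names and the idea of a separate "should fail" helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sec_type {
	SEC_NONE,		/* not a Sea/Aero secure-type device ID */
	SEC_INVALID,
	SEC_CONFIGURABLE,
	SEC_HARD,
	SEC_TAMPERED
};

static enum sec_type
classify_devid(uint16_t id)
{
	switch (id) {
	case 0x00E0: case 0x00E4:
		return (SEC_INVALID);
	case 0x00E1: case 0x00E5:
		return (SEC_CONFIGURABLE);
	case 0x00E2: case 0x00E6:
		return (SEC_HARD);
	case 0x00E3: case 0x00E7:
		return (SEC_TAMPERED);
	default:
		return (SEC_NONE);
	}
}

/* Message to print at probe time, or NULL when nothing is printed. */
static const char *
probe_message(enum sec_type t)
{
	switch (t) {
	case SEC_INVALID:
	case SEC_TAMPERED:
		return ("HBA is in Non Secure mode");
	case SEC_CONFIGURABLE:
		return ("HBA is in Configurable Secure mode");
	default:
		return (NULL);
	}
}

/* Tampered and Invalid cards are claimed but the probe quits. */
static int
probe_should_fail(enum sec_type t)
{
	return (t == SEC_INVALID || t == SEC_TAMPERED);
}
```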
The following changes were made in the driver as part of TM handling on NVMe drives.
The changes below apply only to NVMe drives and only when the custom NVMe TM handling bit is set to zero by the IOC.
1. Issue LUN reset & Target reset TMs with the Target reset method field set to Protocol Level reset (0x3),
2. For LUN & target reset TMs use the ControllerResetTO value provided by firmware in PCIe Device Page 0 as the timeout value,
3. If LUN reset fails to terminate the IO then directly escalate to host reset instead of going for target reset TM,
4. For Abort TM use the NVMeAbortTO value given by the IOC in Manufacturing Page 11 as the timeout value,
5. Log message "PCie Host Reset failed" message up on receiving P
Submitted by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
typedef struct mps_pass_thru
{
	uint64_t	PtrRequest;
	uint64_t	PtrReply;
	uint64_t	PtrData;
	uint32_t	RequestSize;
	uint32_t	ReplySize;
	uint32_t	DataSize;
	uint32_t	DataDirection;
	uint64_t	PtrDataOut;
	uint32_t	DataOutSize;
	uint32_t	Timeout;
} mps_pass_thru_t, * ptrmpssas_pass_thru_t;
In the above mps_pass_thru structure, the application expects the PtrReply
buffer to contain the MPI reply followed by the sense data. So, updated the
driver to copy sense data to the PtrReply + sizeof(MPI2 reply) location, where
the application wants the driver to copy back the sense data info.
Submitted by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
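A minimal sketch of the reply buffer layout described above: the MPI
reply first, the sense data immediately after it. The function name,
the truncation policy and the use of memcpy() instead of copyout() are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill a reply buffer of dst_size bytes with the reply followed by the
 * sense data, truncating to what fits.  Returns the bytes written.
 */
static size_t
fill_reply(uint8_t *dst, size_t dst_size, const uint8_t *reply,
    size_t reply_len, const uint8_t *sense, size_t sense_len)
{
	size_t n, room;

	assert(dst != NULL && reply != NULL && sense != NULL);

	n = reply_len < dst_size ? reply_len : dst_size;
	memcpy(dst, reply, n);

	/* Sense data lands right after the reply, if any room is left. */
	room = dst_size - n;
	if (sense_len > room)
		sense_len = room;
	memcpy(dst + n, sense, sense_len);
	return (n + sense_len);
}
```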
CAM does not require SIM lock since FreeBSD 10.4, and NVMe code never
required it at all, using per-queue locks instead. This formally allows
parallel request submission in CAM mode as much as single per-device and
per-queue locks of CAM allow.
MFC after: 1 month
g_io_deliver() finishing initialization of the bio, but g_io_deliver()
actually destroys the bio. INVARIANTS makes the bug obvious by
overwriting the bio with garbage.
Restore the old order for calling devstat (except don't restore not calling
it for the error case), and translate to the devstat KPI so that this order
works.
Reviewed by: kib
To check if txsync can be skipped, it is necessary to look for
unseen TX space. However, this means comparing ring->cur
against ring->tail, rather than ring->head against ring->tail
(like nm_ring_empty() does).
This change also adds some more comments to explain the optimization
performed at the beginning of netmap_poll().
MFC after: 3 days
Sponsored by: Sunny Valley Networks
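A small sketch of the difference between the two comparisons mentioned
above. The struct is a hypothetical three-field view of a TX ring and
the helper names are made up; only the head/cur/tail comparison follows
the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, minimal view of a TX ring: only the three indices. */
struct tx_ring {
	uint32_t head;
	uint32_t cur;
	uint32_t tail;
};

/* The head-vs-tail test, as the message describes nm_ring_empty(). */
static int
ring_is_empty(const struct tx_ring *r)
{
	return (r->head == r->tail);
}

/* The cur-vs-tail test: is there TX space the application has not seen? */
static int
has_unseen_tx_space(const struct tx_ring *r)
{
	return (r->cur != r->tail);
}
```

With head = 2 and cur = tail = 5 the ring is not empty by the head test,
yet there is no unseen space, so txsync must not be skipped.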
The bug was introduced by r339639, although it is present in the upstream
netmap code since 2015. It is due to resetting the want_rx variable to
POLLIN, rather than resetting it to POLLIN|POLLRDNORM.
It only affects select(), which uses POLLRDNORM. poll() is not affected,
because it uses POLLIN.
Also, it only affects FreeBSD, because Linux skips the optimization
implemented by the piece of code where the bug occurs.
MFC after: 3 days
Sponsored by: Sunny Valley Networks
Add a generic mechanism to override mp?_wait_command's timeout behavior,
which continues to invoke reinit by default. Invokers who set
cm_timeout_handler may avoid automatic reinit and do their own handling.
Adapt mp?sas_get_sata_identify to this mechanism and remove its callout
hack.
Reviewed by: scottl
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D18614
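A sketch of the override mechanism described above, reduced to a
function pointer with a default. Only the cm_timeout_handler field name
comes from the message; the struct, the counters and the other function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct command;
typedef void (*timeout_handler_t)(struct command *);

struct command {
	timeout_handler_t cm_timeout_handler;	/* NULL selects the default */
	int reinit_count;	/* times the default reinit ran */
	int custom_count;	/* times a custom handler ran */
};

/* Default behavior on timeout: reinitialize (only counted here). */
static void
default_reinit(struct command *cm)
{
	cm->reinit_count++;
}

/* Example of an invoker-supplied handler that avoids the reinit. */
static void
custom_handler(struct command *cm)
{
	cm->custom_count++;
}

/* Called when waiting for the command times out. */
static void
on_timeout(struct command *cm)
{
	assert(cm != NULL);
	if (cm->cm_timeout_handler != NULL)
		cm->cm_timeout_handler(cm);
	else
		default_reinit(cm);
}
```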
In the event that the ID command timed out, mps(4)/mpr(4) did not free the
command until it could be cancelled. However, it freed the associated
buffer (cm_data). Fix the lifetime issue by freeing the associated buffer
only after Abort Task or controller reset.
Reviewed by: scottl
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D18612
This code validates the netmap buf_size against the interface MTU
and maximum descriptor size, to make sure the values are consistent.
Moving this functionality to its own function is needed because this
function is also called by Linux-specific code.
MFC after: 3 days
implement not double-caching for reads from vnode-backed md devices.
Use VOP_ADVISE() similarly instead of !IO_DIRECT unsimilarly for writes.
Add a "cache" option to mdconfig to allow changing the default of not
caching.
This depends on a recent commit to fix VOP_ADVISE(). A previous version
had optimizations for sequential i/o's (merge the i/o's and only uncache
for discontiguous i/o's and for full blocks), but optimizations and
knowledge of block boundaries belong in VOP_ADVISE(). Read-ahead should
also be handled better, by supporting it in md and discarding it in
VOP_ADVISE().
POSIX_FADV_DONTNEED is ignored by zfs, but so is IO_DIRECT.
POSIX_FADV_DONTNEED works better than IO_DIRECT if it is not ignored,
since it only discards from the buffer cache immediately, while
IO_DIRECT also discards from the page cache immediately.
IO_DIRECT was not used for writes since it was claimed to be too slow,
but most of the slowness for writes is from doing them synchronously by
default. Non-synchronous writes still deadlock in many cases.
IO_DIRECT only has a special implementation for ffs reads with DIRECTIO
configured. Otherwise, if it is not ignored then it uses the buffer and
page caches normally except for discarding everything after each i/o,
and then it has much the same overheads as POSIX_FADV_DONTNEED. The
overheads for reading with ffs and DIRECTIO were similar in tests of md.
Reviewed by: kib
Move static variable definition (cdevsw) to a more conventional location
(the C file it is used in), rather than a header.
This fixes the GCC warning, -Wunused-variable ("defined but not used") when
the tpm20.h header is included in files other than tpm20.c (e.g.,
tpm_tis.c).
X-MFC-with: r342084
Sponsored by: Dell EMC Isilon
On amd64 the RSP address can be read in a single 8-byte transaction,
which is obviously not possible on 32-bit platforms. Fix that
by performing two 4-byte reads on them.
Obtained from: Semihalf
Sponsored by: Stormshield
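A minimal sketch of assembling a 64-bit value from two 4-byte reads, as
described above. The register file is simulated with an array, the
function name is hypothetical, and the low-dword-first ordering is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit value as two 32-bit reads.  Assumes the low half sits
 * at the lower index (little-endian register layout).
 */
static uint64_t
read8_split(const uint32_t *regs, size_t word_idx)
{
	uint64_t lo, hi;

	assert(regs != NULL);
	lo = regs[word_idx];
	hi = regs[word_idx + 1];
	return (lo | (hi << 32));
}
```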
This fixes a warning seen when compiling amd64 GENERIC with clang 7.
Also remove the workaround added in r337324. clang 7 and gcc 4.2
generate the same code with or without the code change.
Reviewed by: imp (previous version)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18603
- Fix PR 227760 by getting the TOE to respond to the SYN after the call
to toe_syncache_add, not during it. The kernel syncache code calls
syncache_respond just before syncache_insert. If the ACK to the
syncache_respond is processed in another thread it may run before the
syncache_insert and won't find the entry. Note that this affects only
t4_tom because it's the only driver trying to insert and expand
syncache entries from different threads.
- Do not leak resources if an embryonic connection terminates at
SYN_RCVD because of L2 lookup failures.
- Retire lctx->synq and associated code because there is never a need to
walk the list of embryonic connections associated with a listener.
The per-tid state is still called a synq entry in the driver even
though the synq itself is now gone.
PR: 227760
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Those should ensure correctness of ichwd_find_ich_lpc_bridge() and
ichwd_find_ich_lpc_bridge() as well as make it easier for both humans
and static analyzers to see the relation between tco_version and ich and
smb variables in ichwd_identify().
Reported by: Coverity
CID: 1396314, 1396317
MFC after: 10 days
The code is unreachable since the entries of radeon_ioctls[] are not
associated with any device: we provide only the KMS entry points.
Moreover, r600_cp_dispatch_texture() contains an integer overflow bug
that can be triggered from userspace.[1]
Reported by: Anonymous of the Shellphish Grill Team[1]
Reviewed by: dumbbell
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18516
This includes removing stray whitespace, adding a line after the
variable declaration block and removing a redundant check.
MFC after: 1 week
X-MFC with: r339754
In testing on a Dell Latitude 7480, having ig4.ko loaded during a
suspend caused the system to hang. It turns out that ig4iic_intr() was
being called after the device entered D3, and entered an infinite loop
because a read of the I2C status register returned all ones, causing us
to attempt to read a byte from the data buffer until one of the status
bits clears. This occurred because ig4iic_pci0 shares an interrupt with
the VGA device on this laptop, so ig4iic_intr() gets called even when
there is no work to do. This is exactly the problem fixed by r342170,
which resolves the hang for me and allows suspend/resume to work with
ig4.ko loaded. So, re-enable autoloading of ig4.ko in the hope that
r342170 resolves the problem universally.
Reviewed by: gonzo
MFC after: 1 month (pending an MFC of r342170)
Differential Revision: https://reviews.freebsd.org/D18587
The goal of this change is to fix a problem with PCI shared interrupts
during suspend and resume.
I have observed a couple of variations of the following scenario.
Devices A and B are on the same PCI bus and share the same interrupt.
Device A's driver is suspended first and the device is powered down.
Device B generates an interrupt. Interrupt handlers of both drivers are
called. Device A's interrupt handler accesses registers of the powered
down device and gets back bogus values (I assume all 0xff). That data is
interpreted as interrupt status bits, etc. So, the interrupt handler
gets confused and may produce some noise or enter an infinite loop, etc.
This change affects only PCI devices. The pci(4) bus driver marks a
child's interrupt handler as suspended after the child's suspend method
is called and before the device is powered down. This is done only for
traditional PCI interrupts, because only they can be shared.
At the moment the change is only for x86.
Notable changes in core subsystems / interfaces:
- BUS_SUSPEND_INTR and BUS_RESUME_INTR methods are added to bus
interface along with convenience functions bus_suspend_intr and
bus_resume_intr;
- rman_set_irq_cookie and rman_get_irq_cookie functions are added to
provide a way to associate an interrupt resource with an interrupt
cookie;
- intr_event_suspend_handler and intr_event_resume_handler functions
are added to the MI interrupt handler interface.
I added two new interrupt handler flags, IH_SUSP and IH_CHANGED, to
implement the new intr_event functions. IH_SUSP marks a suspended
interrupt handler. IH_CHANGED is used to implement a barrier that
ensures that a change to the interrupt handler's state is visible
to future interrupts.
While there, I fixed some whitespace issues in comments and changed a
couple of logically boolean variables to be bool.
MFC after: 1 month (maybe)
Differential Revision: https://reviews.freebsd.org/D15755
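A much simplified sketch of the idea behind IH_SUSP as described above:
handlers on a shared line that are marked suspended are skipped, so they
never touch a powered-down device. The flag name is from the message;
its value, the struct and the dispatch loop are hypothetical, and the
IH_CHANGED barrier is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define IH_SUSP	0x01	/* flag name from the message; value is made up */

struct handler {
	int	flags;
	void	(*func)(void *);
	void	*arg;
};

/* Example handler body: just counts invocations. */
static void
count_call(void *arg)
{
	(*(int *)arg)++;
}

/* Run every non-suspended handler on a shared line; return how many ran. */
static int
dispatch_shared(struct handler *h, size_t n)
{
	size_t i;
	int ran = 0;

	for (i = 0; i < n; i++) {
		if (h[i].flags & IH_SUSP)
			continue;	/* its device may be powered down */
		h[i].func(h[i].arg);
		ran++;
	}
	return (ran);
}
```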
PR: maybe related to 233998 (inconclusive at this time)
Submitted by: byuu <byuu AT tutanota.com> (previous version)
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D18506
It was written based on:
TCG PC Client Platform TPM Profile (PTP) Specification Version 22, Revision 1.03.
It only supports Locality 0. Interrupts are only supported in FIFO mode.
The driver in FIFO mode was tested on x86 with Infineon SLB9665 discrete TPM chip.
Driver in both modes was also tested on qemu with swtpm running on host.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D18048
This is based on a patch developed by
Tetsuya Uemura <t_uemura@macome.co.jp>.
Many thanks!
Submitted by: Tetsuya Uemura <t_uemura@macome.co.jp> (earlier version)
Tested by: Tetsuya Uemura <t_uemura@macome.co.jp>
MFC after: 2 weeks
very minimal prints and even few important messages will not get logged.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
capable IOs. NVME specification supports specific type of scatter gather list
called as PRP (Physical Region Page) for IO data buffers. Since NVME drive is
connected behind SAS3.5 tri-mode adapter, MegaRAID driver/firmware has to convert
OS SGLs in native NVMe PRP format. For IOs sent to firmware, MegaRAID firmware
does this job of OS SGLs to PRP translation and send PRPs to backend NVME device.
For fastpath IOs, driver will do this OS SGLs to PRP translation.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
required Write IOs as Fast Path IOs (after the appropriate checks
allowing Fast Path to be used) to the appropriate physical drives
(translated from the OS logical IO) and wait for all Write IOs to complete.
Design: A write IO on RAID volume will be examined if it can be sent in
Fast Path based on IO size and starting LBA and ending LBA falling on to
a Physical Drive boundary. If the underlying RAID volume is a RAID 1/10,
driver issues two fast path write IOs one for each corresponding physical
drive after computing the corresponding start LBA for each physical drive.
Both write IOs will have the same payload and are posted to HW such that
replies land in the same reply queue.
If there are no resources available for sending two IOs, driver will send
the original IO from upper layer to RAID volume through the Firmware.
When both IOs are completed by HW, the resources will be released
and SCSI IO completion handler will be called.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
stream to help HBA Firmware do the Full Stripe Writes. For read IOs on
certain RAID volumes like Read Ahead volumes, this will help driver to
send it to Firmware even if the IOs can potentially be sent to
hardware directly (called fast path) bypassing firmware.
Design: 8 streams are maintained per RAID volume as per the combined
firmware/driver design. When there is no stream detected the LRU stream
is used for next potential stream and LRU/MRU map is updated to make this
as MRU stream. Every time a stream is detected the MRU map
is updated to make the current stream as MRU stream.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
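A rough sketch of the LRU/MRU bookkeeping described in the design note
above, for 8 streams. The array-based ordering and all names are
assumptions chosen for readability; the message does not say how the
driver encodes its map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_STREAMS	8

/* order[0] is the MRU stream, order[NUM_STREAMS - 1] the LRU stream. */
struct stream_map {
	uint8_t order[NUM_STREAMS];
};

static void
map_init(struct stream_map *m)
{
	uint8_t i;

	for (i = 0; i < NUM_STREAMS; i++)
		m->order[i] = i;
}

/* Move stream `id` to the MRU position, shifting the others down. */
static void
make_mru(struct stream_map *m, uint8_t id)
{
	int pos;

	assert(id < NUM_STREAMS);
	for (pos = 0; pos < NUM_STREAMS && m->order[pos] != id; pos++)
		;
	if (pos == NUM_STREAMS)
		return;
	memmove(&m->order[1], &m->order[0], (size_t)pos);
	m->order[0] = id;
}

/* No stream matched: reuse the LRU stream and make it the MRU one. */
static uint8_t
take_lru(struct stream_map *m)
{
	uint8_t id = m->order[NUM_STREAMS - 1];

	make_mru(m, id);
	return (id);
}
```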
for different number of supported VDs for SAS3.5 MegaRAID adapters.
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: ken
MFC after: 3 days
Sponsored by: Broadcom Inc
In the nda(4) driver, only set DISKFLAG_CANDELETE (a.k.a. can support
BIO_DELETE) if the drive supports Dataset Management. There are reports
that without this check, VMWare Workstation does not work reliably.
Fix is to check the ONCS field in the NVMe Controller Data structure for
support. This check previously existed but did not survive the
big-endian changes.
Reported by: yuripv@yuripv.net
Reviewed by: imp, mav, jimharris
Approved by: imp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18493
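A small sketch of the ONCS check described above. Bit 2 of ONCS is the
Dataset Management bit in the NVMe specification; the macro names, the
flag value standing in for DISKFLAG_CANDELETE and the helper functions
are hypothetical. The explicit little-endian decode illustrates why the
check is sensitive to byte order.

```c
#include <assert.h>
#include <stdint.h>

#define ONCS_DSM	(1u << 2)	/* Dataset Management supported */
#define FLAG_CANDELETE	0x1u		/* stand-in for DISKFLAG_CANDELETE */

/* Decode a 16-bit little-endian field from raw identify data bytes. */
static uint16_t
le16_decode(const uint8_t *p)
{
	return ((uint16_t)(p[0] | (p[1] << 8)));
}

/* Advertise delete support only when the controller reports DSM. */
static unsigned int
disk_flags(uint16_t oncs)
{
	return ((oncs & ONCS_DSM) != 0 ? FLAG_CANDELETE : 0);
}
```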
The pwm subsystem consists of an API for PWM controllers, pwmbus to register
them and a pwm(8) utility to talk to them from userland.
Reviewed by: oshgobo (capsicum), bcr (manpage), 0mp (manpage)
Differential Revision: https://reviews.freebsd.org/D17938
If bus_dmamap_load_mbuf() fails following a defrag, the caller of
bwn_dma_tx_start() would free the original mbuf after m_defrag() had
already done so. Fix this by returning the defragged mbuf to the
caller instead. Update bwn_pio_tx_start() similarly for consistency.
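The ownership rule behind this fix can be illustrated with a small user-space
sketch. Everything here is hypothetical: struct buf stands in for struct mbuf,
buf_defrag mimics the m_defrag() contract (on success the old buffer is
consumed and a new one is returned), and tx_start is not the bwn(4) function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct buf { int id; };	/* stand-in for struct mbuf */

/* On success, consumes the old buffer and returns a new one. */
static struct buf *
buf_defrag(struct buf *old)
{
	struct buf *n = malloc(sizeof(*n));

	if (n == NULL)
		return (NULL);	/* old is left untouched on failure */
	n->id = old->id + 1;
	free(old);
	return (n);
}

/*
 * Hypothetical transmit routine.  On error the caller still owns *bp and
 * must free it, so *bp has to track the buffer that is actually alive.
 */
static int
tx_start(struct buf **bp, int load_fails_after_defrag)
{
	struct buf *n;

	assert(bp != NULL && *bp != NULL);
	n = buf_defrag(*bp);
	if (n == NULL)
		return (ENOBUFS);
	*bp = n;		/* hand the defragged buffer back */
	if (load_fails_after_defrag)
		return (ENOMEM);
	free(n);		/* "transmitted": consumed on success */
	*bp = NULL;
	return (0);
}
```

Because the caller's pointer is updated before any later failure, its error
path frees the defragged buffer rather than the one that was already released.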
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: landonf
Tested by: landonf
MFC after: 3 days
admbug: 820
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18342
PR: 217505
Submitted by: John O. Brickley <obryan.brickley@gmail.com>, updated by Maciej Pasternacki <maciej@pasternacki.net>
Reported by: John O. Brickley <obryan.brickley@gmail.com>
MFC after: 1 week
initialize the controller.
According to the datasheet, the old code checked whether port 2 (P2E, 0x4)
was the only enabled port (except port 0, which was ignored by mask 0xfe),
and issued a write to the PCS register to disable all but port 0, right
before ahci_ctlr_reset.
Some other operating systems would issue a port enable to all ports, but
since the current code only does the special initialization for ICH8M,
remove it entirely and rely on BIOS to do the right thing (the alternative
would be https://reviews.freebsd.org/D18300?id=50922 , should we see
reports that we really need to do it).
Reviewed by: mav
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D18300
On EF10 HW we can avoid sending packets without checksum offload
or with IP-only checksum offload to dedicated queues. Instead, we
can use option descriptors to change offload policy on any queue
during runtime. Thus, we don't need to create two dedicated queues.
Submitted by: Ivan Malov <Ivan.Malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18390
The number of Tx queues on event queue 0 can depend on the NIC family type,
and this property will be leveraged by future patches.
This patch prepares the code for this change.
Submitted by: Ivan Malov <Ivan.Malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18389
The FreeBSD driver needs a patch to provide a means for packets
which do not need checksum offload but have flow ID set
to avoid hitting only the first Tx queue (which has been used
for packets not needing checksum offload).
This should be possible on Huntington, Medford or Medford2 chips
since these support toggling checksum offload on any given queue
dynamically by means of pushing option descriptors.
The patch for the FreeBSD driver will then need a means to figure out
whether the feature can be used, and testing adapter family might
not be a good solution.
This patch adds a feature bit specifically to indicate support
for checksum option descriptors. The new feature bit may have
more users in the future, apart from the mentioned FreeBSD patch.
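The difference between testing a feature bit and testing the adapter family
can be sketched as below. All identifiers are hypothetical and do not match
the names used by the common code; the sketch only shows the pattern of
deriving the bit once per family and having callers test the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the real family and feature identifiers differ. */
enum nic_family {
	FAMILY_OTHER,
	FAMILY_HUNTINGTON,
	FAMILY_MEDFORD,
	FAMILY_MEDFORD2
};

#define FEATURE_TX_CKSUM_OPT_DESC	(1u << 0)

/* Derived once, at the place where per-family capabilities are known. */
static uint32_t
family_features(enum nic_family f)
{
	switch (f) {
	case FAMILY_HUNTINGTON:
	case FAMILY_MEDFORD:
	case FAMILY_MEDFORD2:
		return (FEATURE_TX_CKSUM_OPT_DESC);
	default:
		return (0);
	}
}

/* Callers test the bit instead of enumerating adapter families. */
static bool
can_toggle_cksum_per_queue(uint32_t features)
{
	return ((features & FEATURE_TX_CKSUM_OPT_DESC) != 0);
}
```

With this split, supporting a new family only requires touching the place
where the bit is set, not every caller.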
Submitted by: Ivan Malov <Ivan.Malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18388
In order to find out why the first event queue and corresponding
interrupt are triggered more frequently, it is useful to know which
events go to each event queue.
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18418
devstat_end_transaction() was called before the i/o was actually ended
(by delivering it to GEOM), so at least the i/o length was messed up.
It was always recorded as 0, so the average transaction size and the
average transfer rate were always displayed as 0.
devstat_end_transaction() was not called at all for the error case, so
there were sometimes multiple starts per end. I didn't observe this in
practice and don't know if it did much damage. I think it extended the
length of the i/o to the next transaction.
Reviewed by: kib
The ACPI SRAT table on arm64 uses GICC entries to provide CPU locality
information. These entries use an AcpiProcessorUid to identify the
CPU (unlike on x86 where the entries have an APIC ID).
Update acpi_pxm.c to extend the cpu_add/cpu_find/cpu_get_info
functions to handle AcpiProcessorUid. Use the updated functions
while parsing ACPI_SRAT_GICC_AFFINITY entry for arm64.
Also update sys/conf/files.arm64 to build acpi_pxm.c when ACPI is
enabled.
Reviewed by: markj (previous version)
Differential Revision: https://reviews.freebsd.org/D17942
This moves the architecture independent parts of sys/x86/acpica/srat.c
to sys/dev/acpica/acpi_pxm.c, to be used later on arm64. The function
declarations are moved to sys/dev/acpica/acpivar.h
We also need to update sys/conf/files.{i386,amd64} to use the new file.
No functional changes.
Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D17941
Because of that typo the driver would try to attach to every device
on the acpi bus. That disrupted acpi attachment of the uart driver, at least.
MFC after: 4 days
X-MFC with: r339754
The iflib subsystem implements netmap support in a driver-independent
way (sys/net/iflib.c). We can therefore remove the headers that
used to implement netmap support for all the drivers now supported
by iflib (em, igb, ixl, ixgbe, lem).
MFC after: 1 week
through.
cxgb4vf doesn't own the buffer size list but still expects the first two
entries to be 4K and some power of 2 respectively. The BSD cxgbe
doesn't care where its preferred buffer sizes are as long as they're in
the list somewhere, so just move its entries towards the end as a
workaround.
MFC after: 1 month
Sponsored by: Chelsio Communications
Specifically, assume that the device is present if evaluation of _STA
method fails.
Before r330957 we ignored any _STA evaluation failure (which was
performed by AcpiGetObjectInfo in ACPICA contrib code) for the purpose
of acpi_DeviceIsPresent and acpi_BatteryIsPresent. ACPICA 20180313
removed evaluation of _STA from AcpiGetObjectInfo. So, we added
evaluation of _STA to acpi_DeviceIsPresent and acpi_BatteryIsPresent.
One important difference is that the new code ignored a failure only if
_STA did not exist (AE_NOT_FOUND). Any other kind of failure was
treated as a fatal failure. Apparently, on some systems we can get
AE_NOT_EXIST when evaluating _STA. And that error is not an evil twin
of AE_NOT_FOUND, despite a very similar name, but a distinct error
related to a missing handler for an ACPI operation region.
It's possible that for some people the problem was already fixed by
changes in ACPICA and/or in acpi_ec driver (or even in BIOS) that fixed
the AE_NOT_EXIST failure related to EC operation region.
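The resulting policy can be sketched as follows. The status codes are
hypothetical stand-ins for ACPICA's ACPI_STATUS values, and
device_is_present is not the real acpi_DeviceIsPresent. The _STA bit
positions come from the ACPI specification; combining them as "present and
functioning" is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA result bits as defined by the ACPI specification. */
#define STA_PRESENT	(1u << 0)
#define STA_FUNCTIONAL	(1u << 3)

/* Hypothetical status codes standing in for ACPICA's ACPI_STATUS values. */
enum eval_status {
	EVAL_OK,
	EVAL_NOT_FOUND,
	EVAL_NOT_EXIST,
	EVAL_OTHER_ERROR
};

static bool
device_is_present(enum eval_status status, uint32_t sta)
{
	/*
	 * Any evaluation failure, not just a missing _STA method, means
	 * "assume the device is there".
	 */
	if (status != EVAL_OK)
		return (true);
	return ((sta & (STA_PRESENT | STA_FUNCTIONAL)) ==
	    (STA_PRESENT | STA_FUNCTIONAL));
}
```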
This work is based on a great analysis by cem and an earlier patch by
Ali Abdallah <aliovx@gmail.com>.
PR: 227191
Reported by: 0mp
MFC after: 2 weeks
This allows tcpdump to capture outbound kernel packets while
in netmap mode
Submitted by: Marc de la Gueronniere <mdelagueronniere@verisign.com>
Reviewed by: vmaffione
MFC after: 1 week
Sponsored by: Verisign, Inc.
Differential Revision: https://reviews.freebsd.org/D17896
card initialization. This is an expanded version of r333682.
Break up prep_firmware into simpler routines while here. Load the
firmware/config KLD only if needed.
MFC after: 1 month
Sponsored by: Chelsio Communications
The backpressure indication is implemented using an unlimited rate type of
mbuf send tag. Once the upper layers, typically the socket layer, have obtained such
a tag, they can query the destination driver queue for the current
amount of space available in the send queue.
A single mbuf send tag may be referenced multiple times and a refcount has been added
to the mlx5e_priv structure to track its usage. Because the send tag resides
in the mlx5e_channel structure, there is no need to wait for refcounts to reach
zero until the mlx5en(4) driver is detached. The channels structure is persistent
during the lifetime of the mlx5en(4) driver it belongs to and can therefore be accessed
without any need for synchronization.
The mlx5e_snd_tag structure was extended to contain a type field, because there are now
two different tag types that end up in the driver and need to be distinguished.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
In order to enable HW LRO, both the "hw_lro" sysctl in the mlx5en(4) config
space and the ifconfig(8) LRO capability must be set. Any other combination
of settings will disable HW LRO.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add counters for all transmitted and received bytes. Previously, only all
transmitted and received packets were counted. Fix the description of the RX LRO
counters while at it.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
By allocating the worst case size channel structure array
at attach time we can eliminate various NULL checks in the
fast path and also reduce the chance of use-after-free
issues in the transmit fast path.
This change is also a requirement for implementing
backpressure support.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Writing to the debug stats variable must be locked,
else serialization will be lost which might cause
various kernel panics due to creating and destroying
sysctls out of order.
Make sure the sysctl context is initialized after freeing
the sysctl nodes, else they can be freed twice.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Inspect the ethernet compliance code to figure out actual cable type by reading
the PDDR module info register.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
This can happen when connections are short lived and leads to
a firmware error printout in dmesg, syndrome 0x51cfb0, because
the SQ is in the wrong state.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
1) Don't exceed the driver's own hardcoded TX inline limit.
The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the driver's own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.
2) Make sure the mlx5_query_min_inline() function returns an error code.
3) Header inlining is required when using TSO.
4) Catch failure to compute inline header size for TSO.
5) Add support for UDP when computing inline header size.
6) Fix for inlining issues with regard to DSCP.
Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to work around a hardware bug extracting
the DSCP field from the IPv4/v6 header.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
The hardware queues are deep enough currently, and using the DRBR and associated
callbacks only leads to more task switching in the TX path. There is also a race
setting the queue_state which can lead to hung TX rings.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer between 1 and 100, inclusive.
Implement the needed firmware commands and SYSCTLs through mlx5en(4).
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies