mv_pci driver omitted slot 0, which can be valid device on Armada38x.
New mechanism detects if device is root link, basing on vendor's
and device's IDs.
It is restricted to Armada38x; on other machines, behaviour remains
the same.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4377
Driver was modified to ensure it attaches properly to "marvell,orion-ehci"
node, which doesn't have error interrupt line defined. Neccessary
ofw_compat_data struct was added and probe procedure was altered.
Reviewed by: andrew, ian
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4369
uart_dev_ns8250 now relies on compatible property instead of additional
'busy-detect' cell. All drivers with compatible = "snps,dw-apb-uart" have
busy detection turned on. DTS files of devices affected by the change
were modified and 'busy-detect' property was removed.
Reviewed by: andrew, ian, imp
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4218
This compatibility string is used in .dts file of Armada38x
and isrequired for driver attachment.
Reviewed by: andrew, ian, imp
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Michal Stanek <mst@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4216
Strict compatibility requirement is a root of problems when simplebus'
node has two compatibility strings (i.e. on Armada38x). Removing this
requirement should not interfere with other platforms.
fdt_is_compatible_strict() and fdt_find_compatible() calls were changed
in fdt_common.c and mv_common.c.
Reviewed by: ian, imp
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4602
New annotation coming into use in the common code. Support its use by
adding null macro definitions for non-Windows platforms.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
X-MFC with: r293901
to actually wait until the TX FIFOs of UARTs have be drained before
returning. This is done by bringing the equivalent of the TS_BUSY flag
found in the previous implementation back in an ABI-preserving way.
Reported and tested by: Patrick Powell
Most likely, drivers for USB-serial-adapters likewise incorporating
TX FIFOs as well as other terminal devices that buffer output in some
form should also provide implementations of tsw_busy.
MFC after: 3 days
- Add optimizing LRO wrapper which pre-sorts all incoming packets
according to the hash type and flowid. This prevents exhaustion of
the LRO entries due to too many connections at the same time.
Testing using a larger number of higher bandwidth TCP connections
showed that the incoming ACK packet aggregation rate increased from
~1.3:1 to almost 3:1. Another test showed that for a number of TCP
connections greater than 16 per hardware receive ring, where 8 TCP
connections was the LRO active entry limit, there was a significant
improvement in throughput due to being able to fully aggregate more
than 8 TCP stream. For very few very high bandwidth TCP streams, the
optimizing LRO wrapper will add CPU usage instead of reducing CPU
usage. This is expected. Network drivers which want to use the
optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
"tcp_lro_flush()". Further the LRO control structure must be
initialized using "tcp_lro_init_args()" passing a non-zero number
into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for
statistics which can be prone to wrap-around. Fix this while at it
and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures,
especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO
control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to
change of the LRO control structure size.
Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914
after changing the HW LRO sysctl when previously in up state.
Reviewed by: gnn
Sponsored by: Mellanox Technologies
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D4941
Ensure that checksum flags and L3/L4 fields are ignored
if lower level errors are reported in the event.
Remove checks for CRC0_ERR (bad iSCSI header CRC) and
CRC1_ERR (bad iSCSI payload or FCoE/FCoIP CRC) as they
are not used by any existing code.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4975
The dynamic config on Medford is stored using two partitions in flash, and at
any time one is the 'current' partition, used to provide the active config,
and the other 'backup' partition is used for writes. This means that there
are two potential partitions that can be used to service reads, and which is
required can depend on, for example, whether the read is to get the current
contents or to verify a write.
When the partition write lock is held, the default behaviour is to read from
the backup partition, which was wrong for most reads in the common code which
require the current partition. This change allows the current partition to be
read whilst the write lock is held.
There is one read in Manftest which needs the backup partition.
ef10_nvram_partn_read_mode() is created to avoid changing
ef10_nvram_partn_read() which shares a prototype with the equivalent Falcon
and Siena methods.
MC_CMD_NVRAM_READ_IN_V2 adds an extra field, but firmware which doesn't support
it just ignores it.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4974
on FDT/OFW platforms.
After the refactoring of the powerpc code so that OF_decode_addr() is usable
on all FDT/OFW platforms, this switches uart(4) to using it.
Differential Revision: https://reviews.freebsd.org/D4675
tlv_partition_header has field *preset* to support RFID-selectable
segments of dynamic configuration
Submitted by: Mateusz Wrzesinski <mwrzesinski at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
This allows an MTU change to be requested on unpriviliged functions
without also setting all the other parameters supported by MC_CMD_SET_MAC.
The enhanced SET_MAC command was introduced in v4_7 firmware.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4958
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4961
we want to use it.
The rate table was being initialised in low->high, but the link quality
table was being initialised high->low. So, when we did a lookup, we
would get the indexes wrong.
This started by a patch from dragonflybsd which reversed how the ni->in_ridx[]
array is being used; I'd rather it all be consistent. So, this is consistent.
Inspired by: what I did to iwn(4) a while ago
Inspired by: DragonflyBSD; <imre@vdsz.com>
Previously the polarity was for TTL levels, which are the reverse of RS-232.
Also add handling of the UART_PPS_INVERT_PULSE option bit in the sysctl
value, the same as was recently added to uart(4), so that people using TTL
level connections can request a logical inverting of the signal.
Use the named constants from the new dev/uart/uart_ppstypes.h for the pps
capture modes and option bits.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4957
- Add the structure with already known fields offsets
(some of them were taken from this driver,
some (channel_plan, rf_* fields) - from TP-LINK official driver)
- Fix a typo / dehardcode a constant in RTL8192C ROM structure.
Tested with RTL8188EU, STA mode
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4274
Leaving BIST methods for now as, though the Medford bootrom now has lots
of BIST support, production firmware doesn't appear to have been updated
yet.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4949
Not per PF copies as on Huntington.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4935
Some classes of IOAT hardware prefetch reads. DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.
(E.g., I've hit this personally on Broadwell-EP. The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)
Sponsored by: EMC / Isilon Storage Division
The "mcdi_err_arg" probe still reports results of failed MCDI
commands, unless the caller invoked efx_mcdi_execute_quiet().
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4919
Add support for Huntington MCDI licensing interface to common code.
Ported from Linux net driver IOCTL functions with restructuring for
initial support for V3 licensing API.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4918
Fix an explanatory comment which did not explain very well.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4916
Fix efx_vpd_hunk_next() which has -- since its inception -- failed to
correctly iterate over the tags and keywords contained in the VPD data.
Only the first tag or keyword would be returned and the next call with
*contp == 1 would walk to the end of the data and finish.
This was spotted when fixing up errors spotted by Prefast code analysis
(which neglected to set all of the out parameters in all successful cases)
Also fix efx_vpd_verify() on Siena and EF10 which (as a side effect of
correctly iterating over all the tags and keywords) was failing as it
detected that both the static VPD and dynamic VPD storage contained an
RV keyword in the VPD-R tag. This is intentional as the static VPD and
dynamic VPD are stored separately (firmware merges their contents and
computes a new RV keyword checksum for the data readable from the VPD
capability in PCIe configuration space).
Submitted by: Andrew Lee <alee at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4915
The only value which has changed is the number of rows
(ER_DZ_EVQ_TMR_REG_ROWS is 2048 vs 1024 for FR_BZ_TIMER_COMMAND_REGP0_ROWS)
but that isn't used, so this shouldn't change behaviour.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4913
If the NVSP protocol version is not greater than NVSP_PROTOCOL_VERSION_2,
then the recv buffer size is 15MB, otherwise the buffer size is 16MB.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4814
Submitted by: Howard Su <howard0su gmail com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4693
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: delphij, royger, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4676
We don't need them at all.
Submitted by: Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Reviewed by: royger, adrian, delphij
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4595
This is first step to move the generic part of HV code into kernel instead
of module, so that it is possible to use hypercall to implement some other
paravirtualization code in the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: royger, delphij, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3072
alignment guarantees provided by m_defrag(9), use m_collapse(9)
instead for performance reasons.
While at it, sanitize the statistics softc members, i. e. retire
unused ones and add SYSCTL nodes missing for actually used ones.
Differential Revision: https://reviews.freebsd.org/D4717
MAKEDEV_CHECKNAME flag to the call, this is required to not panic on
race between the clone and destructing the closed master.
Reported by and discussed with: bde
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Pulled firmware_ids.h from firmwaresrc and applied genfwdef script.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4897
The EFSYS_OPT_RX_HDR_SPLIT optional feature in the common code
implemented the Lookahead Split feature of Windows. This split
received packets at a preconfigured byte offset, and delivered
the header and payload portions to separate receive queues.
Now the common code interface has no callers, so remove it.
Note that this should not be confused with the Header Data Split
feature of Windows, which splits packets at a header boundary.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4888
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4880
option to invert the polarity in software. Also add an option to capture
very narrow pulses by using the hardware's MSR delta-bit capability of
latching line state changes.
This effectively reverts the mistake I made in r286595 which was based on
empirical measurements made on hardware using TTL-level signaling, in which
the logic levels are inverted from RS-232. Thus, this re-syncs the polarity
with the requirements of RFC 2783, which is writen in terms of RS-232
signaling.
Narrow-pulse mode uses the ability of most ns8250 and similar chips to
provide a delta indication in the modem status register. The hardware is
able to notice and latch the change when the pulse width is shorter than
interrupt latency, which results in the signal no longer being asserted by
time the interrupt service code runs. When running in this mode we get
notified only that "a pulse happened" so the driver synthesizes both an
ASSERT and a CLEAR event (with the same timestamp for each). When the pulse
width is about equal to the interrupt latency the driver may intermittantly
see both edges of the pulse. To prevent generating spurious events, the
driver implements a half-second lockout period after generating an event
before it will generate another.
Differential Revision: https://reviews.freebsd.org/D4477
Prior to Medford, option ROM config was stored with one partition
per network port. Medford stores option ROM config in a single
partition (as an array of configurations, one per PF).
Update the EFXname /port to MCDI partition mapping for this.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4885
Make error messages consistent, and remove redundant checks.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4884
This API has been replaced by efx_mac_multicast_list_set()
and has no callers.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4883
New filters types may be added, but the same machinery should be able to
handle them.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4881
These bigs are changed on Medford.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4876
The pktfilter module has been obsolete for some time, as
it was replaced by newer features in filter module. With
the removal of the storport driver, this module has no
users and can be removed.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4875
Some new partitions have been added, but they shouldn't need to be
handled any differently.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4873
Rename all except hunt_tx_qdesc_tso_create(), which creates a
fw-assisted TSO v1 descriptor which isn't supported on Medford.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4870
All of these apply to both Huntington and Medford.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4868
This should allow these functions to work for Medford as well.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4866
All these fields will be used in shared ef10 code, so put them in an
ef10 member of a per-architecture union, rather that in the per-chip
union.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4865
Creating some files together to do the build system changes in one go.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4859
This one mainly avoids mbuf cluster allocation for TCP ACKs during
TCP sending tests. And it gives me ~200Mbps improvement (4.7Gbps
-> 4.9Gbps), when running iperf3 TCP sending test w/ 16 connections.
While I'm here, nuke the unnecessary zeroing out pkthdr.csum_flags.
Reviewed by: adrain
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4853
Many applications and kernel modules (e.g. bridge) rely on the ifmedia
status report; give them what they want.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Jun Su <junsu microsoftc com>, me, adrian
Modified by: me (minor)
Original differential: https://reviews.freebsd.org/D4611
Differential Revision: https://reviews.freebsd.org/D4852
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
- Implement the LRO using tcp_lro APIs, and LRO is enabled by default.
- Add several stats sysctl nodes.
- Check IP/TCP length before sending the packet to tcp_lro_rx(), if host
does not provide RX csum information (*); and add an option through
sysctl to always trust host TCP segment csum checks (default is off).
- Add sysctl to control the LRO entry depth; it is disabled by default.
It is used to avoid holding too much TCP segments in driver. Limiting
the LRO entry depth helps a lot in a one/two streams RX test.
This one 3x the RX performance on my local test (3Gbps -> 10Gbps), and
~2x the RX performance over a directly connected 40Ge network (5Gbps ->
9Gbps).
(*) It seems the host stops supplying csum information, once the network
load is high. This still needs investigation...
Reviewed by: Hongjiang Zhang <honzhan microsoft com>,
Dexuan Cui <decui microsoft com>,
Jun Su <junsu microsoft com>,
delphij
Tested by: me (local),
Hongjiang Zhang <honzhan microsoft com>
(directly connected 40Ge)
Approved by: delphij (mentor), adrian (mentor, no objection)
With feedback from: delphij, Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4824
This will allow to restore channel list after switching interface
to more restrictive regdomain.
Tested with Intel 3945BG (wpi) only.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4863
by busdma to the blkfront driver must be an integer number of sectors,
and must be aligned in memory on a "sector" boundary.
Having these assertions yesterday would have made finding the bug fixed
in r293698 somewhat easier.
Ensure that all iSCSI sessions are correctly terminated during shutdown.
* Enhances the changes done by r286226 (D3052).
* Add shutdown post sync event to run after filesystem shutdown
(SHUTDOWN_PRI_FIRST) but before CAM shutdown (SHUTDOWN_PRI_DEFAULT).
* Changes iscsi_maintenance_thread to processes terminate in preference to
reconnect.
Reviewed by: trasz
MFC after: 2 weeks
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4429
- Add a description of Advantech PCI-1602 Rev. A boards. [1]
- Properly set up REG_ACR also for PCI-1602 Rev. A based on what the
Advantech-supplied Linux driver does.
- Additionally use the macros of <dev/ic/ns16550.h> to replace existing
magic values and get rid of trivial comments.
- Fix the style of some comments.
PR: 205359 [1]
Submitted by: Jan Mikkelsen (original patch) [1]
up to now.
The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.
Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.
The new syscalls is a drop in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.
Celebrates: Netflix global launch!
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
Relnotes: yes
ioat_acquire_reserve() is an extended version of ioat_acquire(). It
allows users to reserve space in the channel for some number of
descriptors. If this succeeds, it guarantees that at least submission
of N valid descriptors will succeed.
Sponsored by: EMC / Isilon Storage Division
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
Reviewed by: gallatin
MFC after: 3 days
Sponsored by: Intel
This helps immensily with our ability to operate in the Amazon Cloud.
Discussed on Intel Networking Community call this morning.
Submitted by: Jarrod Petz(petz@nisshoko.net)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4788
the system is booted and running.
Add PHY detection logic to ixgbe_handle_mod() and add locking to
ixgbe_handle_msf() as well.
PR: 150251
Submitted by: aboyer@averesystems.com
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3188
e1000/e1000e split in linux.
Split rxbuffer and txbuffer apart to support the new RX descriptor format
structures. Move rxbuffer manipulation to em_setup_rxdesc() to unify the
new behavior changes.
Add a RSSKEYLEN macro for help in generating the RSSKEY data structures
in the card.
Change em_receive_checksum() to process the new rxdescriptor format
status bit.
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D3447
Previously nvme(4) would revert to a signle I/O queue if it could not
allocate enought interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.
MFC after: 3 days
Sponsored by: Intel
Instead just use num_io_queues to make this determination.
This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.
MFC after: 3 days
Sponsored by: Intel
This significantly improves parallelism in the most common case.
The taskqueue is still used whenever BIO_ORDERED bios are in flight.
This patch is based heavily on a patch from gallatin@.
MFC after: 3 days
Sponsored by: Intel
This ensures the bio flags are not read after biodone().
The ordering will still be enforced, after the bio is
submitted successfully.
MFC after: 3 days
Sponsored by: Intel
Still wait until all in-flight bios (including the ordered bio)
complete before processing more bios from the queue.
MFC after: 3 days
Sponsored by: Intel
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.
With this change:
- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
options length. The functions gives a better estimate, since it takes
into account SACK state as well.
Reviewed by: jtl
Differential Revision: https://reviews.freebsd.org/D3593
This optimization is not proper (and causes kernel panic),
since driver checks fw_status to optimize away parsing stage
if it was already done.
Reported by: dchagin
It was returning a pointer to stack-allocated memory, so make the
allocation at the caller instead.
Found by: clang static analyzer
Coverity: CID 1245774
Reviewed by: ed, rpaulo
Review URL: https://reviews.freebsd.org/D4740
The fd is closed later in this case. This fixes a "SS_NOFDREF on enter"
panic.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Reviewed by: Steve Wise @ Open Grid Computing
- Separate 'firmware_put(sc->fw_fp, FIRMWARE_UNLOAD); sc->fw_fp = NULL;'
into iwn_unload_firmware().
- Move error handling to the end of iwn_read_firmware().
No functional changes.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4768
iwn(4) / wpi(4) works in the same way
(read_firmware() -> hw_init() -> firmware_put())
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4766
- Simplify defragmentation code.
- Use proper number of dma segments for data.
Approved by: adrian (mentor)
Obtained from: DragonFlyBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D4754
stable/10 doesn't have the if_getdrvflags(9) KPI. Reference the field in the
structure directly if the __FreeBSD_version is < 1100022, so the driver can
be built with PCI_IOV support on stable/10, without backporting all of
r266974 (which requires additional changes due to projects/ifnet, etc)
Differential Revision: https://reviews.freebsd.org/D4759
Reviewed by: erj, sbruno
Sponsored by: EMC / Isilon Storage Division
These are going to be much more efficient on low end embedded systems
but unfortunately they make it .. less convenient to implement correct
bus barriers and debugging. They also didn't implement the register
serialisation workaround required for Owl (AR5416.)
So, just remove them for now. Later on I'll just inline the routines
from ah_osdep.c.
- Change order of data in if_iwmvar.h
(like it is in other drivers: defines, data structures,
vap/node structures, softc struct and locks); use indentation.
- Fix IWM_LOCK(_sc) / IWM_UNLOCK(_sc) macro.
- Add IWM_LOCK_INIT / DESTROY(sc) + fix mtx_init() usage.
- Wrap iwm_node casts into IWM_NODE() macro.
- Drop some fields:
* wt_hwqueue from Tx radiotap header;
* macaddr[6] from iwm_vap;
Approved by: adrian
Differential Revision: https://reviews.freebsd.org/D4753
tree parsing opt-out rather than opt-in. All FDT-based systems as well as
PowerPC systems with real Open Firmware use the CHRP-derived binding that
includes it, which makes SPARC the odd man out here. Making it opt-out
avoids astonishment on new platform bring up.
The ath hal and driver code all assume the world is an x86 or the
bus layer does an explicit bus flush after each operation (eg netbsd.)
However, we don't do that.
So, to be "correct" on platforms like sparc64, mips and ppc (and maybe
ARM, I am not sure), just do explicit barriers after each operation.
Now, this does slow things down a tad on embedded platforms but I'd
rather things be "correct" versus "fast." At some later point if someone
wishes it to be fast then we should add the barrier calls to the HAL and
driver.
Tested:
* carambola 2 (AR9331.)
rounding) has better spread. Implement fp16_sin() to go along with
fp16_cos(). In the rendering loop, switch from addition to subtraction
so the center of the pattern will be a trough rather than a peak. This
is completely arbitrary, of course, but looks better to me.
This is a port from openbsd. It's incomplete and unstable, but it's better
than nothing. I have no plans to MFC this until it's complete and stable.
Submitted by: kevlo
Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).
Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.
These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.
ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.
BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).
Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.
Differential Revision: https://reviews.freebsd.org/D4102
We'll remove the per-channel control_work_queue because it can't properly
do serialization of message handling, e.g., when there are 2 NIC devices,
vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition:
for an SMP VM, vmbus_channel_process_offer() can run concurrently on
different CPUs and if the second NIC's
vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs
first, the second NIC's name will be hn0 and the first NIC's name will
be hn1!
We can fix the race condition by removing the per-channel control_work_queue
and run all the message handlers in the global
hv_vmbus_g_connection.work_queue -- we'll do this in the next patch.
With the coming next patch, we have to run the non-blocking handlers
directly in the kernel thread vmbus_msg_swintr(), because the special
handling of sub-channel: when a sub-channel (e.g., of the storvsc driver)
is received and being handled in vmbus_channel_on_offer() running on the
global hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer()
invokes channel->sc_creation_callback, i.e., storvsc_handle_sc_creation,
and the callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message
and expect a further reply from the host, but the handling of the further
messag can't be done because the current message's handling hasn't finished
yet; as result, hv_vmbus_channel_open() -> sema_timedwait() will time out
and th device can't work.
Also renamed the handler type from hv_pfn_channel_msg_handler to
vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: royger
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4596
Now vmbus_channel_on_offer() -> vmbus_channel_process_offer() can
safely run on the global hv_vmbus_g_connection.work_queue now.
We remove the per-channel control_work_queue to achieve the proper
serialization of the message handling.
I removed the bogus TODO in vmbus_channel_on_offer(): a vmbus offer
can only come from the parent partition, i.e., the host.
PR: kern/205156
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Howard Su <howard0su gmail com>, delphij
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4597
coalescing and zipping multiple CQEs into a single merged CQE. The
feature is enabled by default and can be disabled by a sysctl.
Implementing this feature mlx5_cqwq_pop() has been separated from
mlx5e_get_cqe().
MFC after: 1 week
Submitted by: Mark Bloch <markb@mellanox.com>
Differential Revision: https://reviews.freebsd.org/D4598
Sponsored by: Mellanox Technologies
cperciva's libmd implementation is 5-30% faster
The same was done for SHA256 previously in r263218
cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation
Extend sbin/md5 to create sha384(1)
Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}
Reviewed by: cperciva, des, delphij
Approved by: secteam, bapt (mentor)
MFC after: 2 weeks
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3929
This space does not require DMA syncing. It reduces lock scope of the DMA
scratch space. It allows whole DMA scratch space to be used to I/O, so now
we can fetch up to ~1000 ports from SNS.
Due to the last fact, increase maximal number of ports from 256 to 1024.
IEEE 802.3 Clause 45 added backwards-compatible support for 2^16 PHY registers
through the addition of an additional device address frame.
Clause 45 addressing is used in 10Gbe PHYs, 802.3az EEE registers, etc. It may
make sense to provide a similar extension to the miibus interface, but I've
refrained from unilaterally doing so here.
Submitted by: Landon Fuller <landon@landonf.org>
Differential Revision: https://reviews.freebsd.org/D4607
cards supported by cxgbe(4).
On the host side this driver interfaces with the storage stack via the
ICL (iSCSI Common Layer) in the kernel. On the wire the traffic is
standard iSCSI (SCSI over TCP as per RFC 3720/7143 etc.) that
interoperates with all other standards compliant implementations. The
driver is layered on top of the TOE driver (t4_tom) and promotes
connections being handled by t4_tom to iSCSI ULP (Upper Layer Protocol)
mode. Hardware assistance in this mode includes:
- Full TCP processing.
- iSCSI PDU identification and recovery within the TCP stream.
- Header and/or data digest insertion (tx) and verification (rx).
- Zero copy (both tx and rx).
Man page will follow in a separate commit in a couple of weeks.
Relnotes: Yes
Sponsored by: Chelsio Communications
Before this change virtual ports control IOCBs were executed synchronously
via Execute IOCB mailbox command. It required exclusive use of scratch
space of driver and mailbox registers of the hardware. Because of that
shared resources use this code could not really sleep, having to spin for
completion, blocking any other operation.
This change introduces new asynchronous design, sending the IOCBs directly
on request queue and gracefully waiting for their return on response queue.
Returned IOCBs are identified with unified handle space from r292725.
The mdio driver interface is generally useful for devices that require
MDIO without the full MII bus interface. This lifts the driver/interface
out of etherswitch(4), and adds a mdio(4) man page.
Submitted by: Landon Fuller <landon@landonf.org>
Differential Revision: https://reviews.freebsd.org/D4606
I am not sure why this was split long ago, but I see no reason for it.
At this point this unification just slightly reduces memory usage, but
as next step I plan to reuse shared handle space for other IOCB types.