net.inet6.ip6.v6only=0.
Without this patch, the inp_vflag would have both the INP_IPV4 and the
INP_IPV6 flags set for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This caused netstat to report
the source and destination addresses as IPv4 addresses, even though
they are IPv6 addresses.
PR: 226421
Reviewed by: bz, hiren, kib
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D13514
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table. Move the table to a common
file to be used by all. Make it 'const int' to place it in .rodata.
(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)
This change should introduce no functional change; a followup will add
missing errno values.
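To make the idea of one shared, read-only translation table concrete, here is a minimal sketch. It is not the actual linuxulator table: the table and function names are hypothetical, only a handful of entries are filled in, and it assumes FreeBSD numbering on the BSD side (EDEADLK = 11, EAGAIN = 35) and the generic Linux set used by x86 and Arm (EAGAIN = 11, EDEADLK = 35, EINVAL = 22). Linux syscalls return negated errno values, so the table stores negative numbers.

```c
/*
 * Hypothetical, partial BSD -> Linux errno table. Declaring it
 * 'static const int' lets the compiler place it in .rodata.
 */
static const int bsd_to_linux_errno_sketch[] = {
	[0]  = 0,
	[1]  = -1,	/* EPERM: same number on both sides */
	[2]  = -2,	/* ENOENT: same number on both sides */
	[11] = -35,	/* BSD EDEADLK (11) -> Linux EDEADLK (35) */
	[35] = -11,	/* BSD EAGAIN (35)  -> Linux EAGAIN (11) */
};

#define SKETCH_NERRNO ((int)(sizeof(bsd_to_linux_errno_sketch) / \
	sizeof(bsd_to_linux_errno_sketch[0])))

/* Translate a BSD errno; unknown values map to Linux -EINVAL (-22). */
static int
bsd_to_linux_errno(int bsd_error)
{
	int lerr;

	if (bsd_error < 0 || bsd_error >= SKETCH_NERRNO)
		return (-22);
	lerr = bsd_to_linux_errno_sketch[bsd_error];
	/* Holes in this partial sketch read as 0; treat them as unknown. */
	if (lerr == 0 && bsd_error != 0)
		return (-22);
	return (lerr);
}
```

A complete table has an entry for every value up to ELAST, so the hole check above exists only because this sketch is partial.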
MFC after: 3 weeks
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14665
Two copies of chacha20 were imported into the tree on Apr 15 2017 (r316982)
and Apr 16 2017 (r317015). Only the latter is actually used by anything, so
just go ahead and garbage collect the unused version while it's still only
in CURRENT.
I'm not making any judgement on which implementation is better. If I pulled
the wrong one, feel free to swap the existing implementation out and replace
it with the other code (conforming to the API that actually gets used in
randomdev, of course). We only need one generic implementation.
Sponsored by: Dell EMC Isilon
On some systems, we're getting timeouts when we use multiple queues on
drives that work perfectly well on other systems. On a hunch, Jim
Harris suggested I poll the completion queue when we get a timeout.
This patch polls the completion queue if no fatal status was
indicated. If it had pending I/O, we complete that request and
return. Otherwise, if aborts are enabled and no fatal status, we abort
the command and return. Otherwise we reset the card.
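The order of decisions in the timeout handler can be sketched as a small pure function. This is an illustration only, not the driver code: the enum and function names are hypothetical, and the boolean inputs stand in for "the controller reported fatal status", "polling the completion queue completed the timed-out request", and "aborts are enabled".

```c
#include <stdbool.h>

/* Possible outcomes of the timeout handler described above. */
enum timeout_action {
	TIMEOUT_COMPLETED,	/* polling found the completion; done */
	TIMEOUT_ABORT,		/* send an abort for the timed-out command */
	TIMEOUT_RESET,		/* give up and reset the controller */
};

static enum timeout_action
nvme_timeout_decide(bool fatal_status, bool poll_found_completion,
    bool aborts_enabled)
{
	/* The completion queue is only polled when no fatal status is set. */
	if (!fatal_status && poll_found_completion)
		return (TIMEOUT_COMPLETED);
	if (!fatal_status && aborts_enabled)
		return (TIMEOUT_ABORT);
	return (TIMEOUT_RESET);
}
```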
This may clear up the problem, or we may see it result in lots of
timeouts and a performance problem. Either way, we'll know the next
step. We may also need to pay attention to the fatal status bit
of the controller.
PR: 211713
Suggested by: Jim Harris
Sponsored by: Netflix
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.
Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacement
of md_ioctl and mdio with md_req and mdr.
Reviewed by: markj, cem, kib (assorted versions)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14704
vmd_free_count manipulation. Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup. Only trigger
the pageout daemon on transitions between states. Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.
Reviewed by: markj, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14612
Move locks from outside ioctl to the individual implementations.
This is the first step toward changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl, and toward
removing the use of kern_ioctl in mountroot.
Reviewed by: cem, kib, markj (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14700
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources. At
this point another thread receives a PDU. Boom.
PR: 222898, 219866
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by: mav@ (earlier version)
MFC after: 2 weeks
Sponsored by: playkey.net
unconditionally incrementing i in the loop;
Reported by: cem
MFC with: r330880
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14685
Improve clarity of a comment and style(9) some areas.
No functional change.
Reported by: markj (on review of a mostly-copied driver)
Sponsored by: Dell EMC Isilon
The problem is that g_access() must be called with the GEOM topology
lock held. And that gives a false impression that the lock is indeed
held across the call. But this isn't always true because many classes,
ZVOL being one of the many, need to drop the lock. It's either to
perform an I/O on the first open or to acquire a different lock (like in
g_mirror_access).
That, of course, can break many assumptions. For example,
g_slice_access() adds an extra exclusive count on the first open. As
described above, an underlying geom may drop the topology lock and that
would open a race with another thread that would also request another
extra exclusive count. In general, two consumers may be granted
incompatible accesses.
To avoid this problem the code is changed to mark a geom with special
flag before calling its access method and clear the flag afterwards. If
another thread sees that flag, then it means that the topology lock has
been dropped (either by the geom in question or downstream from it), so
it is not safe to make another access call. So, the second thread would
use g_topology_sleep() to wait until the flag is cleared and only then
would it proceed with the access.
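The flag-and-sleep protocol can be modelled in user space as a rough analogue. This sketch is hypothetical and not GEOM code: a pthread mutex plays the topology lock, a condition variable plays g_topology_sleep() and its wakeup, and in_access is an illustrative name for the per-geom flag.

```c
#include <pthread.h>
#include <stdbool.h>

struct geom_sketch {
	pthread_mutex_t	*topology;	/* stands in for the topology lock */
	pthread_cond_t	 cv;		/* stands in for topology sleep/wakeup */
	bool		 in_access;	/* hypothetical "access in progress" flag */
	int		 acr, acw, ace;	/* read/write/exclusive access counts */
};

/* A class's access method, which may drop and retake the lock. */
static int
geom_access_method(struct geom_sketch *gp, int dr, int dw, int de)
{
	pthread_mutex_unlock(gp->topology);	/* e.g. I/O on first open */
	pthread_mutex_lock(gp->topology);
	gp->acr += dr;
	gp->acw += dw;
	gp->ace += de;
	return (0);
}

/* Must be called with the topology lock held, like g_access(). */
static int
geom_access(struct geom_sketch *gp, int dr, int dw, int de)
{
	int error;

	/* Someone else dropped the lock inside an access call: wait. */
	while (gp->in_access)
		pthread_cond_wait(&gp->cv, gp->topology);
	gp->in_access = true;
	error = geom_access_method(gp, dr, dw, de);
	gp->in_access = false;
	pthread_cond_broadcast(&gp->cv);	/* wake any waiting threads */
	return (error);
}
```

Because a second caller re-checks the flag after every wakeup, two access calls on the same geom can no longer interleave while the lock is dropped.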
Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea
PR: 225960
Reported by: asomers
Reviewed by: markj, mav
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14533
illumos/illumos-gate@5f5913bb83
https://www.illumos.org/issues/9164
This issue has been reported by Alan Somers as
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877
dmu_objset_refresh_ownership() first disowns a dataset (and releases
it) and then owns it again. There is an assert that the new dataset
object is the same as the old dataset object. When running ZFS Test
Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test:
panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000
== 0xfffff80021ab4800)
I see that the old dataset has dsl_dataset_evict_async() pending in
ds_dbu.dbu_tqent and its ds_dbuf is NULL.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>
PR: 225877
Reported by: asomers
MFC after: 1 week
It seems the default timeout of 100ms is not enough for my 2694L card,
while it was perfectly fine for others, even for the full-height 2694.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
The NVMe standard has required in section 7.2.6, since at least 1.1,
that a clean shutdown is signalled by deleting the submission and the
completion queues before setting the shutdown bit in CC. The 1.0
standard, apparently, did not and many of the early Intel cards didn't
care. Some newer cards care, at least one whose beta firmware can
scramble the card on an unclean shutdown. Linux has done this for some
time. To make it possible to move forward with an evaluation of this
pre-release card with wonky firmware, delete the queues on the card
when we delete the qpair structures.
Sponsored by: Netflix
We'll need to delete namespaces soon, so go ahead and stop making
these devices eternal. It doesn't help much, and will be getting in
the way soon.
Sponsored by: Netflix
It is believed that the conditions Coverity flagged are actually
impossible to hit. So this patch just adds a cleanup to compute v_mount
only once in brelse(), and always initializes error to zero in
vfs_bio_getpages() to appease the static analyzer.
No functional change intended.
Submitted by: Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14613
During shutdown, mps waits for its SSU requests to complete. However,
when rebooting after handling a panic, the scheduler is stopped, so the
getmicrotime() call used for the timeout can be non-functional.
Switch to using the same method as shutdown_panic to ensure we actually
complete.
In addition, reduce the timeout when RB_NOSYNC is set in howto, as we
expect this to fail.
Reviewed by: slm
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D12776
Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as
documented by the ZFS On-Disk Specification, section 1.3.
Before this a zpool create on a VTOC8 partitioned device would overwrite the
partition metadata.
Reported by: marius
Reviewed by: marius avg
MFC after: 1 week
Sponsored by: Multiplay
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero. This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.
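The corrected condition fits in one line. In this hypothetical helper, sb_avail stands for what sbavail() returns and sent_off for the cached sb_off of data already handed to the NIC; the FIN can be sent once nothing unsent remains, even if sent-but-unacknowledged bytes are still in the buffer.

```c
#include <stdbool.h>

static bool
can_send_fin(unsigned int sb_avail, unsigned int sent_off)
{
	/*
	 * The old check was sb_avail == 0, which never holds while
	 * previously sent data is still waiting for its ACK.
	 */
	return (sb_avail == sent_off);
}
```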
Reported by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
- Remove the one use of is_tls_offload() and the function. AIO special
handling only needs to be disabled when a TOE socket is actively doing
TLS offload on transmit. The TOE socket's mode (which affects receive
operation) doesn't matter, so remove the check for the socket's mode and
only check if a TOE socket has TLS transmit keys configured to determine
if an AIO write request should fall back to the normal socket handling
instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths,
so inlining isn't that important. Change return type to bool while here.
Sponsored by: Chelsio Communications
being marked "standard", which is less confusing than having it conditional
on AIM CPUs here, and then picked up through options FDT from conf/files
on Book-E.
Requested by: jhibbits
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).
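Collapsing queued trims, with a cap on how many requests may be combined, can be sketched as an in-place pass over the queue. This is an illustration under assumptions, not the CAM iosched code: the struct and function names are hypothetical, trims are plain byte ranges, only ranges that touch end-to-start are merged, and max_merge must be at least 1.

```c
#include <stddef.h>
#include <stdint.h>

struct trim_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * Merge contiguous trims in queue order. Each output entry combines at
 * most max_merge input requests. Returns the new number of entries.
 */
static size_t
collapse_trims(struct trim_range *q, size_t n, size_t max_merge)
{
	size_t out = 0, merged = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 && merged < max_merge &&
		    q[out - 1].offset + q[out - 1].length == q[i].offset) {
			q[out - 1].length += q[i].length;	/* extend */
			merged++;
		} else {
			q[out++] = q[i];	/* start a new combined trim */
			merged = 1;
		}
	}
	return (out);
}
```

The max_merge parameter plays the role of the "configurable max" mentioned above.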
Sponsored by: Netflix
When the ccb passed to cam_iosched_bio_complete is NULL, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.
Sponsored by: Netflix
platform that can run without a device tree (PS3) still uses the OF_*()
functions to check if one exists and OF_* is used unconditionally in
core parts of the system like powerpc/machdep.c. Reflect this reality
in files.powerpc, for example by changing occurrences of aim | fdt to
standard.
Includes patch to conditionalize use of __builtin_clz(ll) on __has_builtin().
The issue is tracked upstream at https://github.com/facebook/zstd/pull/884 .
Otherwise, these are vanilla Zstandard 1.3.3 files.
Note that the 1.3.4 release should be due out soon.
Sponsored by: Dell EMC Isilon
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections. Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing. Currently
these socket options are implemented as TCP options in the vendor
specific range. A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.
TOE sockets can either offload both transmit and reception of TLS
records or just transmit. TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface. Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key. This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library). Receive offload can only be used with TOE sockets using the
TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers. Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket. Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value. A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library. Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.
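The tls_rx_ports rule is a simple membership test on either port of the connection. The helper below is a hypothetical sketch of that rule, not the cxgbe implementation; the port list stands for whatever values were written to the sysctl.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the new connection should be created as a TLS socket. */
static bool
use_tls_mode(const uint16_t *ports, size_t nports, uint16_t lport,
    uint16_t rport)
{
	for (size_t i = 0; i < nports; i++) {
		/* Either the local or the remote port may match. */
		if (ports[i] == lport || ports[i] == rport)
			return (true);
	}
	return (false);
}
```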
New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:
dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620
TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.
TLS transmit work requests require a buffer containing IVs. If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request. This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.
Received TLS frames use two new CPL messages. The first message is a
CPL_TLS_DATA containing the decrypted payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors. The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.
A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().
TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.
In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host. As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket. In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase. Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.
A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS). The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.
Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k). To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer, without opening the window (or adding
rx_credits) for the overage bytes.
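The overage bookkeeping can be sketched with two small functions. The struct and field names here are hypothetical stand-ins for the TOE pcb state: a record larger than the offered window closes the window and records the excess, and reads consume that excess first so those bytes never turn into rx_credits.

```c
#include <stdint.h>

struct tls_rx_sketch {
	uint32_t rcv_wnd;	/* window currently offered */
	uint32_t rx_overage;	/* bytes received beyond the window */
	uint32_t rx_credits;	/* window to reopen via RX_DATA_ACK */
};

static void
tls_record_received(struct tls_rx_sketch *s, uint32_t len)
{
	if (len > s->rcv_wnd) {
		/* The whole record overran the window: drop it to 0. */
		s->rx_overage += len - s->rcv_wnd;
		s->rcv_wnd = 0;
	} else
		s->rcv_wnd -= len;
}

static void
tls_bytes_read(struct tls_rx_sketch *s, uint32_t len)
{
	/* "Eat" the overage first; only the rest reopens the window. */
	uint32_t eaten = len < s->rx_overage ? len : s->rx_overage;

	s->rx_overage -= eaten;
	s->rx_credits += len - eaten;
}
```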
Reviewed by: np (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14529
- Change t4_ddp_mod_load() to return void instead of always returning
success. This avoids having to pretend to have proper support for
unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
The module handling code in the kernel invokes MOD_UNLOAD on a module
whose MOD_LOAD fails with an error already.
Reviewed by: np (part of a larger patch)
MFC after: 1 month
Sponsored by: Chelsio Communications
Always terminate the list with -1 and document the ioctl behavior.
This preserves existing behavior as seen from userspace with the
addition of the unconditional termination which will not be seen by
working consumers of MDIOCLIST.
Because this ioctl can only be performed by root (in default
configurations) and is not used in the base system, this bug is not
deemed to warrant either a security advisory or an errata notice.
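The general pattern behind "always terminate the list with -1" is to reserve room for the terminator before copying anything. The sketch below is hypothetical and does not reproduce the md(4) MDIOCLIST layout; it only shows a bounded fill that can never write past the caller's array. It assumes cap is at least 1.

```c
#include <stddef.h>

/*
 * Copy at most cap - 1 unit numbers so the -1 terminator always fits,
 * even when more devices exist than the caller left room for.
 */
static size_t
fill_unit_list(int *list, size_t cap, const int *units, size_t nunits)
{
	size_t i;

	for (i = 0; i < nunits && i < cap - 1; i++)
		list[i] = units[i];
	list[i] = -1;	/* unconditional termination */
	return (i);
}
```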
Reviewed by: kib
Obtained from: CheriBSD
Discussed with: security-officer (gordon)
MFC after: 3 days
Security: kernel heap buffer overflow
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14685
For _IO() ioctls, addr is a pointer to uap->data which is a caddr_t.
When the caddr_t stores an int, dereferencing addr as an (int *) results
in truncation on little-endian 64-bit systems and corruption (owing to
extracting top bits) on big-endian 64-bit systems. In practice the
value of chan was probably always zero on systems of the latter type as
all such FreeBSD platforms use a register-based calling convention.
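The difference between the buggy and the correct read can be shown in a few lines. In this sketch ioctl_caddr_t is a local stand-in for FreeBSD's caddr_t (a char *) and the function name is hypothetical; addr points at the pointer-sized slot that carries the int.

```c
#include <stdint.h>

/* Stands in for FreeBSD's caddr_t, which is a char *. */
typedef char *ioctl_caddr_t;

/*
 * Reading the slot through an (int *) sees only its first 4 bytes:
 * the low half on 64-bit little-endian machines, the high half on
 * 64-bit big-endian ones. Loading the whole pointer and converting it
 * back to an integer recovers the value on both.
 */
static int
io_ioctl_int_arg(ioctl_caddr_t addr)
{
	return ((int)(intptr_t)*(ioctl_caddr_t *)addr);
}
```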
Reviewed by: mav
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14673
There, the pages freed might be managed but the page's lock is not
owned. For KPI correctness, the page lock is required around the call
to vm_page_free_prep(), which is asserted. The reclaim loop already did
the work which could be done by vm_page_free_prep(), so the lock is
not needed and the only consequence of not owning it is the assert
trigger.
Instead of adding the locking to satisfy the assert, revert to the
code that calls vm_page_free_phys() directly.
Reported by: pho
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This fixes a problem encountered on the Lenovo Thinkpad X220/Yoga 11e where
runtime services would inexplicably jump to parts of memory where they
shouldn't be when enumerating EFI vars, causing a panic.
The virtual mapping is enabled by default and can be disabled by setting
efi_disable_vmap in loader.conf(5).
Reviewed by: kib (earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14677
Migrate to modern types before creating MD Linuxolator bits for new
architectures.
Reviewed by: cem
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14676
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.
This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time with a new base address set up in the early parts of
powerpc_init().
Reviewed by: jhibbits
Differential Revision: D14647
As noted in the comment, UEFI spec claims the capabilities pointer is
optional, but some implementations will choke and attempt to dereference it
without checking. This specific problem was found on a Lenovo Thinkpad X220
that would panic in efirtc_identify.
Or else disable the device. Note that the detection can be bypassed by
setting the hw.atrtc.enable option in the loader configuration file.
More information can be found in atrtc(4).
Sponsored by: Citrix Systems R&D
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D14399
On x86 the IA-PC Boot Flags in the FADT can signal whether VGA is
available or not.
Sponsored by: Citrix systems R&D
Reviewed by: marcel
Differential revision: https://reviews.freebsd.org/D14397
The old code used the thread's pcb via the uap->data pointer.
Reviewed by: ed
Approved by: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14674
The ioctl objects contain pointers and require translation and some
refactoring of the infrastructure to work. For now, prevent operation
on garbage values. This is very slightly overbroad in that ENCIOC_INIT
is safe.
Reviewed by: imp, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14671
These take a union ccb argument which is full of kernel pointers.
Substantial translation efforts would be required to make this work.
By rejecting the request we avoid processing or returning entirely
wrong data.
Reviewed by: imp, ken, markj, cem
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14654
Remove NO_FUEWORD so the 'e' variants are wrapped by the non-'e'
variants. This is more correct and leaves sparc64 as the outlier.
Reviewed by: jmallett, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14603
GCC 7 checks for fall-through cases in switch statements; where the
fall-through is intentional, the warning can be silenced with a
/* FALLTHROUGH */ comment. Unfortunately such a comment is quite limited,
but it will still notify the reader.
This patch is a backport from illumos, see
https://www.illumos.org/rb/r/941/
Reviewed by: eadler
Differential Revision: https://reviews.freebsd.org/D14663
Make sure the periph lock is held around rmw access to softc data,
especially flags, including work flags in iosched.
Add asserts for the periph lock where it should be held.
PR: 226510
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14456
ip_reass() expects an IPv4 packet and will just corrupt any IPv6 packet
that it gets. Until a proper IPv6 fragment handling function is
implemented, pass IPv6 packets to the next rule.
PR: 170604
MFC after: 1 week
from the i8254 driver when I created separate mutexes for each. The i8254
driver could be the active timecounter, leading to recursion during mutex
profiling, but the atrtc driver cannot be a timecounter, so it isn't needed.
o count in_nomem counter when we have failed to allocate mbuf for
promisc socket;
o count in_msgtarget counter when we have successfully sent data to socket;
o Since we are sending messages in a loop, returning error on first fail
interrupts the loop, and all remaining sockets will not receive this
message. So, do not return error when we have failed to send data to ALL
or REGISTERED target. Return error only for KEY_SENDUP_ONE case. Now,
when some socket has overfilled its receive buffer, this will not break
other sockets.
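The new error-handling rule for the delivery loop can be sketched as follows. Everything here is hypothetical: the enum mirrors the ONE/ALL/REGISTERED targets, rcvbuf_full[i] simulates socket i having an overfilled receive buffer, and delivered plays the role of the in_msgtarget counter.

```c
#include <errno.h>
#include <stddef.h>

enum sendup_target { SENDUP_ONE, SENDUP_ALL, SENDUP_REGISTERED };

static int
sendup_sketch(enum sendup_target target, const int *rcvbuf_full, size_t n,
    unsigned long *delivered)
{
	int error = 0;

	for (size_t i = 0; i < n; i++) {
		if (rcvbuf_full[i]) {
			/* Only a single-target send reports the failure. */
			if (target == SENDUP_ONE)
				error = ENOBUFS;
			continue;	/* keep delivering to the others */
		}
		(*delivered)++;
	}
	return (error);
}
```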
MFC after: 2 weeks
un-function-like RTC_LOCK/UNLOCK macro usage into normal function calls.
Since there is no longer any need to handle register access from a debugger
context, those function calls can just be regular mutex lock/unlock calls.
Requested by: bde
command handler which provided much the same information. Removing the
possibility of accessing the hardware regs from the debugger context
paves the way for simplifying the locking code in the driver.
Nothing uses the #define's values or the types. (Some NTP code does use
an audio_info_t, but it is in #ifdef'd support for Solaris and is not
this audio_info_t).
Sponsored by: DARPA, AFRL
For each regulator, create a hw.regulator.<regname> sysctl node with:
uvolt: Current value
always_on: 1 if the reg is always on
boot_on: 1 if the reg is set at boot time
enable_cnt: Number of consumers
enable_delay: Delay before enabling the regulator
ramp_delay: The ramp delay
max_uamp: The maximum value of the regulator in uAmps
min_uamp: The minimum value of the regulator in uAmps
max_uvolt: The maximum value of the regulator in uVolts
min_uvolt: The minimum value of the regulator in uVolts
Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D14578
These parameters may be changed via ifconfig(8); by default,
mgt / mcast rates are lowest possible and ucast rate is not set
(matches previous configuration).
While here, store some variables locally for better readability.
The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5)
file allow specifying a wait timeout for the device(s) hosting the root
filesystem to become usable. The current mechanism for waiting for devices
and detecting their availability can't be used for zfs-hosted filesystems.
See comment #20 in the PR for some expanded detail on these points.
This change adds retry logic to the actual root filesystem mount. That is,
instead of relying on device availability using device name lookups, it uses
the kernel_mount() call itself to detect whether the filesystem can be
mounted, and loops until it succeeds or the configured timeout is exceeded.
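The retry loop can be sketched as below. This is a simulation, not the mountroot code: fake_kernel_mount() is a hypothetical stand-in for kernel_mount() that fails until a chosen attempt, and the one-second sleep between attempts is only counted rather than performed.

```c
#include <errno.h>

/* Hypothetical stand-in for kernel_mount(): fails until attempt N. */
struct fake_root {
	int attempts;
	int ready_after;	/* succeed on this attempt number */
};

static int
fake_kernel_mount(struct fake_root *fr)
{
	fr->attempts++;
	return (fr->attempts >= fr->ready_after ? 0 : ENXIO);
}

/*
 * Use the mount attempt itself as the availability test, retrying once
 * per (simulated) second until it succeeds or timeout_sec has passed.
 */
static int
mount_root_with_retry(struct fake_root *fr, int timeout_sec)
{
	int error, elapsed = 0;

	for (;;) {
		error = fake_kernel_mount(fr);
		if (error == 0 || elapsed >= timeout_sec)
			return (error);
		elapsed++;	/* a real kernel would sleep one second here */
	}
}
```

Because the loop tests the mount rather than a device name, it works the same way for ZFS-hosted root filesystems.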
These changes are based on the patch attached to the PR, but it's rewritten
enough that all mistakes belong to me.
PR: 208882
X-MFC after: sufficient testing, and hopefully in time for 11.1
this check on open, but "iscsictl -M", or an iSCSI redirect received by
iscsid(8) could end up with two sessions with the same target name and
portal.
MFC after: 2 weeks
Upstream DTBs don't provide IRQ lines for the RNG. Moreover, harvesting
bytes as often as the RNG interrupt is triggered (87 times per sec) is
overkill.
For these reasons, get rid of the interrupt mode and make callout mode the
default, with random bits harvested every 4 seconds.
Submitted by: Sylvain Garrigues <sylgar@gmail.com>
Reviewed by: ian, imp, manu, mmel
Approved by: emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14541
When complete_all() is called there might be multiple waiters. The
current implementation could only handle one waiter. Make sure the
completion is sticky when complete_all() is called to be compatible
with Linux.
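A sticky complete_all() is usually modelled as a counter with a reserved "done for everyone" value. The single-threaded sketch below uses hypothetical names and a non-blocking wait to show only the counter logic: complete_one() releases exactly one waiter, while complete_all_sketch() lets every current and future waiter through.

```c
#include <limits.h>
#include <stdbool.h>

struct completion_sketch {
	unsigned int done;
};

#define COMPLETE_ALL_DONE UINT_MAX	/* sticky "completed for all" value */

static void
complete_one(struct completion_sketch *c)
{
	if (c->done != COMPLETE_ALL_DONE)
		c->done++;
}

static void
complete_all_sketch(struct completion_sketch *c)
{
	c->done = COMPLETE_ALL_DONE;
}

/* Non-blocking wait: true if the caller may proceed. */
static bool
try_wait(struct completion_sketch *c)
{
	if (c->done == 0)
		return (false);
	if (c->done != COMPLETE_ALL_DONE)
		c->done--;	/* consume one completion */
	return (true);
}
```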
Found by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
Move copy-pasted code for RTS/CTS frame allocation into net80211.
While here, add stat / debug message for allocation failures
(copied from run(4)) + return error here in bwn(4).
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D14628
This seems not to be needed on supported hardware as they are cache-coherent,
however this may not be the case on all platforms.
Sponsored by: DARPA, AFRL
rrs - Let's make the LRO code look for true dup-acks and window update acks
fly on through and combine.
rrs - Make the LRO engine a bit more aware of ack-only seq space. Let's not
have it incorrectly wipe out newer acks with older acks when we have
out-of-order acks (common in wifi environments).
jeggleston - LRO eating window updates
Based on all of the above I think we are RFC compliant doing it this way:
https://tools.ietf.org/html/rfc1122
section 4.2.2.16
"Note that TCP has a heuristic to select the latest window update despite
possible datagram reordering; as a result, it may ignore a window update with
a smaller window than previously offered if neither the sequence number nor the
acknowledgment number is increased."
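The heuristic RFC 1122 refers to is the send-window update rule from RFC 793: take the segment's window only if SND.WL1 < SEG.SEQ, or if SND.WL1 == SEG.SEQ and SND.WL2 <= SEG.ACK. The sketch below encodes that rule with the usual modular sequence-number comparisons; the function and macro names are illustrative, not taken from the LRO code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular 32-bit sequence-number comparisons. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

/*
 * A reordered segment that advances neither the sequence number nor
 * the acknowledgment number fails this test, so its (possibly smaller)
 * window is ignored.
 */
static bool
should_update_window(uint32_t snd_wl1, uint32_t snd_wl2, uint32_t seg_seq,
    uint32_t seg_ack)
{
	return (SEQ_LT(snd_wl1, seg_seq) ||
	    (snd_wl1 == seg_seq && SEQ_LEQ(snd_wl2, seg_ack)));
}
```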
Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by: rstone gallatin
Sponsored by: Netflix and Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14540
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.
While here, this patch also fills in the vlan_id field in the completion if
present.
linux commit 12f8fedef2ec94c783f929126b20440a01512c14
MFC after: 1 week
Sponsored by: Mellanox Technologies
mlx5core.
Do not consider the inability to create a firmware dump fatal, but
inform about the situation and allow the driver to attach. The device
might not implement the needed VSC, or we might not know the layout of
the registers map. In either case, only firmware dump functionality is
limited, the network operations should be fine.
Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies
When the mlx5en(4) driver was converted to using BUSDMA(9) the call to
m_defrag() was moved after the part of the TX routine that strips the
header from the mbuf chain. Before it called m_defrag it first trimmed
off the now-empty mbufs from the start of the chain. This has the side
effect of also removing the head of the chain that has M_PKTHDR set.
m_defrag() will not defrag a chain that does not have M_PKTHDR set,
thus it was effectively never defragging the mbuf chains.
As it turns out, trimming the mbufs in this fashion is unnecessary since
the call to bus_dmamap_load_mbuf_sg doesn't map empty mbufs anyway, so
remove it.
Differential Revision: https://reviews.freebsd.org/D12050
Submitted by: mjoras@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Set and report vport MTU rather than physical MTU.
The driver will set both the vport and the physical port MTU
and will rely on the query of the vport MTU.
SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.
Also, for some cases where the PF is not a port owner, the PF can
work with an MTU less than the physical port MTU if setting the
physical port MTU didn't take effect.
Based on Linux upstream commit:
cd255efff9baadd654d6160e52d17ae7c568c9d3
Submitted by: Meny Yossefi <menyy@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Currently the ifnet interface is named mceX, where X is a monotonically
incremented value. If the device is reset due to a fatal error, then the
interface name will change. Using the device unit number will keep the
naming consistent across the reset logic.
Submitted by: Matthew Finlay <matt@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
ConnectX-4/5 devices in mlx5core.
The dump is obtained by reading a predefined register map from the
non-destructive crspace, accessible by the vendor-specific PCIe
capability (VSC). The dump is stored in preallocated kernel memory and
managed by the mlx5tool(8), which communicates with the driver using a
character device node.
The utility allows storing the dump in the format
<address> <value>
into a file, to reset the dump content, and to manually initiate the
dump.
A call to mlx5_fwdump() should be added at the places where a dump
must be fetched automatically. The most likely place is right before a
firmware reset request.
Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add the ability to access the vendor specific space gateway in order
to support reading and writing data into the different configuration
domains.
Submitted by: Matthew Finlay <matt@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add support for PFC and implement reading the per priority statistics
using the sysctl(8) interface. PFC is used together with VLAN priority
and can be enabled and disabled on a per priority basis.
Global pause frames and PFC are incompatible features and surrounding
logic has been added to warn the user about misconfiguration.
Update relevant mlx5core APIs for PFC configuration.
MFC after: 1 week
Sponsored by: Mellanox Technologies
ECN configuration and statistics are available through a set of sysctl(8)
nodes under sys.class.infiniband.mlx5_X.cong. The ECN configuration
nodes can also be used as loader tunables.
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch accumulates the following Linux commits:
mlx5_health.c
- 78ccb25861d76a8fc5c678d762180e6918834200
mlx5_core: Fix wrong name in struct
- 171bb2c560f45c0427ca3776a4c8f4e26e559400
mlx5_core: Update health syndromes
- 0144a95e2ad53a40c62148f44fb0c1f9d2a0d1e9
mlx5_core: Use accessor functions to read from device memory
- ac6ea6e81a80172612e0c9ef93720f371b198918
mlx5_core: Use private health thread for each device
- fd76ee4da55abb21babfc69310d321b9cb9a32e0
mlx5_core: Fix internal error detection conditions
- 2241007b3d783cbdbaa78c30bdb1994278b6f9b9
mlx5: Clear health sick bit when starting health poll
- 712bfef60912d91033cb25739f7444d5b8d8c59f
mlx5: Fix version printout in case of health issue
- 89d44f0a6c732db23b219be708e2fe1e03ee4842
mlx5_core: Add pci error handlers to mlx5_core driver
mlx5_cmd.c
- be87544de8df2b1eb34bcb5e32691287d96f9ec4
mlx5_core: Fix async commands return code
- a31208b1e11df334d443ec8cace7636150bb8ce2
mlx5_core: New init and exit flow for mlx5_core
- 020446e01eebc9dbe7eda038e570ab9c7ab13586
mlx5_core: Prepare cmd interface to system errors handling
- 89d44f0a6c732db23b219be708e2fe1e03ee4842
mlx5_core: Add pci error handlers to mlx5_core driver
- 0d834442cc247c7b3f3bd6019512ae03e96dd99a
mlx5: Fix teardown errors that happen in pci error handler
mlx5_main.c
- 5fc7197d3a256d9c5de3134870304b24892a4908
mlx5: Add pci shutdown callback
Submitted by: Matthew Finlay <matt@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
there is a valid reservation. This can trip erroneously when memory
falls within a domain but doesn't have the reservation initialized because
it does not meet size or alignment requirements.
Reported by: pho, mjg
Sponsored by: Netflix, Dell/EMC Isilon
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.
The new base address was chosen because it is the base of the fourth radix
quadrant (the minimum kernel address in this translation mode) and because
all supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so can be relied upon.
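The address arithmetic behind this choice is a bitwise OR on the way in and a mask in real mode. The macro and function names below are hypothetical; the sketch assumes 64-bit addresses and takes the fourth quadrant base to be the value with the top two bits set (0xc000000000000000).

```c
#include <stdint.h>

#define DMAP_BASE_SKETCH	0xc000000000000000ULL	/* top two bits set */
#define TOP_TWO_BITS		(3ULL << 62)

/* Physical address -> direct-map virtual address. */
static uint64_t
phys_to_dmap(uint64_t pa)
{
	return (pa | DMAP_BASE_SKETCH);
}

/*
 * In real mode the CPU ignores the top two bits, so a direct-map
 * address still reaches the same physical location.
 */
static uint64_t
real_mode_effective(uint64_t va)
{
	return (va & ~TOP_TWO_BITS);
}
```

Note that phys_to_dmap(0) is nonzero, which is why NULL no longer aliases a valid direct-map address.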
Reviewed by: jhibbits, Breno Leitao
Differential Revision: D14499
Add support for mapping priority to traffic class via sysctl
Submitted by: Slava Shwartsman <slavash@mellanox.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies