Drivers can now pass up numa domain information via the
mbuf numa domain field. This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.
Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.
Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20028
Previously, a pid check was used to prevent open of the tun(4); this works,
but may not make the most sense as we don't prevent the owner process from
opening the tun device multiple times.
The potential race described near tun_pid should not be an issue: if a
tun(4) is to be handed off, its fd has to have been sent via control message
or some other mechanism that duplicates the fd to the receiving process so
that it may set the pid. Otherwise, the pid gets cleared when the original
process closes it and you have no effective handoff mechanism.
Close up another potential issue with handing a tun(4) off by not clobbering
state if the closer isn't the controller anymore. If we want some state to
be cleared, we should do that a little more surgically.
Additionally, nothing prevents a dying tun(4) from being "reopened" in the
middle of tun_destroy as soon as the mutex is unlocked, quickly leading to a
bad time. Return EBUSY if we're marked for destruction, as well, and the
consumer will need to deal with it. The associated character device will be
destroyed in short order.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20033
It seems that there should be a better way to handle this, but this seems to
be the more common approach and it should likely get replaced in all of the
places it happens... Basically, thread 1 is in the process of destroying the
tun/tap while thread 2 is executing one of the ioctls that requires the
tun/tap mutex and the mutex is destroyed before the ioctl handler can
acquire it.
This is only one of the races described/found in PR 233955.
PR: 233955
Reviewed by: ae
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20027
The <sys/pctrie.h> APIs expect a 64-bit DMA key.
This is fine as long as the DMA is less than or equal to 64 bits, which
is currently the case.
Sponsored by: Mellanox Technologies
From 7d8dc6544c
"The mcbin (and likely others) have a nonstandard uart clock. This means
that the earlycon programming will incorrectly set the baud rate if it is
specified. The way around this is to tell the kernel to continue using the
preprogrammed baud rate. This is done by setting the baud to 0."
Our drivers (uart_dev_ns8250) do respect zero, but SPCR would error. Let's
not error.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, imp, bcran
Differential Revision: https://reviews.freebsd.org/D19914
This is fairly similar to the AES-GCM support in ccr(4) in that it will
fall back to software for certain cases (requests with only AAD and
requests that are too large).
Tested by: cryptocheck, cryptotest.py
MFC after: 1 month
Sponsored by: Chelsio Communications
A request to encrypt an empty payload without any AAD is unusual, but
it is defined behavior. Removing this assertion removes a panic and
instead returns the correct tag for an empty buffer.
Reviewed by: cem, sef
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20043
To workaround limitations in the crypto engine, empty buffers are
handled by manually constructing the final length block as the payload
passed to the crypto engine and disabling the normal "final" handling.
For HMAC this length block should hold the length of a single block
since the hash is actually the hash of the IPAD digest, but for
"plain" SHA the length should be zero instead.
Reported by: NIST SHA1 test failure
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Add support for newer Thinkpad models with id LEN0268. Was tested on
Thinkpad T480 and ThinkPad X1 Yoga 2nd gen.
PR: 229120
Submitted by: Ali Abdallah <aliovx@gmail.com>
MFC after: 1 week
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
We may need the BSP to reboot, but we don't need any AP CPU that isn't the
panic thread. Any CPU landing in this routine during panic isn't the panic
thread, so we can just detect !BSP && panic and shut down the logical core.
The savings can be demonstrated in a bhyve guest with multiple cores; before
this change, N guest threads would spin at 100% CPU. After this change,
only one or two threads spin (depending on if the panicing CPU was the BSP
or not).
Konstantin points out that this may break any future patches which allow
switching ddb(4) CPUs after panic and examining CPU-local state that cannot
be inspected remotely. In the event that such a mechanism is incorporated,
this behavior could be made configurable by tunable/sysctl.
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20019
As with mlx5en, the idea is to drop unwanted traffic as early
in receive as possible, before mbufs are allocated and anything
is passed up the stack. This can save considerable CPU time
when a machine is under a flooding style DOS attack.
The major change here is to remove the unneeded abstraction where
callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and
are responsible for NULL'ing that mbuf themselves. Now this happens
directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us
to use the decision (and potentially mbuf) returned by the pfil
hooks. The driver can now recycle mbufs to avoid re-allocation when
packets are dropped.
Reviewed by: marius (shurd and erj also provided feedback)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19645
This GRE-in-UDP encapsulation allows the UDP source port field to be
used as an entropy field for load-balancing of GRE traffic in transit
networks. Also most of multiqueue network cards are able distribute
incoming UDP datagrams to different NIC queues, while very little are
able do this for GRE packets.
When an administrator enables UDP encapsulation with command
`ifconfig gre0 udpencap`, the driver creates kernel socket, that binds
to tunnel source address and after udp_set_kernel_tunneling() starts
receiving of all UDP packets destined to 4754 port. Each kernel socket
maintains list of tunnels with different destination addresses. Thus
when several tunnels use the same source address, they all handled by
single socket. The IP[V6]_BINDANY socket option is used to be able bind
socket to source address even if it is not yet available in the system.
This may happen on system boot, when gre(4) interface is created before
source address become available. The encapsulation and sending of packets
is done directly from gre(4) into ip[6]_output() without using sockets.
Reviewed by: eugen
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D19921
mtmsr and mtsr require context synchronizing instructions to follow. Without
a CSI, there's a chance for a machine check exception. This reportedly does
occur on a MPC750 (PowerMac G3).
Reported by: Mark Millard
r346307 inadvertently started installing FDT_DTS_FILE along with the kernel.
While this isn't necessarily bad, it was not intended or discussed and it
actively breaks some current setups that don't anticipate any .dtb being
installed when it's using static fdt. This change could be reconsidered down
the line, but it needs to be done with prior discussion.
Fix it by pushing FDT_DTS_FILE build down into the raw dtb.build.mk bits.
This technically allows modules building DTS to accidentally specify an
FDT_DTS_FILE that gets built but isn't otherwise useful (since it's not
installed), but I suspect this isn't a big deal and would get caught with
any kind of testing -- and perhaps this might end up useful in some other
way, for example by some module wanting to embed fdt in some other way than
our current/normal mechanism.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>
MFC after: 3 days
X-MFC-With: r346307
tun destruction will not continue until TUN_OPEN is cleared. There are brief
moments in tunclose where the mutex is dropped and we've already cleared
TUN_OPEN, so tun_destroy would be able to proceed while we're in the middle
of cleaning up the tun still. tun_destroy should be blocked until these
parts (address/route purges, mostly) are complete.
PR: 233955
MFC after: 2 weeks
If kern.random.initial_seeding.bypass_before_seeding is disabled, random(4)
and arc4random(9) will block indefinitely until enough entropy is available
to initially seed Fortuna.
It seems that zero flowids are perfectly valid, so avoid blocking on random
until initial seeding takes place.
Discussed with: bz (earlier revision)
Reviewed by: thj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20011
As mphyp_pte_unset() can also remove PTE entries, and as this can
happen in parallel with PTEs evicted by mphyp_pte_insert(), there
is a (rare) chance the PTE being evicted gets removed before
mphyp_pte_insert() is able to do so. Thus, the KASSERT should
check wether the result is H_SUCCESS or H_NOT_FOUND, to avoid
panics if the situation described above occurs.
More details about this issue can be found in PR 237470.
PR: 237470
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20012
Tested by Greg V with mlx5en on an Ampere eMAG instance at Packet.com on
c2.large.arm (with some additional uncommitted PCIe WIP).
PR: 237055
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: hselasky
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D19983
RFC 4391 specifies that the IB interface GID should be re-used as IPv6
link-local address. Since the code in in6_get_hw_ifid() ignored
IFT_INFINIBAND case, ibX interfaces ended up with the local address
borrowed from some other interface, which is non-compliant.
Use lowest eight bytes from GID for filling the link-local address,
same as Linux.
Reviewed by: bz (previous version), ae, hselasky, slavash,
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20006
In r297225 the initial INP_RLOCK() was replaced by an early
acquisition of an r- or w-lock depending on input variables
possibly extending the write locked area for reasons not entirely
clear but possibly to avoid a later case of unlock and relock
leading to a possible race condition and possibly in order to
allow the route cache to work for connected sockets.
Unfortunately the conditions were not 1:1 replicated (probably
because of the route cache needs). While this would not be a
problem the legacy IP code compared to IPv6 has an extra case
when dealing with IP_SENDSRCADDR. In a particular case we were
holding an exclusive inp lock and acquired the shared udbinfo
lock (now epoch).
When then running into an error case, the locking assertions
on release fired as the udpinfo and inp lock levels did not match.
Break up the special case and in that particular case acquire
and udpinfo lock depending on the exclusitivity of the inp lock.
MFC After: 9 days
Reported-by: syzbot+1f5c6800e4f99bdb1a48@syzkaller.appspotmail.com
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D19594
visible on armv7 and armv8. Similar issue to rS302292.
Obtained from: Semihalf
Authored by: Michal Krawczyk <mk@semihalf.com>
Approved by: wma
Differential Revision: https://reviews.freebsd.org/D19932
Summary: when using pseries-llan driver, Opkts and Oerrs counters (netstat
-i) are always zero. This patch adds an small error handling to increment
these counters.
Submitted by: alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20009
Some hypervisor calls, such as H_SEND_LOGICAL_LAN, take more arguments than
are traditionally passed in registers. The HCALL ABI will accept these
arguments in r11 and r12. With ELFv2 ABI, these arguments are 2
double-words lower than ELFv1 ABI, as two double-words in the stack frame
are no longer used, and therefore removed from the frame. Fix the offsets
for loading the registers for the HCALL. This fixes the phyp_llan driver
with ELFv2 kernel.
Submitted by: alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20008
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets. When called with a domain on a NUMA machine,
ifalloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain
which uses a driver supplied device_t to call ifalloc_domain() with
the appropriate domain.
Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.
Reviewed by: glebius, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19930
This fixes a bug that prevented the driver from auto-flashing the
firmware when it didn't see one on the card. This feature was
introduced in r321390 and this bug was introduced in r343269.
Reported by: gallatin@
MFC after: 1 week
Sponsored by: Chelsio Communications
The checks are too expensive for a general-purpose kernel. Enable the
checks when DIAGNOSTIC is defined and provide a sysctl to enable the
checks in a non-DIAGNOSTIC INVARIANTS kernel.
Reviewed by: kib
Discussed with: Doug Moore <dougm@rice.edu>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19999
fragmented packets.
When sending IPv4 and IPv6 fragmented packets and a fragment is lost,
the mbuf making up the fragment will remain in the temporary hashed
fragment list for a while. If the network interface departs before the
so-called slow timeout clears the packet, the fragment causes a panic
when the timeout kicks in due to accessing a freed network interface
structure.
Make sure that when a network device is departing, all hashed IPv4 and
IPv6 fragments belonging to it, get freed.
Backtrace:
panic()
icmp6_reflect()
hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim;
^^^^ rcvif->if_afdata[AF_INET6] is NULL.
icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()
Differential Revision: https://reviews.freebsd.org/D19622
Reviewed by: bz (network), adrian
MFC after: 1 week
Sponsored by: Mellanox Technologies
According to specs and common sense, all sense data reported in descriptor
format should be valid. But practice shows different, some devices return
descriptors with invalid data, resulting in error messages looking worse.
Decouple block/stream commands sense data and information field printing.
Looking on present specs, there are much more cases when those fields are
not related, and incomplete old code was not printing valid sense data and
leaving empty lines for invalid.
MFC after: 2 weeks
occasional spurious interrupts are a normal thing on this hardware. Also,
change the name of the cpu-local interrupt controller driver from local_intc
to lintc, because the name gets built into interrupt names, which have to
fit into a 19-byte field for stats reporting (so this allows 5 more bytes
of the actual interrupt name to be displayed).
One of the fun issues with scanning has been how the existing
ANI values were programmed into the hardware when channels were
changed. If you're on a really crappy channel and ANI has made
you deaf then when you scan you continue to be deaf on all channels.
This code passes in a flag to startpcureceive which in AR5416 and later
is also used to enable ANI. This allows it to know if it's a normal
operation or a scan operation.
This fixes my situation at home where a temporary spot of a device
going deaf due to interference starts scanning and .. can't hear
anything until I restart.
Now, this isn't the full fix - ideally:
(a) all the ANI config and per-channel information would be migrated
to the shared HAL stuff and enabled for all of the NICs;
(b) when a station reassociates and some other error conditions
(like missed beacons, NF calibration failures, etc) a knob
to reset ANI parameters would likely help recovery.
But hey, I'm committing bits of code again! woo!
Tested:
* AR9344 (2G), STA operation
This fixes a bug where, even when hw.psm.tap_enabled=0, touchpad taps
were processed.
tap_enabled has three states: unconfigured, disabled, and enabled (-1, 0, 1).
To respect PR kern/139272, taps are ignored only when explicity disabled.
Submitted by: Ben LeMasurier <ben@crypt.ly> (initial version)
MFC after: 2 weeks
Ignoring of gesture processing when the palm is detected helps to reduce
some of the erratic pointer behavior.
This fixes regression introduced in r317814
Reported by: Ben LeMasurier <ben@crypt.ly>
MFC after: 2 weeks