Commit Graph

141426 Commits

Author SHA1 Message Date
John Baldwin
8a67a1a964 <sys/bitstring.h>: Cast _BITSTR_BITS to int in a ternary operator.
This fixes a -Wsign-compare error reported by GCC due to the two
results of the ternary operator having differing signedness.
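
A minimal sketch of the kind of mismatch described above, not the actual
<sys/bitstring.h> code: EXAMPLE_BITS and example_bits_to_scan() are
hypothetical names. A sizeof-based constant is unsigned, so pairing it with
an int in a ternary mixes signedness; casting the constant to int gives both
results the same type.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for _BITSTR_BITS: sizeof yields an unsigned size_t. */
#define EXAMPLE_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return how many bits of the current word should be scanned, capped at
 * the word size. Without the (int) casts the two results of the ternary
 * would be size_t and int, which is what -Wsign-compare reports.
 */
static int
example_bits_to_scan(int remaining)
{
	assert(remaining >= 0);
	return (remaining > (int)EXAMPLE_BITS ? (int)EXAMPLE_BITS : remaining);
}
```

For example, example_bits_to_scan(3) returns 3, while any value larger than
the word size is capped at the word size.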

Reviewed by:	dougm, rlibby
Differential Revision:	https://reviews.freebsd.org/D34122
2022-02-01 09:45:11 -08:00
Kristof Provost
4daa31c108 pflog: align header to 4 bytes, not 8
6d4baa0d01 incorrectly rounded the length of the pflog header up to 8
bytes, rather than 4.
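
A minimal sketch of the difference between the two roundings.
example_roundup2() is a hypothetical helper and the lengths used below are
illustrative, not the real pflog header size.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align, which must be a power of two. */
static size_t
example_roundup2(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((len + align - 1) & ~(align - 1));
}
```

With an illustrative length of 68, rounding to 4 leaves it at 68 while
rounding to 8 pads it to 72, so the two alignments produce different header
lengths whenever the length is a multiple of 4 but not of 8.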

PR:		261566
Reported by:	Guy Harris <gharris@sonic.net>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-02-01 18:17:44 +01:00
Hans Petter Selasky
84d7b8e75f mlx5en: Implement TLS RX support.
TLS RX support is modeled after TLS TX support. The basic structures and layouts
are almost identical, except that the send tag created filters RX traffic and
not TX traffic.

The TLS RX tag keeps track of past TLS records up to a certain limit,
approximately 1 Gbyte of TCP data. TLS records of same length are joined
into a single database record.

Regularly the HW is queried for TLS RX progress information. The TCP sequence
number gotten from the HW is then matched against the database of TLS TCP
sequence number records and lengths. If a match is found a static params WQE
is queued on the IQ and the hardware should immediately resume decrypting TLS
data until the next non-sequential TCP packet arrives.

Offloading TLS RX data is supported for untagged, prio-tagged, and
regular VLAN traffic.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:17 +01:00
Hans Petter Selasky
e6d7ac1d03 mlx5core: Set driver version into firmware.
If the driver_version capability bit is enabled, send the driver
version to firmware after the init HCA command, for display purposes.

Example of driver version: "FreeBSD,mlx5_core,14.0.0,3.x-xxx"

Linux commits:
012e50e109fd27ff989492ad74c50ca7ab21e6a1

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:17 +01:00
Hans Petter Selasky
8e332232a5 mlx5en: Implement one RQT object per channel.
These objects will eventually be used to switch TLS RX traffic.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:17 +01:00
Hans Petter Selasky
ea00d7e8ca mlx5: Add raw ethernet local loopback support.
Currently, unicast/multicast loopback raw ethernet (non-RDMA) packets
are sent back to the vport.  A unicast loopback packet is the packet
with destination MAC address the same as the source MAC address.  For
multicast, the destination MAC address is in the vport's multicast
filter list.

Moreover, the local loopback is not needed if there is at most one
user space context.

After this patch, the raw ethernet unicast and multicast local
loopback are disabled by default. When there is more than one user
space context, the local loopback is enabled.

Note that when local loopback is disabled, raw ethernet packets are
not looped back to the vport and are forwarded to the next routing
level (eswitch, or multihost switch, or out to the wire depending on
the configuration).

Linux commits:
c85023e153e3824661d07307138fdeff41f6d86a
8978cc921fc7fad3f4d6f91f1da01352aeeeff25

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
c1b76119cb mlx5: Implement mlx5_nic_vport_update_local_lb()
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
5381f93647 mlx5en: Create TIRs before flowtables.
Because flowtables may redirect traffic to TIRs.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
001106f807 mlx5en: Create flowtables in correct order.
Because it affects how the flow tables may re-direct traffic.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
2c0ade806a mlx5: Implement flow steering helper functions for TCP sockets.
This change adds convenience functions to setup a flow steering rule based on
a TCP socket. The helper function gets all the address information from the
socket and returns a steering rule, to be used with HW TLS RX offload.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
0ee1b09eaa mlx5: Implement offloads flowtable namespace.
This namespace will be used for TCP offloads, like hardware decryption
of TLS TCP data.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
e059c120b4 mlx5en: Create and destroy all flow tables and rules when the network interface attaches and detaches.
Previously flow steering tables and rules were only created and destroyed
at link up and down events, respectively. Due to new requirements for adding
TLS RX flow tables and rules, the main flow steering table must always be
available as there are permanent redirections from the TLS RX flow table
to the vlan flow table.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
a8e715d21b mlx5en: Add race protection for SQ remap
Add a refcount for posted WQEs to avoid a race between
post WQE and FW command flows.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
aabca1034c mlx5en: Properly account for no-checksum on tunneled packets.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
06c2bd1872 mlx5en: Force all packets through the indirection table.
All packets must go through the indirection table, RQT,
because it is not possible to modify the RQN of the TIR
for direct dispatchment after it is created, typically
when the link goes up and down.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
266c81aae3 mlx5/mlx5en: Add SQ remap support
Add support to map an SQ to a specific schedule queue using a
special WQE as performance enhancement.

SQ remap operation is handled by a privileged internal queue, IQ,
and the mapping is enabled from one rate to another.

The transition from paced to non-paced should however always go
through FW.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
1c407d0494 mlx5: Properly define the reg_umr_sq networking offload capability bit.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
9680b1ba71 mlx5en: Only delete installed VxLAN rules.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
6176a5e338 mlx5en: Fix inverted logical assignment.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
694263572f mlx5en: Implement support for internal queues, IQ.
Internal send queues are regular sendqueues which are reserved for WQE commands
towards the hardware and firmware. These queues typically carry resync
information for ongoing TLS RX connections and when changing schedule queues
for rate limited connections.

The internal queue, IQ, code is more or less a stripped down copy
of the existing SQ managing code with exception of:

1) An optional single segment memory buffer which can be read or
   written as a whole by the hardware, may be provided.
2) An optional completion callback for all transmit operations, may
   be provided.
3) Does not support mbufs.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
21228c67ab mlx5en: Implement helper functions to open and close TLS TIR context.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
75767cb889 mlx5en: Share DEK objects with TLS RX.
The TLS RX support also needs to be able to allocate DEK objects.
Share the available objects 1:1.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
fad4b7d1f2 mlx5en: Add missing TLS structure prototype.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
3a1bf85503 mlx5en: Remove unused hardware TLS field.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
33a6a7a72a mlx5en: Make the receive packet indirection table, RQT, static instead of dynamic.
Allocate the RQT once, pointing all initial entries to the drop RQN.
When opening the channels simply modify the RQT, directing all traffic
to the new RQNs. Similarly when closing the channels point all RQT entries
back to the so-called drop RQN.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
7800af352a mlx5en: Set CQN in RQ parameters for drop RQ.
Else creating the drop RQ fails.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
03567b0dfa mlx5en: Set channel pointer for drop receive queue.
A valid channel pointer is needed to get the priv pointer during init.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
4e40e984da mlx5en: Print error code when opening drop RQ fails.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
27b778ae55 mlx5en: Implement dummy receive queue, RQ, for dropping packets.
What is a drop RQ and why is it needed?

The RSS indirection table, also called the RQT, selects the
destination RQ based on the receive queue number, RQN. The RQT is
frequently referred to by flow steering rules to distribute traffic
among multiple RQs. The problem is that the RQs cannot be destroyed
before the RQT referring them is destroyed too. Further, TLS RX
rules may still be referring to the RQT even if the link went
down. Because there is no magic RQN for dropping packets, we create
a dummy RQ, also called drop RQ, whose sole purpose is to drop all
received packets. When the link goes down this RQN is filled in all
RQT entries, of the main RQT, so the real RQs which are about to be
destroyed can be released and the TLS RX rules can be sustained.
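
A minimal sketch of the idea described above, assuming a fixed-size table
held in host memory. All names (struct example_rqt, EXAMPLE_RQT_SIZE, the
two helpers) are hypothetical and do not correspond to the mlx5en code or
to any firmware command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_RQT_SIZE 8	/* hypothetical table size */

struct example_rqt {
	uint32_t rqn[EXAMPLE_RQT_SIZE];	/* one receive queue number per entry */
};

/* Link down: point every entry at the drop RQ so the real RQs can go away. */
static void
example_rqt_point_to_drop(struct example_rqt *rqt, uint32_t drop_rqn)
{
	for (size_t i = 0; i < EXAMPLE_RQT_SIZE; i++)
		rqt->rqn[i] = drop_rqn;
}

/* Link up: spread the entries over the newly created RQs. */
static void
example_rqt_point_to_channels(struct example_rqt *rqt, const uint32_t *rqns,
    size_t nrq)
{
	assert(nrq > 0);
	for (size_t i = 0; i < EXAMPLE_RQT_SIZE; i++)
		rqt->rqn[i] = rqns[i % nrq];
}
```

The table itself is never destroyed in this sketch, so anything that refers
to it stays valid across link transitions; only the entries change.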

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
a60f953424 mlx5en: Make the hw_lro parameter read only tunable.
This prevents the so-called TIR context from changing during runtime.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
788e9e7478 mlx5: Remove support for FreeBSD 10 and older.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:13 +01:00
Hans Petter Selasky
2d5e5a0d75 mlx5en: Patch to inhibit transmit doorbell writes during packet reception.
During packet reception the network stack frequently transmits data in
response to TCP window updates. To reduce the number of transmit doorbells
needed, inhibit all transmit doorbells designated for the same channel until
after the reception of packets for the given channel is completed.

While at it slightly refactor the mlx5e_tx_notify_hw() function:

1) The doorbell information is always stored into sq->doorbell.d64 .
No need to pass a separate pointer to this variable.

2) Move checks for skipping doorbell writes inside this function.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:13 +01:00
Konstantin Belousov
0f7b6e11c0 mlx5en: Use a UMA cache zone for managing TLS send tags
Instead of allocating directly from a normal zone. This way
import and release are guaranteed to process all allocated and then
deallocated items. Also, the release occurs in a sleepable context when
caller of uma_zfree() or uma_zdestroy() can sleep itself.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:58 +02:00
Konstantin Belousov
028130b8e4 mlx5ib: idiomatic use of preprocessor, in particular paths
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:58 +02:00
Konstantin Belousov
7060097908 mlx5ib: normalize use of the opt_*.h files
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
89918a2375 mlx5en: idiomatic use of preprocessor, in particular paths
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
b984b95693 mlx5en: normalize use of the opt_*.h files
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Hans Petter Selasky
12c56d7dc4 mlx5: idiomatic use of preprocessor, in particular paths
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
ee9d634bd3 mlx5: normalize use of the opt_*.h files
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
303d3ae7e8 ufs, msdosfs: do not record witness order when creating vnode
When allocating new vnode, we need to lock it exclusively before
making it externally visible.  Since other threads cannot observe the
vnode yet, current lock order cannot create LoR conditions.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34126
2022-02-01 10:51:55 +02:00
Konstantin Belousov
d51b0786a2 msdosfs_denode.c: some style
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D34126
2022-02-01 10:51:48 +02:00
Konstantin Belousov
99aa3b731c ffs: lock buffers after snaplk with LK_NOWITNESS
Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34073
2022-02-01 06:54:50 +02:00
Konstantin Belousov
c02780b78c Add GB_NOWITNESS flag
It prevents WITNESS from recording the lock order for the buffer lock
acquired by getblkx().

Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34073
2022-02-01 06:54:50 +02:00
Konstantin Belousov
e11b2b69c5 ffs_alloc.c: order includes alphabetically
Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34073
2022-02-01 06:54:50 +02:00
Konstantin Belousov
d950c5898a vm/vm_extern.h, vm/vm_page.h: use sys/kassert.h
instead of fatty sys/systm.h.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
f4cdb9d7c3 vm/vm_pager.h: use sys/systm.h header
it is needed for __read_mostly attribute definition, which right now
comes from vm/vm_page.h including sys/systm.h

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
54d34bfbdf Introduce sys/kassert.h
It contains assert-related definitions previously provided by
sys/systm.h.  The new header is leaner than whole systm.h.
Include kassert.h from systm.h for compatibility.

The copyright assignment to Eivind Eklund was suggested by Kirk McKusick
and is based on the commit 5526d2d920.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:14:14 +02:00
John Baldwin
53e938e408 hyperv storvsc: Don't abuse struct sglist to hold virtual addresses.
struct sglist is intended for holding S/G lists of physical address
ranges, not virtual address ranges.  GCC 9.x issues several warnings
due to casts between pointers and integers of different sizes as a
result (vm_paddr_t is 64-bits on i386).  Instead, add a local 'struct
hv_sglist' which uses an array of 'struct iovec' to hold the S/G list
of virtual address ranges.

Differential Revision:	https://reviews.freebsd.org/D31933
2022-01-31 17:11:27 -08:00
John Baldwin
d782385e9b tcp_ratelimit: Handle some edge cases with TLS + RL send tags.
- After a connection has fallen back from NIC TLS to SW TLS, any
  pacing rate changes should modify the inpcb send tag even though
  SB_TLS_IFNET is set.

- If a connection tries to modify the pacing rate before the send
  tag has been converted from plain TLS to TLS + RL, don't fail
  the rate request set but let it fall through to setting the rate
  on the non-TLS inpcb RL tag.

Reviewed by:	gallatin, rrs, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D34085
2022-01-31 16:40:04 -08:00
John Baldwin
d958bc7963 ktls: Try to enable TOE TLS after marking existing data not ready.
At the moment this is mostly a no-op but in the future there will be
in-flight encrypted data which requires software decryption.  This
same setup is also needed for NIC TLS RX.

Note that this does break TOE TLS RX for AES-CBC ciphers since there
is no software fallback for AES-CBC receive.  This will be resolved
one way or another before 14.0 is released.

Reviewed by:	hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34082
2022-01-31 16:39:21 -08:00
Mark Johnston
773e3a71b2 pf: Initialize pf_kpool mutexes earlier
There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.

Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there.  This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.

Constify some related functions while here and add a regression test
based on a syzkaller reproducer.

Reported by:	syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by:	kp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34115
2022-01-31 16:14:00 -05:00
Konstantin Belousov
66c5fbca77 insmntque1(): remove useless arguments
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D34071
2022-01-31 16:49:08 +02:00
Kornel Duleba
1a6d987b7f enetc: Wait for pending transmissions before disabling TX queues
According to the RM it's not safe to disable a TX ring while it is busy
transmitting frames.
In order to be safe wait until the ring is empty. (cidx==pidx)
Use this opportunity to remove a set-but-unused variable.

Obtained from: Semihalf
Sponsored by: Alstom Group
2022-01-31 08:57:48 +01:00
Kornel Duleba
a6bda3e1ef enetc: Simplify TX ring credits counting logic
According to the RM rings can hold at most ring_size - 1 descriptors at any time.
No additional logic is needed since iflib already respects this constraint.
Thanks to that the pidx == cidx situation is not ambiguous and indicates an
empty ring.
Use that to simplify the logic that calculates the amount of processed frames.
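
A minimal sketch of the index arithmetic this relies on, assuming producer
and consumer indices that wrap at ring_size. The function names are
hypothetical and are not taken from the enetc driver or iflib.

```c
#include <assert.h>
#include <stdint.h>

/* Number of descriptors currently held by the ring (from cidx up to pidx). */
static uint32_t
example_ring_used(uint32_t pidx, uint32_t cidx, uint32_t ring_size)
{
	assert(pidx < ring_size && cidx < ring_size);
	return ((pidx + ring_size - cidx) % ring_size);
}

/* With at most ring_size - 1 descriptors in flight, equal indices mean empty. */
static int
example_ring_empty(uint32_t pidx, uint32_t cidx)
{
	return (pidx == cidx);
}

/* Descriptors completed since the previous consumer index, with wraparound. */
static uint32_t
example_ring_processed(uint32_t old_cidx, uint32_t new_cidx, uint32_t ring_size)
{
	assert(old_cidx < ring_size && new_cidx < ring_size);
	return ((new_cidx + ring_size - old_cidx) % ring_size);
}
```

Because the ring never holds ring_size descriptors, example_ring_used()
tops out at ring_size - 1 and the value 0 can only mean an empty ring.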

Obtained from: Semihalf
Sponsored by: Alstom Group
2022-01-31 08:57:48 +01:00
Kornel Duleba
f485d733e8 enetc: Disable HW IP packet alignment
The NIC can IP align received packets.
It was observed that it caused some rare stalls, that required full board reset.
Disable this feature for now. It doesn't provide any significant performance
improvement anyway.

Obtained from: Semihalf
Sponsored by: Alstom Group
2022-01-31 08:57:48 +01:00
Konstantin Belousov
8d8589b385 ufs: be more persistent with finishing some operations
when the vnode is doomed after relock.  The mere fact that the vnode is
doomed does not prevent us from doing UFS operations on it while it
still belongs to UFS, which is determined by non-NULL v_data.  Not
finishing some operations, e.g. not syncing the inode block only because
the vnode started reclamation, is not correct.

Add macro IS_UFS() which encapsulates the v_data != NULL, and use it
instead of VN_IS_DOOMED() for places where the operation completion is
important.

Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
4559700a0a ffs_snapblkfree(): add a comment explaining lockmgr invocation
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
0cdc603308 ufs: Use IS_SNAPSHOT()
Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
3d68c4e175 syncer VOP_FSYNC(): unlock syncer vnode around call to VFS_SYNC()
The lock is unnecessary since the mount point is busied, which prevents
unmount and syncer vnode deallocation.  Having the vnode locked causes
innocent LoRs and complicates debugging.

Also stop starting write accounting around it.  Any caller of
VOP_FSYNC() must do it already, and sync_vnode() does.

Reported and tested by:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
5875b94c74 buf_alloc(): lock the buffer with LK_NOWAIT
The buffer must not be accessed by any other thread, it is freshly
allocated.  As such, LK_NOWAIT should be nop but also it prevents
recording the order between the buffer lock and any other locks we might
own in the call to getnewbuf().  In particular, if we own FFS snap lock,
it should avoid triggering false positive warning.

Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
531f8cfea0 Use dedicated lock name for pbufs
Also remove a pointer to array variable, use array address directly.

Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:14 +02:00
Konstantin Belousov
9cd59de2e1 ext2fs: remove remnants of the UFS snapshot code
Noted and reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34095
2022-01-31 04:37:16 +02:00
Kirk McKusick
85f7e9a4f0 In GEOM debugging output, show consumer for cloned and duplicated bio's.
When using bio's created by g_clone_bio() or g_duplicate_bio()
their consumer device (the device to which their I/O requests
are sent) is listed by the geom debugging facility as [unknown].
If available, this update lists the consumer associated with
the bio's parent.

MFC after:    2 weeks
Sponsored by: Netflix
2022-01-30 17:21:13 -08:00
Jason A. Harmening
a01ca46b9b unionfs: use VV_ROOT to check for root vnode in unionfs_lock()
This avoids a potentially wild reference to the mount object.
Additionally, simplify some of the checks around VV_ROOT in
unionfs_nodeget().

Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D33914
2022-01-29 22:38:44 -06:00
Alexander Motin
67c58cd729 GEOM: Remove g_wait_sim.
It seems never to have been used since its addition.
2022-01-29 22:12:43 -05:00
Alexander Motin
10ae42ccbd GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers.
All I/O requests through the taste consumers are synchronous, done
with g_read_data() and without any locks held.  It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many context switches during disk retaste.

MFC after:	2 weeks
2022-01-29 21:59:03 -05:00
Peter Jeremy
afcd121024 geom_gate: Distinguish between classes of errors
The geom_gate API provides 2 distinct paths for exchanging error
details between the kernel and the userland client: Including an error
code in the g_gate_ctl_io structure passed in the ioctl(2) call or
having the ioctl(2) call return -1 with an error code in errno. The
latter reflects errors in the ioctl(2) call itself whilst the former
reflects errors within the geom_gate instance.

The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be
directed to the geom_gate instance and the wait can fail
(necessitating an error return) if the geom_gate instance is destroyed
or if the msleep(9) fails. The code previously treated both error
cases identically: Returning ECANCELED as a geom_gate instance error
(which the ggatec treats as a fatal error).  Whilst this is the correct
behaviour if the geom_gate instance is destroyed, a msleep(9) failure
is unrelated to the geom_gate instance itself and should be reported
as an ioctl(2) "failure".  The distinction is important because
msleep(9) can return ERESTART, which means the system call should be
retried (and this will occur automatically as part of the generic
syscall return processing).

This change alters the msleep(9) handling to directly return the error
code from msleep(9), which ensures ERESTART is correctly handled,
rather than being treated as a fatal error.

Reviewed by:    Johannes Totz <jo@bruelltuete.com>
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D33996
2022-01-29 21:15:51 +11:00
Alexander V. Chernikov
217481a333 u3g: Add support for the Quectel EM12-G modem.
Submitted by:	<tda.77793 at gmail.com>
PR:		260218
MFC after:	2 weeks
2022-01-29 09:59:20 +00:00
Kristof Provost
9dac026822 dummynet: dn_dequeue() may return NULL
If there are no more entries, or if we fail to restore the rcvif of a
queued mbuf dn_dequeue() can return NULL.
Cope with this.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34078
2022-01-28 23:09:08 +01:00
Kristof Provost
703e533da5 mbuf: do not restore dying interfaces
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.

However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.

This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).

Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34076
2022-01-28 23:09:08 +01:00
John Baldwin
29d481ae6a Make <vm/vm_extern.h> more self-contained.
Add a nested include of <sys/systm.h> for recently added assertions.
Without this, existing code (such as in drm-kmod) needs to be patched
to add the newly required header.

While here, rewrite the assertions using KASSERT().

Reviewed by:	dougm, alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D34070
2022-01-28 13:14:03 -08:00
John Baldwin
2e8d1a5525 iscsi: Allocate a dummy PDU for the internal nexus reset task.
When an iSCSI target session is terminated, an internal nexus reset
task is posted to abort existing tasks belonging to the session.
Previously, the ctl_io for this internal nexus reset stored a pointer
to the session in the slot that normally holds a pointer to the PDU
from the initiator that triggered the I/O request.  The completion
handler then assumed that any nexus reset I/O was due to an internal
request and fetched the session pointer (instead of the PDU pointer)
from the ctl_io.  However, it is possible to trigger a nexus reset via
an on-the-wire task management PDU.  If such a PDU were sent to the
target, then the completion handler would incorrectly treat this
request as an internal request and treat the pointer to the received
PDU as a pointer to the session instead.

To fix, allocate a dummy PDU for the internal reset task and use an
invalid opcode to differentiate internal nexus resets from resets
requested by the initiator.

PR:		260449
Reported by:	Robert Morris <rtm@lcs.mit.edu>
Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34055
2022-01-28 13:07:04 -08:00
Mitchell Horne
b1ab9568bc hwpmc: remove mips event definitions
Reviewed by:	imp, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34084
2022-01-28 16:37:28 -04:00
Alexander Motin
29998bf2ac glabel: Set G_CF_DIRECT_SEND/RECEIVE for taste consumer.
All I/O requests through the taste consumer are synchronous, done
with g_read_data() and without any locks held.  It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many context switches during disk retaste.

MFC after:	2 weeks
2022-01-28 14:22:41 -05:00
Alexander Motin
ffc1cc95e7 GEOM: Relax direct dispatch for GEOM threads.
The only cases when direct dispatch does not make sense are for I/O
submission from down thread and for completion from up thread.  In
all other cases, if both consumer and producer are OK about it, we
can save on context switches.

MFC after:	2 weeks
2022-01-28 14:21:21 -05:00
Gleb Smirnoff
964b8f8b99 ifnet: garbage collect unused function ifaddr_byindex().
Last use was removed in 5adea417d4.
2022-01-28 09:51:52 -08:00
Alexander Motin
0d8cec7658 graid: Set G_CF_DIRECT_SEND for taste consumer.
Unlike normal consumers all taste consumer I/O is synchronous, done
with g_read_data() and without any locks held.  It makes no sense to
delegate I/O submission to g_down thread.

This should remove a number of context switches during disk retaste.

MFC after:	2 weeks
2022-01-28 11:09:30 -05:00
Gordon Bergling
4bd030b369 sctp(4): Fix a typo in an INVARIANTS panic message
- s/failes/fails/

MFC after:	1 week
2022-01-28 13:20:52 +01:00
Edward Tomasz Napierala
99454d3e98 linux: Provide dummy seccomp(2)
Don't emit messages; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it.  This is also similar to how we handle seccomp
in linux_prctl().

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D33808
2022-01-28 11:45:41 +00:00
Kirk McKusick
ddf162d1d1 ufs: handle LoR between snap lock and vnode lock
When a filesystem is mounted all of its associated snapshots must
be activated. It first allocates a snapshot lock (snaplk) that will
be shared by all the snapshot vnodes associated with the filesystem.
As part of each snapshot file activation, it must replace its own
ufs vnode lock with the snaplk. In this way acquiring the snaplk
gives exclusive access to all the snapshots for the filesystem.

A write to a ufs vnode first acquires the ufs vnode lock for the
file to be written then acquires the snaplk. Once it has the snaplk,
it can check all the snapshots to see if any of them needs to make
a copy of the block that is about to be written. This ffs_copyonwrite()
code path establishes the ufs vnode followed by snaplk locking
order.

When a filesystem is unmounted it has to release all of its snapshot
vnodes. Part of doing the release is to revert the snapshot vnode
from using the snaplk to using its original vnode lock. While holding
the snaplk, the vnode lock has to be acquired, the vnode updated
to reference it, then the snaplk released. Acquiring the vnode lock
while holding the snaplk violates the ufs vnode then snaplk order.
Because the vnode lock is unused, using LK_EXCLUSIVE | LK_NOWAIT
to acquire it will always succeed and the LK_NOWAIT prevents the
reverse lock order from being recorded.

This change was made in January 2021 (173779b98f) to avoid an LOR
violation in ffs_snapshot_unmount(). The same LOR issue was recently
found again when removing a snapshot in ffs_snapremove() which must
also revert the snaplk to the original vnode lock as part of freeing it.

The unwind in ffs_snapremove() deals with the case in which the
snaplk is held as a recursive lock holding multiple references.
Specifically an equal number of references are made on the vnode
lock. This change factors out the lock reversion operations into a
new function revert_snaplock() which both handles the recursive
locks and avoids the LOR. The new revert_snaplock() function is
then used in both ffs_snapshot_unmount() and in ffs_snapremove().

Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33946
2022-01-27 23:03:35 -08:00
Rick Macklem
98c788737f nfsclient: Delete unused function nfscl_getcookie()
The function nfscl_getcookie(), which is essentially the
same as ncl_getcookie(), is never called, so delete it.
This is probably cruft left over from the port of the
NFSv4 code to FreeBSD several years ago.

Found while modifying the code to better use the
directory offset cookies.

MFC after:	2 weeks
2022-01-27 15:30:26 -08:00
John Baldwin
ac4643ef78 Remove terasic drivers used on the Cambridge BERI tablet.
Reviewed by:	brooks
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D34057
2022-01-27 11:01:51 -08:00
Richard Scheffenegger
4531b3450b tcp: Tidying up the conditionals for unwinding a spurious RTO
- Use the semantically correct TSTMP_xx macro when comparing
  timestamps. (No functional change)
- check for bad retransmits only when TSopt is present in ACK
  (don't assume there will be a valid TSopt in the TCP options struct)
- exclude tsecr == 0, since that most likely indicates an
  invalid ts echo return (tsecr) value.

Reviewed By: tuexen, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34062
2022-01-27 18:59:55 +01:00
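The three conditions listed in 4531b3450b above can be sketched as a single predicate. This is a minimal illustration, not the kernel code: the macro is a local stand-in modeled on the TSTMP_xx family, and the helper name rto_was_spurious() and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-in modeled on the TSTMP_xx macros: the unsigned difference
 * is reinterpreted as signed, so the comparison stays correct when the
 * 32-bit timestamp counter wraps around.
 */
#define TSTMP_LT(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

/*
 * Hypothetical helper: decide from the echoed timestamp (tsecr) whether
 * an RTO retransmission was spurious.  badrxtwin stands for the time
 * before which an echo must have been triggered by the original segment.
 */
bool
rto_was_spurious(bool ack_has_tsopt, uint32_t tsecr, uint32_t badrxtwin)
{
	/* Only trust tsecr when the ACK actually carried a TS option. */
	if (!ack_has_tsopt)
		return (false);
	/* tsecr == 0 most likely indicates an invalid echo value. */
	if (tsecr == 0)
		return (false);
	/* Wrap-safe "tsecr is older than badrxtwin". */
	return (TSTMP_LT(tsecr, badrxtwin));
}
```

A plain `tsecr < badrxtwin` would give the wrong answer for a pair such as 0xFFFFFFF0 and 0x10, where the counter has wrapped between the two samples; the signed-difference form treats the first value as older.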
Richard Scheffenegger
68e623c3f0 tcp: Rewind erroneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retransmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
2022-01-27 18:49:42 +01:00
Gleb Smirnoff
6abb5043a6 rtsock: always set m_pkthdr.rcvif when queueing on netisr
netisr uses global workstreams and after dequeueing an mbuf it
uses rcvif to get the VNET of the mbuf.  Of course, this is not
needed when the kernel is compiled without VIMAGE.  It turned out that
the routing socket does not set rcvif if compiled without VIMAGE.
Make this assignment independent of the VIMAGE option.

Fixes:	6871de9363
2022-01-27 09:41:31 -08:00
Gleb Smirnoff
f59fa11280 mbuf: make M_ASSERT_NO_SND_TAG() as strict as other similar asserts
Fixes:	17cbcf33c3
2022-01-27 09:41:31 -08:00
Andriy Gapon
6fd84a627f mmc_da: create disk(9) for pre-2.0 SD cards
It does not look like there is anything in mmc_da code that actually
requires protocol 2.0 or later.  dev/mmc code also does not have such a
restriction.

Tested with a very old 2GB mini-SD card.  Prior to this change mmc_da
would claim the card but would not expose it to GEOM.

Without MMCCAM:
 mmc0: <MMC/SD bus> on sdhci_pci0
 mmc0: Probing bus
 mmc0: SD probe: OK (OCR: 0x00ff8000)
 mmc0: Current OCR: 0x00ff8000
 mmc0: CMD8 failed, RESULT: 1
 mmc0: Probing cards
 mmc0: New card detected (CID 1c53565344432020100002982e007600)
 mmc0: New card detected (CSD 005e00325f5a83d02db7ffbf96800000)
 mmc0: Card at relative address 0xb368 added:
 mmc0:  card: SD SDC   1.0 SN 0002982E MFG 06/2007 by 28 SV
 mmc0:  quirks: 0
 mmc0:  bus: 4bit, 50MHz (high speed timing)
 mmc0:  memory: 3998720 blocks, erase sector 256 blocks
 mmc0: setting transfer rate to 50.000MHz (high speed timing)
 GEOM: new disk mmcsd0
 mmcsd0: 2GB <SD SDC   1.0 SN 0002982E MFG 06/2007 by 28 SV> at mmc0 50.0MHz/4bit/65535-block
 mmc0: setting bus width to 4 bits high speed timing

With MMCCAM and this change:
 sdda0 at sdhci_slot0 bus 0 scbus2 target 0 lun 0
 sdda0: Relative addr: 0000b368
 Card features: <Memory>
 sdda0: Serial Number 0002982E
 sdda0: SD SDC   1.0 SN 0002982E MFG 06/2007 by 28 SV
 GEOM: new disk sdda0

Reviewed by:	manu
MFC after:	3 weeks
2022-01-27 18:59:54 +02:00
Mateusz Guzik
2a7e4cf843 Revert b58ca5df0b ("vfs: remove the now unused insmntque1")
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.

Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.

Noted by:	kib
Reported by:	pho
2022-01-27 16:32:22 +00:00
Andrew Gallatin
8a7404b2ae tcp: fix leaks in tcp_chg_pacing_rate error paths
tcp_chg_pacing_rate() is expected to release the hw rate limit table,
but failed to do so in several error cases, leading to ever
increasing counts of flows using the rate.

This patch was mostly done by rrs

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34058
Reviewed by: hselasky, rrs, jhb (initial version, outside of Differential)
2022-01-27 10:35:03 -05:00
Andrew Gallatin
9ba117960e Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues
When ip_output_send() returns EAGAIN due to issues with send tags (route
change, lagg failover, etc), it must free the mbuf. This is because
ip_output_send() was written as a wrapper/replacement for a direct
call to if_output(), and the contract with if_output() has
historically been that it owns the mbufs once called. When
ip_output_send() failed to free mbufs, it violated this assumption
and led to leaked mbufs.

This was noticed when using NIC TLS in combination with hardware
rate-limited connections. When lots of NIC output drops
triggered ratelimit send tag changes, we noticed we were leaking
ktls_sessions, send tags and mbufs. This was due to ip_output_send()
leaking mbufs which held references to ktls_sessions, which in
turn held references to send tags.

Many thanks to jhb, rrs, hselasky and markj for their help in
debugging this.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34054
Reviewed by: hselasky, jhb, rrs
MFC after: 2 weeks
2022-01-27 10:34:34 -05:00
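The ownership contract described in 9ba117960e above can be sketched in userspace C. This is only a model under stated assumptions: struct pkt, pkt_alloc(), pkt_free(), pkt_live_count() and output_send() are hypothetical stand-ins for mbufs and ip_output_send(), and the live counter exists purely to make a leak observable.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf. */
struct pkt {
	size_t len;
};

/* Number of packets allocated and not yet freed. */
static int live_pkts;

struct pkt *
pkt_alloc(size_t len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p != NULL) {
		p->len = len;
		live_pkts++;
	}
	return (p);
}

void
pkt_free(struct pkt *p)
{
	if (p != NULL) {
		free(p);
		live_pkts--;
	}
}

int
pkt_live_count(void)
{
	return (live_pkts);
}

/*
 * Model of the contract: once called, the function owns the packet on
 * every path, including the EAGAIN error path.
 */
int
output_send(struct pkt *p, bool send_tag_valid)
{
	if (!send_tag_valid) {
		pkt_free(p);	/* the step that was missing in the leak */
		return (EAGAIN);
	}
	/* Stand-in for the driver output routine, which consumes p. */
	pkt_free(p);
	return (0);
}
```

A caller that sees EAGAIN must not touch or free the packet again; with the free in place, pkt_live_count() returns to zero after a failed send.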
Mark Johnston
38da0c96dc geom: Assert that BIO_SPEEDUP BIOs have bio_data set to NULL
Like BIO_FLUSH, there is no reason for consumers to pass a BIO_SPEEDUP
request with non-NULL bio_data, so assert this.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-27 09:58:19 -05:00
Mark Johnston
a2dfffb989 shsec: Allocate data blocks only for BIO_READ/WRITE requests
In particular, there is no need to allocate a data block when passing
BIO_FLUSH requests to child providers, and g_io_request() asserts that
bp->bio_data == NULL for such requests.

PR:		255131
Reported and tested by:	nvass@gmx.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-27 09:56:07 -05:00
Andrew Turner
548a2ec49b Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependent types are expected to be added in the future.

The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.

Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19831
2022-01-27 11:40:34 +00:00
Andriy Gapon
5d5f44623e g_mirror: don't fail reads while losing next-to-last disk
I observed a situation where some read requests failed when a 2-way geom
mirror lost one disk.  The problem appears to be in the logic that skips
retrying a failed request when a mirror has only one active disk.
Generally, that makes sense.  But during a transition from two disks to
one it is possible that the request failed on the failing disk before it
was inactivated and, so, the remaining active disk is the disk that
should be tried.

This change adds an additional check to ensure that it was the (only)
active disk that was already tried.

Reviewed by:	mav
MFC after:	3 weeks
2022-01-27 13:22:52 +02:00
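The retry decision described in 5d5f44623e above can be modeled as a small predicate. This is a sketch of the reasoning only; mirror_should_retry() and its parameters are hypothetical and do not correspond to identifiers in g_mirror.

```c
#include <stdbool.h>

/*
 * Hypothetical model: should a failed read be retried?
 *
 * active_disks          number of disks still active in the mirror
 * failed_disk_is_active whether the disk that returned the error is
 *                       still counted among the active disks
 */
bool
mirror_should_retry(unsigned active_disks, bool failed_disk_is_active)
{
	/* Nothing left to read from. */
	if (active_disks == 0)
		return (false);
	/* The only active disk is the one that already failed the read. */
	if (active_disks == 1 && failed_disk_is_active)
		return (false);
	/* Otherwise some active disk has not been tried yet. */
	return (true);
}
```

The case the commit addresses is `mirror_should_retry(1, false)`: the request failed on a disk that has since been inactivated, so the single remaining active disk has not been tried and the read should be retried there.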
Gleb Smirnoff
6871de9363 netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs
Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33268
2022-01-26 21:58:50 -08:00
Gleb Smirnoff
165746f4e4 dummynet: use m_rcvif_serialize/restore when queueing packets
This fixes a panic when an interface is removed while a packet
is sitting on a queue.  It allows all dummynet tests to pass,
including the forthcoming dummynet:ipfw_interface_removal
and dummynet:pf_interface_removal, and demonstrates use of
m_rcvif_serialize() and m_rcvif_restore().

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33267
2022-01-26 21:58:50 -08:00
Gleb Smirnoff
e1882428dc ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918
2022-01-26 21:58:50 -08:00
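The idea in e1882428dc above, an index table supplemented with a generation count, can be sketched as follows. This is a userspace model under stated assumptions, not the kernel KPI: struct iface, struct if_handle and the iface_*() functions are hypothetical names, and the table size is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS	8

/* Hypothetical stand-in for an ifnet. */
struct iface {
	const char *name;
};

/* What gets stored in place of a raw pointer while a packet is queued. */
struct if_handle {
	uint16_t idx;
	uint16_t gen;
};

static struct {
	struct iface *ifp;
	uint16_t gen;	/* bumped every time the slot is vacated */
} slots[NSLOTS];

/* Place an interface in the first free slot; returns its index or -1. */
int
iface_attach(struct iface *ifp)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (slots[i].ifp == NULL) {
			slots[i].ifp = ifp;
			return (i);
		}
	}
	return (-1);
}

/* Vacate a slot and invalidate every handle taken from it. */
void
iface_detach(int idx)
{
	slots[idx].ifp = NULL;
	slots[idx].gen++;
}

/* Turn an index into a handle that remembers the current generation. */
struct if_handle
iface_serialize(int idx)
{
	struct if_handle h = { (uint16_t)idx, slots[idx].gen };

	return (h);
}

/* Returns the interface, or NULL if the slot was vacated since. */
struct iface *
iface_restore(struct if_handle h)
{
	if (h.idx >= NSLOTS || slots[h.idx].gen != h.gen)
		return (NULL);
	return (slots[h.idx].ifp);
}
```

Because the generation is checked on restore, a handle taken before an interface went away resolves to NULL even if a different interface later reuses the same index, which is the failure mode a bare pointer or bare index cannot detect.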
Gleb Smirnoff
91f44749c6 ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it.  For the
lifetime of an ifnet its index never changes.  To avoid leaking foreign
interfaces the net.link.generic.system.ifcount sysctl and the
ifnet_byindex() KPI filter their returned value on curvnet.  Since
if_vmove() no longer changes the if_index, inline ifindex_alloc() and
ifindex_free() into if_alloc() and if_free() respectively.

API-wise, the only change is that the minimum interface index can now be
greater than 1.  Holes in interface indexes were always allowed.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33672
2022-01-26 21:58:44 -08:00
Mateusz Guzik
d35991d327 nullfs: ansify fs/nullfs/null_subr.c
2022-01-27 01:01:45 +01:00
Mateusz Guzik
b58ca5df0b vfs: remove the now unused insmntque1
Bump __FreeBSD_version to 1400052.
2022-01-27 01:00:24 +01:00
Mateusz Guzik
3150cf0c13 unionfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:57:37 +01:00
Mateusz Guzik
5ccdfdabc8 tmpfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:56:12 +01:00
Mateusz Guzik
4e91a0b9fe nullfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:54:47 +01:00
Mateusz Guzik
ade1367ba8 fdescfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:54:38 +01:00
Mateusz Guzik
3af3e99ce4 devfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:54:30 +01:00
Vladimir Kondratyev
c974c22a4f Revert "LinuxKPI: Allow wake_up to be executed within a critical section"
This change was based on currently reverted commit 7dea0c9e6eba.

This reverts commit 89889ab470.
2022-01-27 01:27:01 +03:00
Vladimir Kondratyev
11ef1d975f Revert "LinuxKPI: Allow spin_lock_irqsave to be called within a critical section"
This change results in deadlocks on UP systems

This reverts commit 7dea0c9e6eba4dc127cd67667c81fa2c250f1024.

Requested by:	kib, hselasky
2022-01-27 01:27:01 +03:00
Kyle Evans
773fa8cd13 execve: disallow argc == 0
The manpage has contained the following verbiage on the matter for just
under 31 years:

"At least one argument must be present in the array"

Previous to this version, it had been prefaced with the weakening phrase
"By convention."

Carry through and document it the rest of the way.  Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it.  Toss back EINVAL if we ended
up not copying in any args for *execve().

The manpage change can be considered "Obtained from: OpenBSD"

Reviewed by:	emaste, kib, markj (all previous version)
Differential Revision:	https://reviews.freebsd.org/D34045
2022-01-26 13:40:27 -06:00
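The check described in 773fa8cd13 above amounts to counting the argument vector and rejecting an empty one. The following is a userspace sketch of that rule, not the kernel implementation; exec_check_args() is a hypothetical helper, and treating a NULL argv pointer as zero arguments is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: return EINVAL when the argument vector contains
 * no arguments, 0 otherwise.
 */
int
exec_check_args(char *const argv[])
{
	size_t argc = 0;

	/* Count entries up to the terminating NULL pointer. */
	if (argv != NULL) {
		while (argv[argc] != NULL)
			argc++;
	}
	return (argc == 0 ? EINVAL : 0);
}
```

Programs commonly read argv[0] without checking argc; refusing the exec up front means such code never runs with argv[0] equal to NULL.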
Gordon Bergling
9966757dd6 hwpmc(4): Fix a typo in a sysctl description
- s/avalable/available/

MFC after:	3 days
2022-01-26 20:18:57 +01:00
Ryan Moeller
47e46b1123 zfs: Fix zvol_cdev_open locking
First open locking changes were correctly applied to zvol_geom_open but
incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be
held indefinitely.

Make the first open locking in zvol_cdev_open match zvol_geom_open.

This change has been accepted upstream in openzfs/zfs#13016 but is not
yet merged.

Reviewed by:	mav
Fixes:		e92ffd9b62
Sponsored by:	iXsystems, Inc.
2022-01-26 18:37:52 +00:00
Gordon Bergling
9e58cca3e8 extra_tcp_stacks: Fix two typos in source code comments
- s/differnt/different/

MFC after:	3 days
2022-01-26 18:02:55 +01:00
Ed Maste
9c296a2105 geom: Add HiFive boot partitions
As documented in the HiFive Unmatched Software Reference Manual.

Reviewed by:	imp, mhorne
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34010
2022-01-26 10:54:45 -05:00
Hans Petter Selasky
9e2cce7e6a Implement a function to get the next TCP- and TLS- receive sequence number.
This function will be used by coming TLS hardware receive offload support.

Differential Revision:	https://reviews.freebsd.org/D32356
Discussed with:	jhb@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-01-26 12:55:00 +01:00
Hans Petter Selasky
c8f2c290e4 Add definitions for TLS receive tags using the existing send tag infrastructure.
Although send tags are strictly used for transmit, the name might be changed
in the future to be more generic.

The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any
VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the
network driver itself when creating the TLS RX decryption offload filter.

TLS receive tags have a modify callback to tell the network driver about
the progress of decryption. Currently decryption is done IP packet by IP
packet, even if the IP packet contains a partial TLS record. The modify
callback allows the network driver to keep track of TCP sequence numbers
pointing to the beginning of TLS records after TCP packet reassembly.
These callbacks only happen when encrypted or partially decrypted data is
received and are used to verify the decryption's starting point for the
hardware. Typically the hardware will guess where TLS headers start and
needs help from the software to know if the guess was correct. This is
the purpose of the modify callback.

Differential Revision:	https://reviews.freebsd.org/D32356
Discussed with:	jhb@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-01-26 12:55:00 +01:00
Hans Petter Selasky
17cbcf33c3 mbuf(9): Assert receive mbufs don't carry a send tag.
Else we would start leaking reference counts.

Discussed with:	jhb@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-01-26 12:55:00 +01:00
Hans Petter Selasky
a6d4524323 mbuf(9): Properly declare some function macros when debugging is disabled.
No functional change intended.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-01-26 12:54:59 +01:00
Emmanuel Vadot
81de556105 linuxkpi: i2c: Add MODULE_DEPEND for iicbus
MFC after:	1 month
MFC with:	1961a14a47
Fixes:	1961a14a47 ("linuxkpi: Add i2c support")
Reported by:	GregV
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2022-01-26 10:44:07 +01:00
Andriy Gapon
f4a041af29 add overlay for enabling spi0 on allwinner h3
At least on Orange Pi PC Plus it is routed to the 40-pin header, so it
can be used to communicate with external devices.

MFC after:	2 weeks
2022-01-26 11:42:20 +02:00
Andriy Gapon
a471646a08 add overlay for enabling i2c1 on allwinner h3
At least on Orange Pi PC Plus it is routed to the 40-pin header, so it
can be used to communicate with external devices.

MFC after:	2 weeks
2022-01-26 11:42:20 +02:00
Gordon Bergling
b3df222eae extra_tcp_stacks: Fix a few common typos
TCP_BBR:
- Fix a typo introduced in 1b90dfa5d2, which was reported by tuexen@

TCP_RACK:
- Correct two sysctl descriptions: s/corret/correct/

tcp_bbr(4): Also fix s/measurment/measurement/ in the man page

MFC after:	1 week
2022-01-26 10:35:17 +01:00
Andriy Gapon
173d0fb616 add overlay for enabling serial1 / uart1 on rk3328
On Rock64 the uart is routed to pins on the "Pi-2" header, so it is
potentially useful.

Pin mapping:
----------------------------
| ID | Name     | Function |
----------------------------
| 15 | GPIO3_A4 | TX       |
| 16 | GPIO3_A5 | RTS      |
| 18 | GPIO3_A6 | RX       |
| 22 | GPIO3_A7 | CTS      |
----------------------------

MFC after:	2 weeks
2022-01-26 11:31:59 +02:00
Andriy Gapon
f41f98f0f0 add overlay for enabling i2c0 on rk3328
On Rock64 it is routed to pins 3 and 5 of the so called Pi-2 header.

MFC after:	2 weeks
2022-01-26 11:30:53 +02:00
Andriy Gapon
94ff1d9cc8 sdhci: fix dumping support in MMCCAM configuration
This change fixes interaction with recently added sddadump.

MFC after:	1 week
2022-01-26 09:31:45 +02:00
Warner Losh
e35816c1c9 mpr/mps: Fix a race in diagnostic reset
There's a small race in freezing the simq when performing a diagnostic
reset. During this time, a transaction can slip through and encounter
the target id of 0. Previously this returned a CAM_DEVICE_NOT_THERE
status. Instead, if we're still in diagnostic reset when we detect
this, freeze the queue and return a requeue status, similar to what we
do when we're resetting a target and a transaction gets here. The race
is unavoidable due to separate locks for queue and SIM, but easy enough
to detect and make harmless.

Sponsored by:		Netflix
Reviewed by:		scottl, mav
Differential Revision:	https://reviews.freebsd.org/D34017
2022-01-25 19:15:46 -07:00
John Baldwin
5fcb5ae8dc Remove a stale comment.
The intr_disable as a macro was only a problem on arm and mips and
is no longer relevant after the mips removal.
2022-01-25 17:19:36 -08:00
John Baldwin
46f69eba96 opencrypto/xform_*.h: Trim scope of included headers.
Reviewed by:	markj, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34022
2022-01-25 15:21:22 -08:00
John Baldwin
f6459a7aa8 opencrypto/cryptodev.h: Add includes to make more self-contained.
Reviewed by:	markj, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34021
2022-01-25 15:20:46 -08:00
Jessica Clarke
d930ec4ff9 dp83822phy: Add missing MII_PHY_END to avoid buffer overread on probe
Found by:	CHERI
Fixes:		0c9156faec ("Introduce DP83822 PHY driver")
2022-01-25 20:34:55 +00:00
Jessica Clarke
3f707064a5 dp83867phy: Add missing MII_PHY_END to avoid buffer overread on attach
Found by:	CHERI
Fixes:		e85c94b8d6 ("Introduce DP83867 PHY driver")
2022-01-25 20:34:55 +00:00
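The two dp838xx fixes above (d930ec4ff9 and 3f707064a5) concern a table that is scanned until a terminating entry is found. A minimal sketch of that pattern follows; struct phy_desc, PHY_END, phy_match() and the table contents are hypothetical stand-ins for illustration, not the miibus definitions, and the OUI and model values are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one supported PHY. */
struct phy_desc {
	uint32_t oui;
	uint32_t model;
	const char *name;
};

/* Terminating entry: a NULL name marks the end of the table. */
#define PHY_END	{ 0, 0, NULL }

static const struct phy_desc example_table[] = {
	{ 0x123456, 0x0a, "example PHY A" },
	{ 0x123456, 0x23, "example PHY B" },
	PHY_END		/* without this, the loop below reads past the array */
};

const struct phy_desc *
example_phys(void)
{
	return (example_table);
}

/* Walk the table until the sentinel; the array length is never passed. */
const struct phy_desc *
phy_match(const struct phy_desc *tbl, uint32_t oui, uint32_t model)
{
	for (; tbl->name != NULL; tbl++) {
		if (tbl->oui == oui && tbl->model == model)
			return (tbl);
	}
	return (NULL);
}
```

Since the matcher receives only a pointer, the sentinel is the sole thing that stops the scan on a miss. Leaving it out is harmless while every lookup hits and turns into an out-of-bounds read on the first miss, which is the kind of bug bounds-checking hardware such as CHERI reports.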
Emmanuel Vadot
59d465e200 Bump __FreeBSD_version for LinuxKPI changes
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2022-01-25 16:15:46 +01:00
Emmanuel Vadot
1961a14a47 linuxkpi: Add i2c support
Add i2c support to linuxkpi. This is needed by drm-kmod.
For every i2c_adapter added by i2c_add_adapter we add a child to the
device named "lkpi_iic". This child handles the conversion between
Linux i2c_msgs and FreeBSD iic_msgs.
For every i2c_adapter added by i2c_bit_add_bus we add a child to the
device named "lkpi_iicbb". This child handles the conversion between
Linux i2c_msgs and FreeBSD iic_msgs.
With the help of iic(4), this exposes the i2c controller to userspace,
allowing a user to query DDC information from a monitor.
e.g.: i2c -f /dev/iic0 -a 0x28 -c 128 -d r
will query the standard EDID from the monitor if plugged.

The bitbang part (lkpi_iicbb) isn't tested at all for now as I don't have
compatible hardware (all my hardware has a native i2c controller).

Tested on:	Intel (SandyBridge, Skylake, ApolloLake)
Tested on:	AMD (Picasso, Polaris (amd64 and arm64))

MFC after:	1 month
Reviewed by:	hselasky
Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D33053
2022-01-25 16:15:39 +01:00
Edward Tomasz Napierala
9caeb82eab Revert "linux: Provide dummy seccomp(2)"
This reverts commit 56981629f9.

Wrong patch; fails to build on i386.
2022-01-20 22:25:15 +00:00
Edward Tomasz Napierala
56981629f9 linux: Provide dummy seccomp(2)
Don't emit warnings; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it.  This is also similar to how we handle seccomp
in linux_prctl().

Sponsored By:	EPSRC
Differential Revision: https://reviews.freebsd.org/D33808
2022-01-25 11:54:00 +00:00
Gleb Smirnoff
6d1808f051 if_clone: correctly destroy a clone from a different vnet
Accept the cruel reality that if_vmove doesn't move an
interface from the previous vnet's cloning infrastructure to the new
one.  Let's treat this as a design feature and make it work better.

* Delete two blocks of code that would fall back to vnet0 if a
  cloner isn't found.  They didn't do any good, and the whole
  idea of treating vnet0 as special is wrong.
* When deleting a cloned interface, look up its cloner using its
  home vnet.

With this change the following simple sequence works correctly:

  ifconfig foo0 create
  jail -c name=jj persist vnet vnet.interface=foo0
  jexec jj ifconfig foo0 destroy

Differential revision:	https://reviews.freebsd.org/D33942
2022-01-24 21:07:16 -08:00
Gleb Smirnoff
54712fc423 if_vmove: improve restoration in cloner's ifgroup membership
* Do a single call into if_clone.c instead of two.  The cloner
  can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
  exists in the new vnet.

Differential revision:	https://reviews.freebsd.org/D33941
2022-01-24 21:06:59 -08:00
Thomas Steen Rasmussen
bc6abdd97e nd6: use CARP link level address in SLLAO for NS sent out
When sending an NS, check if we are using an IPv6 CARP address
and if we are, put the proper CARP link level address into the
ND_OPT_SOURCE_LINKADDR option and also put a PACKET_TAG_CARP tag
on the packet.  The latter will enforce the CARP link level address
at the data link layer too, which might be necessary for broken
implementations.
The code follows what the NA sending code has been doing since the
introduction of carp(4).  While here, bring the whole block of code
in line with style(9).

PR:			193280
Differential revision:	https://reviews.freebsd.org/D33858
2022-01-24 21:02:47 -08:00
Eric Joyner
e438f0a975 ice_ddp: Update to 1.3.27.0
This is intended to be used with forthcoming ice(4) driver version 1.34.2.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Sponsored by:	Intel Corporation
2022-01-24 18:25:56 -08:00
Eric Joyner
213e91399b iflib: Allow drivers to determine which queue to TX on
Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.

Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.

(The motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	kbowling@, stallamr@netapp.com
Tested by:	jeffrey.e.pieper@intel.com
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D31485
2022-01-24 18:22:02 -08:00
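The motivation given in 213e91399b above, choosing a TX queue from the user priority (UP) in the VLAN tag, can be illustrated with a small sketch. The 3-bit priority occupying the top bits of the 16-bit tag control field is standard 802.1Q; everything else is an assumption for illustration: vlan_tci_to_pcp() and select_txq() are hypothetical names, and the modulo mapping is just one possible policy, not what any particular driver does.

```c
#include <stdint.h>

/* Extract the 3-bit priority from the top of the 16-bit VLAN TCI. */
uint8_t
vlan_tci_to_pcp(uint16_t tci)
{
	return ((uint8_t)((tci >> 13) & 0x7));
}

/*
 * Hypothetical queue-selection callback: map the priority onto the
 * available TX queues.  A real driver would more likely consult a
 * configured priority-to-traffic-class table.
 */
uint16_t
select_txq(uint16_t tci, uint16_t ntxqs)
{
	if (ntxqs == 0)
		return (0);
	return ((uint16_t)(vlan_tci_to_pcp(tci) % ntxqs));
}
```

For example, a tag of 0xA064 carries priority 5 and VLAN ID 100; with four TX queues this sketch places it on queue 1.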
John Baldwin
2c4b65cc3d Bump __FreeBSD_version for the addition of <crypto/curve25519.h>.
Sponsored by:	The FreeBSD Foundation
2022-01-24 15:28:36 -08:00
John Baldwin
16cf646a6f crypto: Remove xform.c and compile xform_*.c standalone.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33995
2022-01-24 15:27:40 -08:00
John Baldwin
faf470ffdc xform_*.c: Add headers when needed to compile standalone.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33994
2022-01-24 15:27:40 -08:00
John Baldwin
991b84eca9 Retire now-unused M_XDATA.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33993
2022-01-24 15:27:39 -08:00
John Baldwin
35d9e00dba IPsec: Use protocol-specific malloc types instead of M_XDATA.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33992
2022-01-24 15:27:39 -08:00
John Baldwin
8f3f3fdf73 cryptodev: Use a private malloc type (M_CRYPTODEV) instead of M_XDATA.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33991
2022-01-24 15:27:39 -08:00
John Baldwin
1d95c6f9c0 Don't implicitly pull in most of 'device crypto' for 'options IPSEC'.
options IPSEC is already documented as requiring 'device crypto' and
duplicating the dependencies is harder to read and not always
consistent.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33990
2022-01-24 15:27:39 -08:00
John Baldwin
0c6274a819 crypto: Add an API supporting curve25519.
This adds a wrapper around libsodium's curve25519 support matching
Linux's curve25519 API.  The intended use case for this is WireGuard.

Note that this is not integrated with OCF as it is not related to
symmetric operations on data.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33935
2022-01-24 15:27:39 -08:00
John Baldwin
a8c4147edc cxgbei: Parse all PDUs received prior to enabling offload mode.
Previously this would only handle a single PDU that did not contain
any data.  This should now handle an arbitrary number of PDUs.

While here check for these PDUs in the T6-specific CPL_RX_ISCSI_CMP
handler in addition to CPL_RX_ISCSI_DDP.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2022-01-24 14:20:02 -08:00
Warner Losh
802f8d4afe mpr/mps: Remove write-only flag and callout
The discovery callout is initialized and cancelled only, making it
write-only. Remove a state flag associated with it being pending as well
as two defines that aren't used that are associated with it. Remove
MP?SAS_SHUTDOWN flag, which is unused.

Sponsored by:		Netflix
Reviewed by:		ken, scottl, mav
Differential Revision:	https://reviews.freebsd.org/D33925
2022-01-24 13:21:09 -07:00
John Baldwin
308fc7e5b1 user_getpeername: Use 'bool' for the compat argument.
This matches user_getsockname.

Reviewed by:	brooks, kib
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33987
2022-01-24 09:51:35 -08:00
Kevin Lo
dea952c3e2 modules: mgb: need opt_platform.h
This fixes the standalone build.
2022-01-24 13:38:39 +08:00