Commit Graph

141717 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
61a68e50d4 LinuxKPI: 802.11 enahnce linuxkpi_ieee80211_iterate_interfaces()
Add support for IEEE80211_IFACE_SKIP_SDATA_NOT_IN_DRIVER in
linuxkpi_ieee80211_iterate_interfaces() needed by a driver.

MFC after:	3 days
2022-02-16 03:56:54 +00:00
Bjoern A. Zeeb
c5b96b3eae LinuxKPI: 802.11 assign an(y) early chandef
The Realtek driver assumes an early chandef to be set.  At the time
of linuxkpi_ieee80211_ifattach() we do not really know one yet so
try to find the first one which is available and set that.
This prevents a NULL-deref panic.

MFC after:	3 days
2022-02-16 03:48:54 +00:00
Bjoern A. Zeeb
652e22d395 LinuxKPI: 802.11: defer workq allocation until we have a name
Turned out all the workq's taskqueues were named "wlanNA" if you had
more then one card in a machine as by the time we called wiphy_name()
the device name was not set yet and we returned the fallback.

Move the alloc_ordered_workqueue() from linuxkpi_ieee80211_alloc_hw()
to linuxkpi_ieee80211_ifattach() at which time the device name has
to be set to give us a unique name.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2022-02-16 03:26:30 +00:00
Bjoern A. Zeeb
d3ef7fb459 LinuxKPI: 802.11 scan update
Realtek's rtw88 is returning a hard-coded 1 in case they cannot
hw_scan (fw not advertising it).  In that case if we want any scan
to run we need to fall-back to sw scan.  Start dealing with this.
Long-term we probably need to keep internal state.

MFC after:	3 days
2022-02-16 03:11:01 +00:00
Mark Johnston
26b08c5d21 armv8crypto: Use cursors to access crypto buffer data
Currently armv8crypto copies the scheme used in aesni(9), where payload
data and output buffers are allocated on the fly if the crypto buffer is
not virtually contiguous.  This scheme is simple but incurs a lot of
overhead: for an encryption request with a separate output buffer we
have to
- allocate a temporary buffer to hold the payload
- copy input data into the buffer
- copy the encrypted payload to the output buffer
- zero the temporary buffer before freeing it

We have a handy crypto buffer cursor abstraction now, so reimplement the
armv8crypto routines using that instead of temporary buffers.  This
introduces some extra complexity, but gallatin@ reports a 10% throughput
improvement with a KTLS workload without additional CPU usage.  The
driver still allocates an AAD buffer for AES-GCM if necessary.

Reviewed by:	jhb
Tested by:	gallatin
Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D28950
2022-02-15 21:50:41 -05:00
Mark Johnston
0b3235ef74 armv8crypto: Factor out some duplicated GCM code
This is in preparation for using buffer cursors.  No functional change
intended.

Reviewed by:	jhb
Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D28948
2022-02-15 21:47:41 -05:00
Mark Johnston
09bfa5cf16 opencrypto: Add a routine to copy a crypto buffer cursor
This was useful in converting armv8crypto to use buffer cursors.  There
are some cases where one wants to make two passes over data, and this
provides a way to "reset" a cursor.

Reviewed by:	jhb
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D28949
2022-02-15 21:47:10 -05:00
Bjoern A. Zeeb
6baea3312d LinuxKPI: skbuff updates
Various updates to skbuff for new/updated drivers and some housekeeping:
- update types and struct members, add new (stub) functions
- improve freeing of frags.
- fix an issue with sleeping during alloc for dev_alloc_skb().
- Adjust a KASSERT for skb_reserve() which apparently can be called
  multiple times if no data was put into the skb yet.
- move the sysctl from linux_8022.c (which may be in a different module)
  to linux_skbuff.c so in case we turn debugging on we do not run into
  unresolved symbols.  Rename the sysctl variable to be less conflicting
  and update debugging macros along with that; also add IMPROVE().
- add DDB support to show an skbuff.
- adjust comments/whitespace.

No functional changes intended for iwlwifi.

Sponsored by:	The FreeBSD Foundation (partially)
MFC after:	3 days
2022-02-16 02:10:10 +00:00
Bjoern A. Zeeb
2e183d999c LinuxKPI: 802.11 header updates and add/adjust source dependencies.
This update is for more/newer versions of drivers:
- add and properly place more structs, enums, defines needed by drivers.
- correct types of struct fields.
- make various function arguments const.
- move REG_RULE() macro to its own file regulatory.h and
  use macros for calculations.
- add linuxkpi_ieee80211_get_channel() implementation.
- change linuxkpi_ieee80211_ifattach() to return int for error checking.

No intended functional changes for iwlwifi.

Sponsored by:	The FreeBSD Foundation (partially)
MFC after:	3 days
2022-02-15 23:45:15 +00:00
Bjoern A. Zeeb
064c110f4b LinuxKPI: lockdep add lockdep_assert_not_held()
Add lockdep_assert_not_held() asserting LA_UNLOCKED as needed by a
driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34232
2022-02-15 23:15:00 +00:00
Mateusz Guzik
e68a5225e8 fd: add fde_copy
To dedup handrolled memcpy. This will be used later to make fd code
atomic-clean.
2022-02-15 17:51:08 +00:00
Mateusz Guzik
ec12b4f4ff fd: add missing seqc to dupfdopen 2022-02-15 17:51:08 +00:00
Mateusz Guzik
c9a995994b seqc: rename seqc_consistent_nomb to seqc_consistent_no_fence
For more consistency with other primitives.
2022-02-15 17:51:07 +00:00
Richard Scheffenegger
972a7d95eb iscsi: Use calloutng instead of ticks in iscsi initiator
callout *_sbt functions are used to reduce ping/timeout scheduling
overhead, while allowing later improvments in the functionality.
Keep similar 1000ms callouts while adding a 10 ms window, to allow
some kernel scheduling improvements.

Reviewed By: jhb
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34222
2022-02-15 17:36:22 +01:00
Mark Johnston
235ed6a486 mlx5e: Make TLS tag zones unmanaged
These zones are cache zones used to allocate TLS offload contexts from
firmware.  Releasing items from the cache is a sleepable operation due
to the need to await a response from the firmware command freeing the
tag, so items cannot be reclaimed from the zone in non-sleepable
contexts.  Since the cache size is limited by firmware limits, avoid
this by setting UMA_ZONE_UNMANAGED to avoid reclamation by uma_timeout()
and the low memory handler.

Reviewed by:	hselasky, kib
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34142
2022-02-15 09:25:34 -05:00
Mark Johnston
389a3fa693 uma: Add UMA_ZONE_UNMANAGED
Allow a zone to opt out of cache size management.  In particular,
uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from
the zone, nor will uma_timeout() purge cached items if the zone is idle.
This effectively means that the zone consumer has control over when
items are reclaimed from the cache.  In particular, uma_zone_reclaim()
will still reclaim cached items from an unmanaged zone.

Reviewed by:	hselasky, kib
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34142
2022-02-15 09:25:34 -05:00
Li-Wen Hsu
7442b63231
if_epair: Use ANSI C definition
This fixes -Werror=strict-prototypes from gcc9

Sponsored by:	The FreeBSD Foundation
2022-02-15 21:45:22 +08:00
Richard Scheffenegger
0c2832ee4f tcp: Restore 6 tcps padding entries in HEAD
The padding in CURRENT shall not shrink. It is
designed that in CURRENT at always stays
the same, and then when a new stable is branched, it
inherits 6 pointer placeholders that can be used
withing this stable/X lifetime to extend the structure.

Reviewed By: tuexen, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34269
2022-02-15 09:24:07 +01:00
Kristof Provost
24f0bfbad5 if_epair: implement fanout
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.

We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.

Benchmark results:
With net.isr.maxthreads=-1

Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)

Before          627 Kpps
After (no RSS)  1.198 Mpps
After (RSS)     3.148 Mpps

Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)

Before          7.705 Kpps
After (no RSS)  1.017 Mpps
After (RSS)     2.083 Mpps

MFC after:	3 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D33731
2022-02-15 09:03:24 +01:00
Wei Hu
de64aa32c8 mana: Add handling of CQE_RX_TRUNCATED
The proper way to drop this kind of CQE is advancing rxq tail
without indicating the packet to the upper network layer.

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-02-15 07:27:42 +00:00
Bjoern A. Zeeb
05f0b24bfb Bump __FreeBSD_version to 1400052 for LinuxKPI changes.
Add a marker after GUID_INIT() and linux/pm_qos.h were added, so
that future version of drm-kmod can selectively remove these bits.
The latest port version does not require user updates for this so
no UPDATING entry.
2022-02-14 23:55:16 +00:00
Bjoern A. Zeeb
fa6d3522b5 LinuxKPI: add linux/pm_qos.h
Add a linux/pm_qos.h with three dummy functions and a struct as needed
by a driver and drm-kmod [1] with no intend to support this for the moment.

Submitted by:	wulf (drm-kmod bits) [1]
Sponsored by:	The FreeBSD Foundation (drm-kmod requested updates)
MFC after:	3 days
Reviewed by:	hselasky (earlier version), wulf
Differential Revision: https://reviews.freebsd.org/D34234
2022-02-14 23:53:17 +00:00
Bjoern A. Zeeb
97009980c4 LinuxKPI: add UUID_STRING_LEN and GUID_INIT to uuid.h
Add a definition for UUID_STRING_LEN to uuid.h as needed by a driver.
Also add GUID_INIT for drm-kmod [1].

Submitted by:	wulf [1]
MFC after:	3 days
Reviewed by:	hselasky (earlier), wulf
Differential Revision: https://reviews.freebsd.org/D34235
2022-02-14 23:51:51 +00:00
Bjoern A. Zeeb
cee56e77d7 LinuxKPI: 802.11: get rid of lkpi_ic_getradiocaps warnings
Users are seeing warnings about 2 channels (1 per band)
triggered by an ioctl from wpa_supplicant usually:
	lkpi_ic_getradiocaps: Adding chan ... returned error 55
This was an early FAQ.

Check the current number of channels against maxchans and the return
code from net80211. In case net80211 reports that we reached the limit
do not print the warning and do not try to add further channels.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2022-02-14 23:48:31 +00:00
Kristof Provost
78bc3d5e17 vlan: allow net.link.vlan.mtag_pcp to be set per vnet
The primary reason for this change is to facilitate testing.

MFC after:	1 week
2022-02-14 22:51:10 +01:00
Franco Fichtner
0143a6bb7f pf: fix set_prio after nv conversion
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D34266
2022-02-14 22:51:10 +01:00
Bjoern A. Zeeb
32cf376a01 net80211: enhance (disabled) debugging
Add maxchans to the disabled debugging in addchan() and copychan_prev()
to aid debugging possible errors rreturned due to reaching maxchans
limits.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2022-02-14 22:16:59 +00:00
John Baldwin
2f6a842484 Disable -Wreturn-type on GCC.
GCC is more pedantic than clang about warning when a function doesn't
handle undefined enum values (see GCC bug 87950).  Clang's warning
gives a more pragmatic coverage and should find any real bugs, so
disable the warning for GCC rather than adding __unreachable
annotations to appease GCC.

Reviewed by:	imp, emaste
Differential Revision:	https://reviews.freebsd.org/D34147
2022-02-14 11:48:47 -08:00
John Baldwin
becaf6433b Use vmspace->vm_stacktop in place of sv_usrstack in more places.
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D34174
2022-02-14 10:57:30 -08:00
Gleb Smirnoff
65572cade3 unix/dgram: return EAGAIN instead of ENOBUFS when O_NONBLOCK set
This is behavior what some programs expect and what Linux does.  For
example nginx expects EAGAIN when sending messages to /var/run/log,
which it connects to with O_NONBLOCK.  Particularly with nginx the
problem is magnified by the fact that a ENOBUFS on send(2) is also
logged, so situation creates a log-bomb - a failed log message
triggers another log message.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D34187
2022-02-14 09:21:55 -08:00
Mark Johnston
c7cd607a4e msdosfs: Fix mounting when the device sector size is >512B
HugeSectors * BytesPerSec should be computed before converting
HugeSectors to a DEV_BSIZE-based count.

Fixes:	ba2c98389b ("msdosfs: sanity check sector count from BPB")
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34264
2022-02-14 10:06:47 -05:00
Mark Johnston
852ff943b9 sleepqueue: Annotate sleepq_max_depth as static
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-02-14 10:06:47 -05:00
Mark Johnston
893be9d8ac sleepqueue: Address a lock order reversal
After commit 74cf7cae4d ("softclock: Use dedicated ithreads for
running callouts."), there is a lock order reversal between the per-CPU
callout lock and the scheduler lock.  softclock_thread() locks callout
lock then the scheduler lock, when preparing to switch off-CPU, and
sleepq_remove_thread() stops the timed sleep callout while potentially
holding a scheduler lock.  In the latter case, it's the thread itself
that's locked, and if the thread is sleeping then its lock will be a
sleepqueue lock, but if it's still in the process of going to sleep
it'll be a scheduler lock.

We could perhaps change softclock_thread() to try to acquire locks in
the opposite order, but that'd require dropping and re-acquiring the
callout lock, which seems expensive for an operation that will happen
quite frequently.  We can instead perhaps avoid stopping the
td_slpcallout callout if the thread is still going to sleep, which is
what this patch does.  This will result in a spurious call to
sleepq_timeout(), but some counters suggest that this is very rare.

PR:		261198
Fixes:		74cf7cae4d ("softclock: Use dedicated ithreads for running callouts.")
Reported and tested by:	thj
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34204
2022-02-14 10:06:47 -05:00
Bjoern A. Zeeb
a4529c46d4 LinuxKPI; add the beginning of a tracepoint.h implementation
Add a beginning of a tracepoint.h implementation to ease porting drivers
making use of this Linux facility.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34236
2022-02-14 00:24:43 +00:00
Bjoern A. Zeeb
85d61bd872 LinuxKPI: add NETIF_F_HW_CSUM to netdev_features.h
Add NETIF_F_HW_CSUM to netdev_features.h as needed by a driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34233
2022-02-14 00:22:24 +00:00
Bjoern A. Zeeb
c840d5cec2 LinuxKPI: add kstrtoint_from_user() and DECLARE_FLEX_ARRAY()
Add an implementation of kstrtoint_from_user() based on the other
implementations and an attempt at DECLARE_FLEX_ARRAY() which works
for the driver needing it.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34231
2022-02-14 00:20:41 +00:00
Bjoern A. Zeeb
0c37ffda79 LinuxKPI: add an initial ethtool.h
Add an initial ethtool.h for a define and a dummy struct for now
needed by drivers.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34229
2022-02-14 00:19:08 +00:00
Bjoern A. Zeeb
3cd6d6ff52 LinuxKPI: add eth_random_addr() and device_get_mac_address()
Add eth_random_addr() and a dummy of device_get_mac_address()
pending OF (FDT) support needed by drivers.

While here remove a white space in random_ether_addr().

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34228
2022-02-14 00:17:14 +00:00
Bjoern A. Zeeb
8f33ad3cf5 LinuxKPI: add more errno
Add ENOMEDIUM, ENOSR, and ELNRNG to linux/errno.h needed by drivers.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34227
2022-02-14 00:15:41 +00:00
Bjoern A. Zeeb
e5b95b2201 LinuxKPI: add sizeof_field()
Add sizeof_field() to linux/compiler.h needed by a driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34226
2022-02-14 00:13:56 +00:00
Bjoern A. Zeeb
d17b78aa14 LinuxKPI: add __ffs64()
Add __ffs64() to linux/bitops.h needed by a driver.

Reviewed by:	hselasky
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34225
2022-02-14 00:12:09 +00:00
Bjoern A. Zeeb
2e818fbcfc LinuxKPI: add get_unaligned_le16()
Add get_unaligned_le16() to asm/unaligned.h needed by a driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D34224
2022-02-14 00:09:57 +00:00
Bjoern A. Zeeb
232d323ef2 TCP syncache: enhance KASSERT output
Improve the "syncache: mbuf too small" assertion message with various
variables (some not actually needed) but enough that it will be obvious
if (a) we use IPv4 or IPv6, (b) if UDP tunneling is on, (c) what
max_linkhdr is, and (d) what MHLEN is.

This should help diagnostics in the future.
The case was hit with wireless drivers setting a large ic_headroom
and using IPv6.

Reviewed by:	gallatin, tuexen, rscheff
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34217
2022-02-14 00:03:20 +00:00
Dimitry Andric
09d0a0fbe8 bwi: Fix clang 14 warning about possible unaligned access
On architectures with strict alignment requirements (e.g. arm), clang 14
warns about a packed struct which encloses a non-packed union:

In file included from sys/dev/bwi/bwimac.c:79:
sys/dev/bwi/if_bwivar.h:308:7: error: field iv_val within 'struct bwi_fw_iv' is less aligned than 'union (unnamed union at sys/dev/bwi/if_bwivar.h:305:2)' and is usually due to 'struct bwi_fw_iv' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
	}			iv_val;
				^

It appears to help if you also add __packed to the inner union (i.e.
iv_val). No change to the layout is intended.

MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34196
2022-02-13 14:35:58 +01:00
Mateusz Guzik
6aa246e605 vfs: convert vnsz2log to a macro 2022-02-13 13:07:08 +00:00
Mateusz Guzik
5c31025060 fd: use FILEDESC_FOREACH_{FDE,FP} where appropriate 2022-02-13 13:07:08 +00:00
Mateusz Guzik
60b699f99c fd: add FILEDESC_FOREACH_{FDE,FP}
Right now they naively walk the fd table just like all the other code,
but that's going to change.
2022-02-13 13:07:08 +00:00
Mateusz Guzik
809f3121be fd: assign fd_freefile early when copying
This is to simplify an upcomming change.
2022-02-13 13:07:08 +00:00
Mateusz Guzik
893d20c95a fd: move fd table sizing out of fdinit
now it is placed with the rest of actual initialisation
2022-02-13 13:07:08 +00:00
Mateusz Guzik
4103c3cd5b fd: drop volatile keyword from refcounts
While here move a comment where it belongs and do small whitespace clean up.
2022-02-13 13:07:08 +00:00
Mateusz Guzik
b53133a778 proc: load/store p_cowgen using atomic primitives 2022-02-13 13:07:08 +00:00
Mateusz Guzik
29ee49f66b thread: remove dead store from thread_cow_update 2022-02-13 13:07:08 +00:00
Richard Scheffenegger
70e9f880d8 iscsi: address unused-but-set-variable warning
remove "interrupted" in icl_soft_proxy_connect()

Reviewed By: hselasky
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34223
2022-02-12 06:15:02 +01:00
Navdeep Parhar
08c7dc7fd4 cxgbe(4): Fix illegal hardware access in cxgbe_refresh_stats.
cxgbe_refresh_stats takes into account VI_SKIP_STATS but not
VI_INIT_DONE when deciding whether to read the hardware stats.  But
before this change VI_SKIP_STATS was set only for VIs with VI_INIT_DONE.
That meant that cxgbe_refresh_stats always accessed the hardware for
uninitialized VIs, and this is a problem if the adapter is suspended or
in the middle of a reset.

Fix this by setting VI_SKIP_STATS on all VIs during suspend.  While
here, ignore VI_INIT_DONE in vi_refresh_stats too to be consistent with
cxgbe_refresh_stats.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2022-02-12 09:53:50 -08:00
Ed Maste
acfb506b3d newvers.sh: allow multiple -V args in one invocation
Reviewed by:	imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34253
2022-02-12 11:06:54 -05:00
Navdeep Parhar
39a36707bd cxgbe(4): Avoid unsafe hardware access in the ifmedia ioctls.
The hardware is unavailable when the device is suspended or in the
middle of a reset.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2022-02-11 16:35:27 -08:00
John Baldwin
cd0525f615 ktls: Write-lock the INP when changing a transmit TLS session.
The TCP rate pacing code relies on being able to read this pointer
safely while holding an INP lock.  The initial TLS session pointer is
set while holding the write lock already.

Reviewed by:	gallatin, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D34086
2022-02-11 15:16:25 -08:00
Konstantin Belousov
fd8d4e53bc vdso linker scripts: explicitly specify output arch and target
Requested by:	jhb
Reviewed by:	emaste, imp, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34157
2022-02-12 00:32:23 +02:00
John Baldwin
b0c1600a8c linuxkpi xarray: Correct expression in assertion.
Reported by:	GCC -Wparantheses
Reviewed by:	wulf, hselasky
Differential Revision:	https://reviews.freebsd.org/D34197
2022-02-11 13:59:27 -08:00
Mateusz Guzik
1d65a9b47e cache: improve vnode vs name assertion in cache_enter_time 2022-02-11 12:29:26 +00:00
Mateusz Guzik
611470a515 cache: remove NOCACHE handling from cache_fplookup_noentry
It was copy-pasted from locked lookup. As LOOKUP operation cannot have
the flag set it was always ending up setting MAKEENTRY.
2022-02-11 12:29:26 +00:00
Mateusz Guzik
513c7a6e0c fd: make fget_unlocked take a thread argument
Just like other fget routines. This enables embedding fd table pointer
in struct thread, avoiding taking a trip through proc.
2022-02-11 12:29:26 +00:00
Mateusz Guzik
45bb8beacc fd: elide one acquire fence in fget_unlocked_seq
Still validate we got the stable state before returning an error though.
2022-02-11 12:29:26 +00:00
Mateusz Guzik
62849eef5b fd: split fget_unlocked_seq depending on CAPABILITIES
This will simplify an upcoming change.
2022-02-11 12:27:22 +00:00
Mateusz Guzik
b937908e41 fd: split fget_cap depending on CAPABILITIES
This will simplify an upcoming change.
2022-02-11 12:13:27 +00:00
Mateusz Guzik
f40dd6c803 tty: switch ttyhook_register to use fget_cap_locked
It is still wrong-ish as fget* funcs don't expect to operate on abitrary
file descriptor tables, but this at least moves it out of the way of an
upcoming change while being bug-compatible.
2022-02-11 12:13:27 +00:00
Mateusz Guzik
93288e2445 Employ thread_cow_synced in setrlimit
In order to avoid proc lock/unlock on next kernel entry.
2022-02-11 11:44:07 +00:00
Mateusz Guzik
32114b639f Add PROC_COW_CHANGECOUNT and thread_cow_synced
Combined they can be used to avoid a proc lock/unlock cycle in the
syscall handler for curthread, see upcoming examples.
2022-02-11 11:44:07 +00:00
Mateusz Guzik
8a0cb04df4 Add lim_cowsync, similar to crcowsync 2022-02-11 11:44:07 +00:00
Warner Losh
0987dc5be5 riscv: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-factor ABI options and cannot change size ever.

Differential Revision:	https://reviews.freebsd.org/D34214
2022-02-10 14:32:21 -07:00
Warner Losh
6e48e160ae powerpc: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever. For
powerpc64, also add asserts for {u,m}mcontext32_t and siginfo32.

Reviewed by:		andrew
Differential Revision:	https://reviews.freebsd.org/D34213
2022-02-10 14:32:20 -07:00
Warner Losh
690601f0b4 amd64: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D34212
2022-02-10 14:32:20 -07:00
Warner Losh
d4f495fbf8 i386: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D34211
2022-02-10 14:32:20 -07:00
Warner Losh
0c988f92dc arm: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.

Reviewed by:		andrew
Differential Revision:	https://reviews.freebsd.org/D34210
2022-02-10 14:32:20 -07:00
Warner Losh
3988ca5aab aarch64: Add static asssert for context size
Add a static assert for the siginfo{,32}_t, mcontext{,32}_t and
ucontext{,32}_t sizes. These are de-facto ABI options and cannot change
size ever.

Reviewed by:		kib, andrew, jhb
Differential Revision:	https://reviews.freebsd.org/D32958
2022-02-10 14:32:20 -07:00
Jason A. Harmening
974efbb3d5 unionfs: fix typo in comment
I deleted the wrong word when writing up a comment in a prior change;
the covered vnode may be recursed during any unmount, not just forced
unmount.
2022-02-10 15:17:43 -06:00
Mark Johnston
b4f60fab5d tcp: Avoid conditionally defined fields in union lro_address
The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure.  Fix this by
unconditionally defining the address fields.

As a side effect, this eliminates some duplication in the kernel's CTF
type graph.

Reviewed by:	rscheff, tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34242
2022-02-10 15:39:58 -05:00
Richard Scheffenegger
3f169c54ab tcp: Add/update AccECN related statistics and numbers
Reserve couters in the tcps struct in preparation
for AccECN, extend the debugging output for TF2
flags, optimize the syncache flags from individual
bits to a codepoint for the specifc ECN handshake.

This is in preparation of AccECN.

No functional chance except for extended debug
output capabilities.

Reviewed By: #transport, rrs
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34161
2022-02-10 00:21:31 +01:00
Justin Hibbits
6db44b0158 Fix gzip compressed core dumps on big endian architectures
The gzip trailer words (size and CRC) are both little-endian per the spec.

MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
2022-02-10 09:34:37 -06:00
Konstantin Belousov
b51927b7b0 Revert "vm_pageout_scans: correct detection of active object"
This reverts commit 3de96d664a.

Problem is that it is possible to reach the state with ref_count ==
1 for the mapped non-anonymous object. For instance, anonymous posix
shmfd or linux shmfs object could be mapped, and then corresponding
file descriptor closed, dropping the object reference owned by the
shmfd/shmfs file.  Then the check in inactive scan assumes that the
object and page are not mapped and frees the page, while they are not.

PR:	261707
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	now
2022-02-10 16:55:10 +02:00
Hans Petter Selasky
a30f71704e mlx5ib: Add support for parsing udata in mlx5_ib_create_flow().
Backport from Linux 5.17 (drivers/infiniband/hw/mlx5/fs.c)

This fixes creating flow rules from user-space after the
kernel space update based on Linux 5.7-rc1 .

Sponsored by:	NVIDIA Networking
2022-02-10 11:17:42 +01:00
Hans Petter Selasky
04f407a3e5 mlx5en: Make sure the NIC IP addresses are written to firmware on link up.
Fixes e059c120b4 .

PR:		261746
MFC after:	1 day
Sponsored by:	NVIDIA Networking
2022-02-10 11:17:42 +01:00
Kyle Evans
b9c92d631c Annotate geom_md with MODULE_VERSION
This was missed in 74d6c131cb where other geom modules were annotated
with MODULE_VERSION.  Again, the problem is the same: we can't detect
that geom_md is loaded into the kernel without it.

This was noticed in release builds on the cluster; mdconfig attempts to
load geom_md because it can't detect it in the kernel, but the cluster
config includes md(4) and does not build the kmod.  This problem would
have been masked on hosts with the kmod built, as the kmod attempts to
register the g_md module and fails.  With this commit, mdconfig would
not even try to load it again.

Reported by:	re (cperciva)
MFC after:	3 days
2022-02-10 00:16:19 -06:00
Rick Macklem
17a56f3fab nfsd: Reply NFSERR_SEQMISORDERED for bogus seqid argument
The ESXi NFSv4.1 client bogusly sends the wrong value
for the csa_sequence argument for a Create_session operation.
RFC8881 requires this value to be the same as the sequence
reply from the ExchangeID operation most recently done for
the client ID.

Without this patch, the server replies NFSERR_STALECLIENTID,
which is the correct response for an NFSv4.0 SetClientIDConfirm
but is not the correct error for NFSv4.1/4.2, which is
specified as NFSERR_SEQMISORDERED in RFC8881.
This patch fixes this.

This change does not fix the issue reported in the PR, where
the ESXi client loops, attempting ExchangeID/Create_session
repeatedly.

Reported by:	asomers
Tested by:	asomers
PR:	261291
MFC after:	1 week
2022-02-09 15:17:50 -08:00
Kenneth D. Merry
3090d5045a Fix non-printable characters in NVMe model and serial numbers.
The NVMe 1.4 spec simply says that Model and Serial numbers are
ASCII strings.  Unlike SCSI, it doesn't prohibit non-printable
characters or say that the strings should be padded with spaces.

Since 2014, we have had cam_strvis_sbuf(), which gives additional
options for handling non-ASCII characters.  That behavior hasn't
been available for non-sbuf consumers, so users of cam_strvis()
were left with having octal ASCII codes inserted.

So, to avoid having garbage or octal chracters in the strings, use
cam_strvis_sbuf() to create a new function, cam_strvis_flag(), and
re-implement cam_strvis() using cam_strvis_flag().

Now, for the NVMe drives, we can use cam_strvis_flag with the
CAM_STRVIS_FLAG_NONASCII_SPC flag.  This transforms non-printable
characters into spaces.

sys/cam/cam.c:
	Add a new function, cam_strvis_flag(), that creates an sbuf
	on the stack with the user's destination buffer, and calls
	cam_strvis_sbuf() with the given flag argument.

	Re-implement cam_strvis() to call cam_strvis_flag with the
	CAM_STRVIS_FLAG_NONASCII_ESC argument.  This should be the
	equivalent of the old cam_strvis() function, except for the
	overhead of creating the sbuf and calling sbuf_putc/printf.

sys/cam/cam.h:
	Declaration for cam_strvis_flag.

sys/cam/nvme/nvme_all.c:
	In nvme_print_ident, use the NONASCII_SPC flag with
	cam_strvis_flag().

sys/cam/nvme/nvme_da.c:
	In ndaregister(), use cam_strvis_flag() with the
	NONASCII_SPC flag for the disk description and serial
	number we report to GEOM.

sys/cam/nvme/nvme_xpt.c:
	In nvme_probe_done(), use cam_strvis_flag with the
	NONASCII_SPC flag when storing the drive serial number
	in the CAM EDT.

MFC after:	1 week
Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D33973
2022-02-09 17:09:25 -05:00
Alexander Motin
98d59d2e0d snd_hda: Add some ATI HDMI codec IDs.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	1 week
2022-02-09 16:29:23 -05:00
Randall Stewart
cc41c17433 opps my patch lost the removal of the tlp_threshold counter increments 2022-02-09 16:19:22 -05:00
Randall Stewart
8d64b4b4c4 cleanup of rack variables.
During a recent deep dive into all the variables so I could
discover why stack switching caused larger retransmits I examined
every variable in rack. In the process I found quite a few bits
that were not used and needed cleanup. This update pulls
out all the unused pieces from rack. Note there are *no* functional
changes here, just the removal of unused variables and a bit of
spacing clean up.

Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34205
2022-02-09 16:08:32 -05:00
Aleksandr Fedorov
b27e6e91d0 ng pppoe(4): Add the required NET_EPOCH section to the hook
disconnection function.

Disconnecting hooks are called outside of NET_EPOCH, but
ng_pppoe_disconnect() calls NG_SEND_DATA_ONLY() which should be called
in NET_EPOCH.

PR:	257067
Reported by:	niels=freebsd@bakker.net
Reviewed by:	vmaffione (mentor), glebius, donner
Approved by:	vmaffione (mentor), glebius, donner
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D34185
2022-02-09 22:00:50 +03:00
Michael Tuexen
a0aeb1cef5 in_pcb.c: fix compilation of an IPv4 only configuration
While there, remove a duplicate inclusion of sysctl.h.

Reported by:	Gary Jennejohn
Fixes:		a35bdd4489 - main - tcp: add sysctl interface for setting socket options
Sponsored by:	Netflix, Inc.
2022-02-09 19:58:29 +01:00
Michael Tuexen
a35bdd4489 tcp: add sysctl interface for setting socket options
This interface allows to set a socket option on a TCP endpoint,
which is specified by its inp_gencnt. This interface will be
used in an upcoming command line tool tcpsso.

Reviewed by:		glebius, rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34138
2022-02-09 12:24:41 +01:00
Michael Tuexen
528c764924 tcp: fix compliation when KERN_TLS is not defined
Reported by:	Gary Jennejohn
Fixes:		fd7daa7271 - main - tcp: make tcp_ctloutput_set() non-static
Sponsored by:	Netflix, Inc.
2022-02-09 12:16:43 +01:00
Ram Kishore Vegesna
7bf31432fd ocs_fc: Fix a possible Null pointer dereference
Fix a possible Null pointer dereference in ocs_hw_get_profile_list_cb()

PR: 261453
Reported by: lwhsu

MFC after: 3 days
2022-02-09 16:18:21 +05:30
Stefan Grundmann
06296f77c5 vt: fix splash_cpu logos use of vd_drawrect
In the (extremely unlikely) case of vd->vd_height ==
vt_logo_sprite_height the vd_drawrect code would write outside of
frame-buffer memory.

MFC after:	1 week
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D34220
2022-02-08 22:22:07 -05:00
Michael Tuexen
fd7daa7271 tcp: make tcp_ctloutput_set() non-static
tcp_ctloutput_set() will be used via the sysctl interface in a
upcoming command line tool tcpsso.

Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34164
2022-02-08 18:49:44 +01:00
Dimitry Andric
5f2aca8394 Disable clang 14 warning about bitwise operators in zstd
Parts of zstd, used in openzfs and other places, trigger a new clang 14
-Werror warning:

```
sys/contrib/zstd/lib/decompress/huf_decompress.c:889:25: error: use of bitwise '&' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
                        (BIT_reloadDStreamFast(&bitD1) == BIT_DStream_unfinished)
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

While the warning is benign, it should ideally be fixed upstream and
then vendor-imported, but for now silence it selectively.

MFC after:	3 days
2022-02-08 21:46:08 +01:00
Dimitry Andric
7d8a4eb943 tty_info: Avoid warning by using logical instead of bitwise operators
Since TD_IS_RUNNING() and TS_ON_RUNQ() are defined as logical
expressions involving '==', clang 14 warns about them being checked with
a bitwise operator instead of a logical one:

```
sys/kern/tty_info.c:124:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        runa = TD_IS_RUNNING(td) | TD_ON_RUNQ(td);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                 ||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
sys/kern/tty_info.c:124:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
sys/kern/tty_info.c:129:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        runb = TD_IS_RUNNING(td2) | TD_ON_RUNQ(td2);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                  ||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
sys/kern/tty_info.c:129:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
```

Fix this by using logical operators instead. No functional change
intended.

Reviewed by:	cem, emaste, kevans, markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34186
2022-02-08 21:21:04 +01:00
Dimitry Andric
14a15342bb Remove device lio from i386's LINT-NOIP
This fixes link errors for the LINT-NOIP kernel on i386:

```
ld: error: undefined symbol: tcp_lro_flush_all
>>> referenced by lio_droq.c
>>>               lio_droq.o:(lio_droq_process_packets)

ld: error: undefined symbol: tcp_lro_rx
>>> referenced by lio_core.c
>>>               lio_core.o:(lio_push_packet)

ld: error: undefined symbol: tcp_lro_init
>>> referenced by lio_main.c
>>>               lio_main.o:(lio_attach)

ld: error: undefined symbol: tcp_lro_free
>>> referenced by lio_main.c
>>>               lio_main.o:(lio_attach)
>>> referenced by lio_main.c
>>>               lio_main.o:(lio_destroy_nic_device)
*** [kernel] Error code 1
```

MFC after:	3 days
2022-02-08 19:53:52 +01:00
Mark Johnston
c862d5f2a7 riscv: Fix a race in pmap_pinit()
All pmaps share the top half of the address space.  With 3-level page
tables, the top-level kernel map entries are not static: they might
change if the kernel map is extended (via pmap_growkernel()) or a 1GB
mapping in the direct map is demoted (not implemented yet).  Thus the
riscv pmap maintains the allpmaps list to synchronize updates to
top-level entries.

When a pmap is created, it is inserted into this list after copying
top-level entries from the kernel pmap.  The copying is done without
holding the allpmaps lock, and it is possible for pmap_pinit() to race
with kernel map updates.  In particular, if a thread is modifying L1
entries, and a concurrent pmap_pinit() copies the old version of the
entries, it might not receive the update.

Fix the problem by copying the kernel map entries after inserting the
pmap into the list.  This ensures that the nascent pmap always receives
updates, though pmap_distribute_l1() may race with the page copy.

Reviewed by:	mhorne, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34158
2022-02-08 13:31:55 -05:00
Mark Johnston
5de79eeddb ktls: Disallow transmitting empty frames outside of TLS 1.0/CBC mode
There was nothing preventing one from sending an empty fragment on an
arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this
could not happen.  Though the transmit path handles this case for TLS
1.0 with AES-CBC, we should be strict and allow empty fragments only in
modes where it is explicitly allowed.

Modify sosend_generic() to reject writes to a KTLS-enabled socket if the
number of data bytes is zero, so that userspace cannot trigger the
aforementioned assertion.

Add regression tests to exercise this case.

Reported by:	syzkaller
Reviewed by:	gallatin, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34195
2022-02-08 12:40:41 -05:00
Mark Johnston
300cfb96fc file: Make fget*() and getvnode*() consistent about initializing *fpp
Most fget*() functions initialize the output parameter to NULL.  Make
the externally visible interface behave consistently, and make
fget_unlocked_seq() private to kern_descrip.c.

This fixes at least one bug in a consumer, _filemon_wrapper_openat(),
which assumes that getvnode() sets the output file pointer to NULL upon
an error.

Reported by:	syzbot+01c0459408f896a5933a@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34190
2022-02-08 12:40:41 -05:00
Franco Fichtner
47ded797ce netinet: simplify RSS ifdef statements
Approved by:	transport (rrs)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D31583
2022-02-07 19:22:03 -07:00
John Baldwin
511b83b167 cxgbei: Replace worker thread pools with per-connection kthreads.
Having a single pool of worker threads adds extra complexity and
overhead.  The software backend also uses per-connection kthreads.

Sponsored by:	Chelsio Communications
2022-02-07 16:20:40 -08:00
John Baldwin
fd8f61d6e9 cxgbei: Dispatch sent PDUs to the NIC asynchronously.
Previously the driver was called to send PDUs to the NIC synchronously
from the icl_conn_pdu_queue_cb callback.  However, this performed a
fair bit of work while holding the icl connection lock.  Instead,
change the callback to add sent PDUs to a STAILQ and defer dispatching
of PDUs to the NIC to a helper thread similar to the scheme used in
the TCP iSCSI backend.

- Replace rx_flags int and the sole RXF_ACTIVE flag with a simple
  rx_active bool.

- Add a pool of transmit worker threads for cxgbei.

- Fix worker thread exit to depend on the wakeup in kthread_exit()
  to fix a race with module unload.

Reported by:	mav
Sponsored by:	Chelsio Communications
2022-02-07 16:20:06 -08:00
Hans Petter Selasky
e85af89fa7 Add more USB host controller PCI ID's.
Submitted by:	Gary Jennejohn <gljennjohn@gmail.com>
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-08 00:29:24 +01:00
John Baldwin
6426978617 Extend the VMM stats interface to support a dynamic count of statistics.
- Add a starting index to 'struct vmstats' and change the
  VM_STATS ioctl to fetch the 64 stats starting at that index.
  A compat shim for <= 13 continues to fetch only the first 64
  stats.

- Extend vm_get_stats() in libvmmapi to use a loop and a static
  thread local buffer which grows to hold the stats needed.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27463
2022-02-07 14:11:10 -08:00
Kristof Provost
3f3e4f3c74 dummynet: don't use per-vnet locks to protect global data.
The ref_count counter is global (i.e. not per-vnet) so we can't use a
per-vnet lock to protect it. Moreover, in callouts curvnet is not set,
so we'd end up panicing when trying to use DN_BH_WLOCK().

Instead we use the global sched_lock, which is already used when
evaluating ref_count (in unload_dn_aqm()).

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34059
2022-02-07 22:59:46 +01:00
John Baldwin
7043ca9140 linuxkpi: Add parentheses to pacify -Wparentheses warnings from GCC.
Reviewed by:	bz, emaste
Differential Revision:	https://reviews.freebsd.org/D34145
2022-02-07 13:43:22 -08:00
Sebastian Huber
3ec0dc367b kern_ntptime.c: Remove ntp_init()
The ntp_init() function did set a couple of global objects to zero.  These
objects are in the .bss section and already initialized to zero during kernel
or module loading.
2022-02-07 14:16:16 -07:00
John Baldwin
a3d71fffa7 cfiscsi_done: Free the dummy PDU earlier.
The dummy PDU needs to be freed before marking task abortion complete
as otherwise cfiscsi_session_terminate_tasks can return and destroy
the session in another thread before the PDU is freed.

Fixes:		2e8d1a5525 iscsi: Allocate a dummy PDU for the internal nexus reset task.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34176
2022-02-07 12:55:08 -08:00
John Baldwin
c227269e2f Stop adding -Wredundant-decls to CWARNFLAGS.
clang doesn't implement it, and Linux doesn't enforce it.  As a
result, new instances keep cropping up both in FreeBSD's code and in
upstream sources from vendors.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D34144
2022-02-07 12:47:51 -08:00
John Baldwin
949e395966 Trim duplicate code for copying in iovecs for PT_[GS]ETREGSET.
Reviewed by:	andrew, emaste
Differential Revision:	https://reviews.freebsd.org/D34177
2022-02-07 11:49:29 -08:00
Robert Wing
c9e023541a pbuf_ctor(): lock the buffer with LK_NOWAIT
This LOR happens when reading from a file backed MD device:

lock order reversal:
 1st 0xfffffe00431eaac0 pbufwait (pbufwait, lockmgr) @ /cobra/src/sys/vm/vm_pager.c:471
 2nd 0xfffff80003f17930 ufs (ufs, lockmgr) @ /cobra/src/sys/dev/md/md.c:977
lock order pbufwait -> ufs attempted at:
    #0 0xffffffff80c78ead at witness_checkorder+0xbdd
    #1 0xffffffff80bd6a52 at lockmgr_lock_flags+0x182
    #2 0xffffffff80f52d5c at ffs_lock+0x6c
    #3 0xffffffff80d0f3f4 at _vn_lock+0x54
    #4 0xffffffff80708629 at mdstart_vnode+0x499
    #5 0xffffffff807060ec at md_kthread+0x20c
    #6 0xffffffff80bbfcd0 at fork_exit+0x80
    #7 0xffffffff810b809e at fork_trampoline+0xe

This LOR was previously blessed by witness before commit 531f8cfea0
("Use dedicated lock name for pbufs").

Instead of blessing ufs and pbufwait, use LK_NOWAIT to prevent recording
the lock order. LK_NOWAIT will be a nop here as the lock is dropped in
pbuf_dtor(). The takes the same approach as 5875b94c74 ("buf_alloc():
lock the buffer with LK_NOWAIT").

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D34183
2022-02-07 10:05:20 -09:00
Jung-uk Kim
379797d4b4 usb(4): Belatedly add a PCI device ID for AMD Bolton chipset 2022-02-07 13:48:09 -05:00
Andrew Turner
31cf95cec7 Stop single stepping in signal handers on arm64
We should clear the single step flag when entering a signal hander and
set it when returning. This fixes the ptrace__PT_STEP_with_signal test.

While here add support for userspace to set the single step bit as on
x86. This can be used by userspace for self tracing.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34170
2022-02-07 15:03:23 +00:00
Richard Scheffenegger
ab001fcdf2 tcp: Apply tcp flags after ECN processing in rack_fast_output()
Missed to move the tcp_set_flags() past ECN processing
in rack_fast_output() earlier.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34180
2022-02-07 03:28:27 +01:00
Andrew Turner
67dc576bae Fix the signal code on 32-bit breakpoints on arm64
When debugging 32-bit programs a debugger may insert a instruction that
will raise the undefined instruction trap. The kernel handles these
by raising a SIGTRAP, however the code was incorrect.

Fix this by using the expected TRAP_BRKPT signal code.

Sponsored by:	The FreeBSD Foundation
2022-02-07 11:56:04 +00:00
Randall Stewart
a9696510f5 tcp: Add hystart++ to our cubic implementation.
As promised to the transport call on 11/4/22 here is an implementation
of hystart++ for cubic. It also cleans up the tcp_congestion function
to have a better name. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++

Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035
2022-02-07 06:37:46 -05:00
Roger Pau Monné
476438e81f xen: remove public headers in sys/xen/interface
Those are superseded by the ones in sys/contrib/xen and no longer
used.

Sponsored by: Citrix Systems R&D
2022-02-07 10:12:34 +01:00
Elliott Mitchell
ad7dd51499 xen: switch to use headers in contrib
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.

Switch to use the headers in sys/contrib/xen.

Reviewed by: royger
2022-02-07 10:11:56 +01:00
Roger Pau Monné
3a9fd8242b xen: import Xen 4.16 public headers in sys/contrib/
The current path of the Xen headers at /sys/xen/interface/ is not
correct, as those headers are imported verbatim from the Xen sources
and shouldn't be modified, as any modifications would be lost when a
new version is imported.

Changes to the public headers must be first done in Xen upstream so
that they can be backported and new imports will already carry them.

Import Xen 4.16 headers in sys/contrib/xen/. It's unlikely that we
will import different Xen code, so don't place them inside of any
subdirectory. If in the future other pieces of Xen code need to be
imported the headers will need to move into an include/ subdirectory.

Note that this commit does not yet modify the include path to use the
newly imported headers.

Sponsored by: Citrix Systems R&D
2022-02-07 10:11:56 +01:00
Elliott Mitchell
b6da4ec609 xen: remove leftover bits missed in commit ac3ede5371
These fields are now unused, remove them.

Fixes: ac3ede5371 ('x86/xen: remove PVHv1 code')
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31206
2022-02-07 10:06:27 +01:00
Roger Pau Monné
759ae58c00 xen/grant-table: remove explicit linear mapping additions
There's no need to explicitly add linear mappings for the grant table
area, as the memory is allocated using xenmem_alloc and it should
already have a linear mapping that can be obtained using
rman_get_virtual.

While there also remove the return value of gnttab_map, since there's
no return value anymore.

Sponsored by: Citrix Systems R&D
Reviewed by: Elliott Mitchell <ehem+freebsd@m5p.com>
Differential revision: https://reviews.freebsd.org/D29602
2022-02-07 10:06:27 +01:00
Gordon Bergling
b6724f7004 tegra: Fix a common typo in source code comments
- s/ajusted/adjusted/

MFC after:	3 days
2022-02-06 17:31:05 +01:00
Gordon Bergling
5a78ec9e7c kern_fflock: Fix a typo in a source code comment
- s/foward/forward/

MFC after:	3 days
2022-02-06 17:29:43 +01:00
Gordon Bergling
a9bee9c77a kern_racct: Fix a typo in a source code comment
- s/maxumum/maximum/

MFC after:	3 days
2022-02-06 17:28:27 +01:00
Gordon Bergling
8ea3ceda76 fs: fix a few common typos in source code comments
- s/quadradically/quadratically/
- s/persistant/persistent/

Obtained from:	NetBSD
MFC after:	3 days
2022-02-06 13:48:31 +01:00
Gordon Bergling
f32dd4d58a cam(4): Fix a few typos in source code comments
- s/trafer/transfer/
- s/failes/fails/

Obtained from:	NetBSD
MFC after:	3 days
2022-02-06 13:45:47 +01:00
Gordon Bergling
bc9432d0e7 xen(4): Fix a common typo in a source code comments
- s/existance/existence/

MFC after:	3 days
2022-02-06 13:44:49 +01:00
Aleksandr Fedorov
ceaf442ff2 if_vxlan(4): Allow netmap_generic to intercept RX packets.
Netmap (generic) intercepts the if_input method to handle RX packets.

Call ifp->if_input() instead of netisr_dispatch().
Add stricter check for incoming packet length.

This change is very useful with bhyve + vale + if_vxlan.

Reviewed by:	vmaffione (mentor), kib, np, donner
Approved by:	vmaffione (mentor), kib, np, donner
MFC after:	2 weeks
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D30638
2022-02-06 15:27:46 +03:00
Hans Petter Selasky
42cf33dd1a Add new USB host controller PCI ID's.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-06 13:18:35 +01:00
Konstantin Belousov
0af463e661 ffs_read(): lock buffers after snaplk with LK_NOWITNESS
Reviewed and tested by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34179
2022-02-06 03:26:22 +02:00
Gleb Smirnoff
c999e3481d dmesg: detect wrapped msgbuf on the kernel side and if so, skip first line
Since 59f256ec35 dmesg(8) will always skip first line of the message
buffer, cause it might be incomplete.  The problem is that in most cases
it is complete, valid and contains the "---<<BOOT>>---" marker.  This
skip can be disabled with '-a', but that would also unhide all non-kernel
messages.  Move this functionality from dmesg(8) to kernel, since kernel
actually knows if wrap has happened or not.

The main motivation for the change is not actually the value of the
"---<<BOOT>>---" marker.  The problem breaks unit tests, that clear
message buffer, perform a test and then check the message buffer for
a result.  Example of such test is sys/kern/sonewconn_overflow.
2022-02-05 13:35:31 -08:00
Richard Scheffenegger
1790549d80 tcp: use TCPSTAT_INC in kernel ecn functions
Incorrectly used KMOD_ marco in static kernel ECN functions.

Both eventually resolve to counter_s64_add(), but better
use the correct macros.

Reviewed By: tuexen, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34181
2022-02-05 16:55:22 +01:00
Aleksandr Fedorov
fc035df8af if_vtnet(4): Restore the ability to set promisc mode.
PR:	254343, 255054
Reviewed by:	vmaffione (mentor), donner
Approved by:	vmaffione (mentor), donner
MFC after:	2 weeks
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D30639
2022-02-05 18:47:46 +03:00
Richard Scheffenegger
f7220c486c tcp: move ECN handling code to a common file
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.

Formally no functional change.

Incidentially this establishes correct
ECN operation in one instance.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
2022-02-05 15:04:42 +01:00
Kristof Provost
b21826bf15 pf: deal with tables gaining or losing counters
When we create a table without counters, add an entry  and later
re-define the table to have counters we wound up trying to read
non-existent counters.

We now cope with this by attempting to add them if needed, removing them
when they're no longer needed and not trying to read from counters that
are not present.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34131
2022-02-05 10:29:34 +01:00
Richard Scheffenegger
7994ef3c39 Revert "tcp: move ECN handling code to a common file"
This reverts commit 0c424c90ea.
2022-02-05 01:07:51 +01:00
Alan Somers
18ed2ce77a fusefs: fix the build without INVARIANTS after 00134a0789
MFC after:	2 weeks
MFC with:	00134a0789
Reported by:	se
2022-02-04 18:44:27 -07:00
Richard Scheffenegger
0c424c90ea tcp: move ECN handling code to a common file
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.

Formally no functional change.

Incidentially this establishes correct
ECN operation in one instance.

Reviewed By: rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
2022-02-04 22:54:41 +01:00
John Baldwin
6bea696af2 linux_copyout_strings: Use PROC_PS_STRINGS().
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D34173
2022-02-04 15:57:57 -08:00
John Baldwin
74fea8eb4f cxgbei: Rework parsing of pre-offload PDUs.
sbcut() returns mbufs in reverse order so is not suitable for reading
data from the socket buffer.  Instead, check for already-received data
in the receive worker thread before passing offload PDUs up to the
iSCSI layer.  This uses soreceive() to read data from the socket and
is also to use M_WAITOK since it now runs from a worker thread instead
of an interrupt thread.

Also, fix decoding of the data segment length for pre-offload PDUs.

Reported by:	Jithesh Arakkan @ Chelsio
Fixes:		a8c4147edc cxgbei: Parse all PDUs received prior to enabling offload mode.
Sponsored by:	Chelsio Communications
2022-02-04 15:38:49 -08:00
Alan Somers
00134a0789 fusefs: require FUSE_NO_OPENDIR_SUPPORT for NFS exporting
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory.  That conflicts with NFS's statelessness, that results in
unresolvable bugs when NFS reads large directories, if:

* The file system _does_ change the d_off field for the last directory
  entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
  NFS.

Rather than doing a poor job of exporting such file systems, it's better
just to refuse.

Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.

MFC after:	3 weeks.
Reviewed by:	rmacklem
Differential Revision: https://reviews.freebsd.org/D33726
2022-02-04 16:31:05 -07:00
Alan Somers
4a6526d84a fusefs: optimize NFS readdir for FUSE_NO_OPENDIR_SUPPORT
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle.  But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR.  That means reading the
directory all the way from the first entry.  Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:

* The file system _does_ change the d_off field for the last directory
  entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
  NFS.

Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT d_off is
guaranteed to be valid for the lifetime of the directory entry, there is
no need to read the directory from the start.

MFC after:	3 weeks
Reviewed by:	rmacklem
2022-02-04 16:30:58 -07:00
Alan Somers
d088dc76e1 Fix NFS exports of FUSE file systems for big directories
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle.  Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server.  Instead, it must
read the directory from 0 each time.  It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for.  That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.

MFC after:	3 weeks
Reviewed by:	rmacklem
2022-02-04 16:30:49 -07:00
Emmanuel Vadot
867b4decb4 lindebugfs: Fix write
For write operation pseudofs creates an sbuf with the data.
Use this data instead of the uio as it's not usable anymore after
uiomove.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D34114
2022-02-04 14:31:08 +01:00
Konstantin Belousov
9596b349bb x86 atomic.h: remove obsoleted comment
Modules no longer call kernel functions for atomic ops, and since the
previous commit, we always use lock prefix.

Submitted by:	Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by:	jhb, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34153
2022-02-04 14:01:39 +02:00
Konstantin Belousov
9c0b759bf9 x86 atomics: use lock prefix unconditionally
Atomics have significant other use besides providing in-system
primitives for safe memory updates.  They are used for implementing
communication with out of system software or hardware following some
protocols.

For instance, even UP kernel might require a protocol using atomics to
communicate with the software-emulated device on SMP hypervisor.  Or
real hardware might need atomic accesses as part of the proper
management protocol.

Another point is that UP configurations on x86 are extinct, so slight
performance hit by unconditionally use proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.

Requested by:	Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by:	Elliott Mitchell, imp, jhb, markj, royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34153
2022-02-04 14:01:39 +02:00
Konstantin Belousov
cbf999e75d x86 atomic.h: cleanup comments for preprocessor directives
Reviewed by:	Elliott Mitchell, imp, jhb, markj, royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34153
2022-02-04 14:01:39 +02:00
Andrew Turner
664640ba6c Sort the names of the arm64 debug registers
While here clean up the names for the naming convention of the other
registers in this file.

Reviewed by:	kib, mhorne (earlier version)
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34060
2022-02-04 10:49:27 +00:00
Sylvian Meygret
cd7306bb1f ip_mroute: split mrouter interface deactivation and if_free
Move if_free outside MRW_LOCK. This will silence LOR message
which might appere during deinitialization.
2022-02-04 10:25:07 +01:00
Adrian Chadd
e388de98bd ar40xx_switch: add initial switch for the IPQ4018/IPQ4019.
Summary:

This switch is based off of the AR8327/AR8337 external switch/PHY.
However unlike the AR8327/AR8337 it itself doesn't have any PHYs;
instead an external PHY connects to it using the PSGMII port.

Differential Revision: https://reviews.freebsd.org/D34112
Reviewed by: manu

This code is inspired by the ar40xx code in openwrt, which itself
is based on the Qualcomm QCA-SSDK.  Both of these sources are, amusingly,
BSD licenced - and thus I have included some of the comments in the
hardware workaround paths to document some of the magic numbers.
2022-02-03 21:27:13 -08:00
Adrian Chadd
b509e53896 dts: add IPQ4018/IPQ4019 ethernet MAC and ethernet switch definitions
This adds the ethernet MAC and ethernet switch definitions.

I've rewritten the header file and the DTS based on documentation
and the required driver fields rather than the GPL'ed
ones from openwrt.

Differential Revision: https://reviews.freebsd.org/D34111
Reviewed by: manu
2022-02-03 21:26:45 -08:00
Adrian Chadd
29332c0dce qcom_mdio: add initial IPQ4018 MDIO support
This adds support for the IPQ4018/IPQ4019 MDIO bus.  This is used to
talk to external PHYs and switches.  (There's an internal switch
in the IPQ4018/IPQ4019 as well, but it's accessible via MMIO/AXI.)

Differential Revision: https://reviews.freebsd.org/D34110
Reviewed by: manu
2022-02-03 21:26:14 -08:00
Henri Hennebert
ad494d3b2d rtsx: Update driver version number to 2.1c
Differential Revision:	https://reviews.freebsd.org/D32154
2022-02-03 18:43:13 -05:00
Henri Hennebert
1e800a5934 rtsx: Do not display pci_read_config() errors during rtsx_init()
Differential Revision:	https://reviews.freebsd.org/D32154
2022-02-03 18:43:13 -05:00
Henri Hennebert
ec1f122b56 rtsx: Add CTLFLAG_STATS flag for read and write counters
Differential Revision:	https://reviews.freebsd.org/D32154
2022-02-03 18:43:12 -05:00
Henri Hennebert
7e5933b333 rtsx: Prefer __FreeBSD_version over __FreeBSD__
No functional change.

Differential Revision:	https://reviews.freebsd.org/D32154
2022-02-03 18:43:12 -05:00
Henri Hennebert
8e9740b62e rtsx: Convert driver to use the mmc_sim interface
A lot more generic cam related things were done in mmc_sim so this
simplifies the driver a lot.

Differential Revision:	https://reviews.freebsd.org/D32154
Reviewed by:		imp
2022-02-03 18:43:12 -05:00
Justin Hibbits
aa4736459e powerpc/atomic: Fix atomic_testand_*_long on powerpc64
After b5d227b0 FreeBSD was panicking on boot with "Duplicate free" in
UMA.  Analyzing the asm, the '1' mask was treated as an integer, rather
than a long, causing 'slw' (shift left word) to be used for the shifting
instruction, not 'sld' (shift left double).  This means the upper bits
of the bitfield were not getting used, resulting in corruption of the
bitfield.

While fixing this, the 'and' check of the mask does not need to be
recorded, so don't record (drop the '.').
2022-02-03 17:25:39 -06:00
Alexander Motin
3b248a2113 APEI: Make sure event data fit into the buffer.
There seem to be systems returning some garbage here.  I still don't
know why, but at least I hope this check fix indefinite printf loop.

MFC after:	2 weeks
2022-02-03 15:33:01 -05:00
Richard Scheffenegger
fd723975ec tcp: fix typo in commit f026275e26
missed one bitmask inversion while committing D34148

Differential Revision: https://reviews.freebsd.org/D34148
Differential Revision: https://reviews.freebsd.org/D34160
2022-02-03 21:05:09 +01:00
Richard Scheffenegger
3b0ee68050 tcp: Prevent setting of ECN bits with setsockopt()
setsockopt() grants full access to the deprecated
TOS byte. For TCP, mask out the ECN codepoint, so that
only the DSCP portion can be adjusted.

Reviewed By: tuexen, hselasky, #manpages, #transport, debdrup
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34154
2022-02-03 20:06:42 +01:00
Jesper Schmitz Mouridsen
ea07ba1170 sys/arm64/iommu/iommu_pmap.c readd sys/systm.h
after d950c5898a UINT64_C and bzero were no longer defined

Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D34155
2022-02-03 20:03:29 +01:00
John Baldwin
87c5d39f77 iwlwifi: Disable -Wformat when building with GCC.
GCC's -Wformat complains about NULL format strings passed to
iwl_fw_dbg_collect_trig (though the function handles NULL format
strings).  Curious that upstream iwlwifi in Linux is built with GCC
and explicitly opts into this warning via the __printf() attribute.

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D34146
2022-02-03 10:48:18 -08:00
Hans Petter Selasky
c830e92924 mlx5ib: Fix whitespace.
Found by:	kib@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-03 17:45:19 +01:00
Cy Schubert
5d4a348d0b ipfilter: Fix indentation error
Fixes:		064a5a9564
MFC after:	3 days
2022-02-03 08:37:11 -08:00
Hans Petter Selasky
12af59c2cf mlx5ib: Add missing auto generated header file to Makefile.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-03 17:35:12 +01:00
Alexander Motin
1a8d8a3a90 CTL: Fix mode page trucation on HA synchronization.
Due to variable size of struct ctl_ha_msg_mode ctl_isc_announce_mode()
sent only first 4 bytes of modified mode page to the other HA side,
that caused its corruption there, noticeable only after failover.

I've found alike bug also in ctl_isc_announce_lun(), but there it was
sending slightly more than needed, that is a smaller problem.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2022-02-03 11:10:12 -05:00
Kyle Evans
642701abc8 kern: harvest entropy from callouts
74cf7cae4d ("softclock: Use dedicated ithreads for running callouts.")
switched callouts away from the swi infrastructure.  It turns out that
this was a major source of entropy in early boot, which we've now lost.

As a result, first boot on hardware without a 'fast' entropy source
would block waiting for fortuna to be seeded with little hope of
progressing without manual intervention.

Let's resolve it by explicitly harvesting entropy in callout_process()
if we've handled any callouts.  cc/curthread/now seem to be reasonable
sources of entropy, so use those.

Discussed with:	jhb (also proposed initial patch)
Reported by:	many
Reviewed by:	cem, markm (both csprng)
Differential Revision:	https://reviews.freebsd.org/D34150
2022-02-03 10:05:06 -06:00
Richard Scheffenegger
f026275e26 tcp: set IP ECN header codepoint properly
TCP RACK can cache the IP header while preparing
a new TCP packet for transmission. Thus all the
IP ECN codepoint bits need to be assigned, without
assuming a clear field beforehand.

Reviewed By: tuexen, kbowling, #transport
MFC after:   3 days
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34148
2022-02-03 16:53:41 +01:00
Richard Scheffenegger
1ebf460758 tcp: Access all 12 TCP header flags via inline function
In order to consistently provide access to all
(including reserved) TCP header flag bits,
use an accessor function tcp_get_flags and
tcp_set_flags. Also expand any flag variable from
uint8_t / char to uint16_t.

Reviewed By: hselasky, tuexen, glebius, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34130
2022-02-03 16:21:58 +01:00
Mark Johnston
b84ed4e7f6 filemon: Reject FILEMON_SET_FD commands when the fd is a kqueue
When FILEMON_SET_FD is used, the filemon handle effectively wraps the
passed file.  In particular, the handle may be inherited by a child
process, or transferred over a unix domain socket, so we must verify
that the backing file permits this.

Reported by:	syzbot+36e6be9e02735fe66ca8@syzkaller.appspotmail.com
Reviewed by:	emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34128
2022-02-03 09:41:53 -05:00
Michael Tuexen
d51c80351f rack: fix compilation and small cleanup
Fix a function prototype missed in the last commit and whitespace
change.
Sponsored by:	Netflix, Inc.
2022-02-02 09:41:40 +01:00
Michael Tuexen
3b3c08c135 tcp: cleanup functions related to socket option handling
Consistently only pass the inp and the sopt around. Don't pass the
so around, since in a upcoming commit tcp_ctloutput_set() will be
called from a context different from setsockopt(). Also expect
the inp to be locked when calling tcp_ctloutput_[gs]et(), this is
also required for the upcoming use by tcpsso, a command line tool
to set socket options.
Reviewed by:		glebius, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34151
2022-02-02 09:27:59 +01:00
Warner Losh
e30fceb89b mps: Use 64-bit chain structures
According to Broadcom, mixing 64-bit SGEs with 32-bit chain entries can
lead to IOC Fault code 0x40000d04. This fault code has been observed to
suddenly increase on certain machines when the OCA firmware images are
deployed. The hardware interprets all elements of a 64-bit SGE, even
ones marked as 32-bit. Depending on the other bits, this will just work,
but sometimes generate the above fault. Broadcom recommends this
practice, and the Linux and NetBSD drivers follow it.

Rework the chaining code to use MPI2_SGE_CHAIN64 instead of
MPI2_SGE_CHAIN32. Adjust MPS_SGC_SIZE from 8 to 12 to match the size of
the new structure. Flag the structure as being 64-bits now. Since
MPS_SGE64_SIZE and MPS_SGC_SIZE are the same now, mps_push_sge could be
simplified (after the same fashion of mpr). The different number of
cases collapse to whether or not there's room for the segments and if
not we need a chain, however these changes haven't been made yet as the
current code handles those cases properly with the new defines.

Made chain_busaddr 64-bits, even though we ask for all allocations to be
below 4GB for this tag. Use it to set both parts of the CHAIN64 address
rather than baking the 4GB assumption. Add asserts around the allocation
to detect and BUSDMA bugs in allocation.

Remove asserts and associated comment in mpi_pre_fw_download and
mpi_pre_fw_upload. The code does not, it seems, depend on this
invariant. The mpr driver has similar code, no asserts and also doesn't
depend on this.

Adjust comments to reflect the updated size.

Sponsored by:		Netflix
Reviewed by:		scottl, mav
Differential Revision:	https://reviews.freebsd.org/D34016
2022-02-02 22:35:33 -07:00
Jason A. Harmening
83d61d5b73 unionfs: do not force LK_NOWAIT if VI_OWEINACT is set
I see no apparent need to avoid waiting on the lock just because
vinactive() may be called on another thread while the thread that
cleared the vnode refcount has the lock dropped.  In fact, this
can at least lead to a panic of the form "vn_lock: error <errno>
incompatible with flags" if LK_RETRY was passed to VOP_LOCK().
In this case LK_NOWAIT may cause the underlying FS to return an
error which is incompatible with LK_RETRY.

Reported by:	pho
Reviewed by:	kib, markj, pho
Differential Revision:	https://reviews.freebsd.org/D34109
2022-02-02 21:08:17 -06:00
Jason A. Harmening
6ff167aa42 unionfs: allow lock recursion when reclaiming the root vnode
The unionfs root vnode will always share a lock with its lower vnode.
If unionfs was mounted with the 'below' option, this will also be the
vnode covered by the unionfs mount.  During unmount, the covered vnode
will be locked by dounmount() while the unionfs root vnode will be
locked by vgone().  This effectively requires recursion on the same
underlying like, albeit through two different vnodes.

Reported by:	pho
Reviewed by:	kib, markj, pho
Differential Revision:	https://reviews.freebsd.org/D34109
2022-02-02 21:08:17 -06:00
Jason A. Harmening
0cd8f3e958 unionfs: fix assertion order in unionfs_lock()
VOP_LOCK() may be handed a vnode that is concurrently reclaimed.
unionfs_lock() accounts for this by checking for empty vnode private
data under the interlock.  But it incorrectly asserts that the vnode
is using the unionfs dispatch table before making this check.
Reverse the order, and also update KASSERT_UNIONFS_VNODE() to provide
more useful information.

Reported by:	pho
Reviewed by:	kib, markj, pho
Differential Revision:	https://reviews.freebsd.org/D34109
2022-02-02 21:08:17 -06:00
Rick Macklem
e2fe58d61b nfsd: Allow file owners to perform Open(Delegate_cur)
Commit b0b7d978b6 changed the NFSv4 server's default
behaviour to check the file's mode or ACL for permission to
open the file, to be Linux and Solaris compatible.
However, it turns out that Linux makes an exception for
the case of Claim_delegate_cur(_fh).

When a NFSv4 client is returning a delegation, it must
acquire Opens against the server to replace the ones
done locally in the client.  The client does this via
an Open operation with Claim_delegate_cur(_fh).  If
this operation fails, due to a change to the file's
mode or ACL after the delegation was issued, the
client does not have any way to retain the open.

As such, the Linux client allows the file's owner
to perform an Open with Claim_delegate_cur(_fh)
no matter what the mode or ACL allows.

This patch makes the FreeBSD server allow this case,
to be Linux compatible.

This patch only affects the case where delegations
are enabled, which is not the default.

MFC after:	2 weeks
2022-02-02 14:10:16 -08:00
John Baldwin
63b7c2df8e Disable -Wunused-function for {ed,x}25519_ref10.c in libsodium. 2022-02-02 12:25:16 -08:00
Konstantin Belousov
21a37c3cc6 Exclude DEBUG_VFS_LOCKS from non-debug kernel configs
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-02-02 19:27:32 +02:00
Hans Petter Selasky
a88e1a04df usb(4): Ignore port resume failures.
If port resume fails, likely the USB device is detached. Ignore such errors,
because else the USB stack might try forever trying to resume the device,
before it will proceed detaching it.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-02 13:00:48 +01:00
Konstantin Belousov
e7c5442162 amd64: micro-optimize vptopte()/vtopde() further
Eliminate shlq $3,address shift after masking of the va is done, which
is needed to convert pt_entry_t[] array index into byte offset.
Do it by preshifting the mask, and compensating the right shift of va.

Suggested by:	alc
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33786
2022-02-02 11:40:04 +02:00
Konstantin Belousov
0b8643eaf6 vmmeter(): Fix detection of the named swap objects
Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33549
2022-02-02 11:39:58 +02:00
Konstantin Belousov
4cf9f5d807 vm_object: restore handling of shadow_count for all type of objects
instead of only OBJ_ANON objects that are backing, as it is now.
This is required for e.g. vm_meter is_object_active() detection, and
should be useful in some more cases.

Use refcount KPI for all objects, regardless of owning the object lock,
and the fact that currently OBJ_ANON cannot change for the live object.

Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33549
2022-02-02 11:39:51 +02:00
Wojciech Macek
77223d98b6 ip_mroute: refactor epoch-basd locking
Remove duplicated epoch_enter and epoch_exit in IP inp/outp routines.
Remove unnecessary macros as well.

Obtained from:		Semihalf
Spponsored by:		Stormshield
Reviewed by:		glebius
Differential revision:	https://reviews.freebsd.org/D34030
2022-02-02 06:48:05 +01:00
Cy Schubert
445ecc480c ipfilter: Correct a typo in a comment
MFC after:	3 days
2022-02-01 19:55:56 -08:00
Mitchell Horne
4e1bc961bb arm64, riscv: handle RB_KDB
This allows entering the debugger at the earliest possible time, if
the '-d' argument is passed to the kernel.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34120
2022-02-01 13:59:54 -04:00
Mitchell Horne
e6ee2b6506 riscv: add ALT_BREAK_TO_DEBUGGER to GENERIC
It allows quickly entering ddb(4) over a serial line.

Reviewed by:	jhb
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34119
2022-02-01 13:59:54 -04:00
Richard Scheffenegger
93e28d6e89 tcp: LRO code to deal with all 12 TCP header flags
TCP per RFC793 has 4 reserved flag bits for future use. One
of those bits may be used for Accurate ECN.
This patch is to include these bits in the LRO code to ease
the extensibility if/when these bits are used.

Reviewed By: hselasky, rrs, #transport
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34127
2022-02-01 18:41:36 +01:00
John Baldwin
8a67a1a964 <sys/bitstring.h>: Cast _BITSTR_BITS to int in a ternary operator.
This fixes a -Wsign-compare error reported by GCC due to the two
results of the ternary operator having differing signedness.

Reviewed by:	dougm, rlibby
Differential Revision:	https://reviews.freebsd.org/D34122
2022-02-01 09:45:11 -08:00
Kristof Provost
4daa31c108 pflog: align header to 4 bytes, not 8
6d4baa0d01 incorrectly rounded the lenght of the pflog header up to 8
bytes, rather than 4.

PR:		261566
Reported by:	Guy Harris <gharris@sonic.net>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-02-01 18:17:44 +01:00
Hans Petter Selasky
84d7b8e75f mlx5en: Implement TLS RX support.
TLS RX support is modeled after TLS TX support. The basic structures and layouts
are almost identical, except that the send tag created filters RX traffic and
not TX traffic.

The TLS RX tag keeps track of past TLS records up to a certain limit,
approximately 1 Gbyte of TCP data. TLS records of same length are joined
into a single database record.

Regularly the HW is queried for TLS RX progress information. The TCP sequence
number gotten from the HW is then matches against the database of TLS TCP
sequence number records and lengths. If a match is found a static params WQE
is queued on the IQ and the hardware should immediately resume decrypting TLS
data until the next non-sequential TCP packet arrives.

Offloading TLS RX data is supported for untagged, prio-tagged, and
regular VLAN traffic.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:17 +01:00
Hans Petter Selasky
e6d7ac1d03 mlx5core: Set driver version into firmware.
If the driver_version capability bit is enabled, send the driver
version to firmware after the init HCA command, for display purposes.

Example of driver version: "FreeBSD,mlx5_core,14.0.0,3.x-xxx"

Linux commits:
012e50e109fd27ff989492ad74c50ca7ab21e6a1

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:17 +01:00
Hans Petter Selasky
8e332232a5 mlx5en: Implement one RQT object per channel.
These objects will eventually be used to switch TLS RX traffic.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:17 +01:00
Hans Petter Selasky
ea00d7e8ca mlx5: Add raw ethernet local loopback support.
Currently, unicast/multicast loopback raw ethernet (non-RDMA) packets
are sent back to the vport.  A unicast loopback packet is the packet
with destination MAC address the same as the source MAC address.  For
multicast, the destination MAC address is in the vport's multicast
filter list.

Moreover, the local loopback is not needed if there is one or none
user space context.

After this patch, the raw ethernet unicast and multicast local
loopback are disabled by default. When there is more than one user
space context, the local loopback is enabled.

Note that when local loopback is disabled, raw ethernet packets are
not looped back to the vport and are forwarded to the next routing
level (eswitch, or multihost switch, or out to the wire depending on
the configuration).

Linux commits:
c85023e153e3824661d07307138fdeff41f6d86a
8978cc921fc7fad3f4d6f91f1da01352aeeeff25

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
c1b76119cb mlx5: Implement mlx5_nic_vport_update_local_lb()
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
5381f93647 mlx5en: Create TIRs before flowtables.
Because flowtables may redirect traffic to TIRs.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
001106f807 mlx5en: Create flowtables in correct order.
Because it affects how the flow tables may re-direct traffic.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
2c0ade806a mlx5: Implement flow steering helper functions for TCP sockets.
This change adds convenience functions to setup a flow steering rule based on
a TCP socket. The helper function gets all the address information from the
socket and returns a steering rule, to be used with HW TLS RX offload.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
0ee1b09eaa mlx5: Implement offloads flowtable namespace.
This namespace will be used for TCP offloads, like hardware decryption
of TLS TCP data.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
e059c120b4 mlx5en: Create and destroy all flow tables and rules when the network interface attaches and detaches.
Previously flow steering tables and rules were only created and destroyed
at link up and down events, respectivly. Due to new requirements for adding
TLS RX flow tables and rules, the main flow steering table must always be
available as there are permanent redirections from the TLS RX flow table
to the vlan flow table.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
a8e715d21b mlx5en: Add race protection for SQ remap
Add a refcount for posted WQEs to avoid a race between
post WQE and FW command flows.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:16 +01:00
Hans Petter Selasky
aabca1034c mlx5en: Properly account for no-checksum on tunneled packets.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
06c2bd1872 mlx5en: Force all packets through the indirection table.
All packets must go through the indirection table, RQT,
because it is not possible to modify the RQN of the TIR
for direct dispatchment after it is created, typically
when the link goes up and down.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
266c81aae3 mlx5/mlx5en: Add SQ remap support
Add support to map an SQ to a specific schedule queue using a
special WQE as performance enhancement.

SQ remap operation is handled by a privileged internal queue, IQ,
and the mapping is enabled from one rate to another.

The transition from paced to non-paced should however always go
through FW.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
1c407d0494 mlx5: Properly define the reg_umr_sq networking offload capability bit.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
9680b1ba71 mlx5en: Only delete installed VxLAN rules.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
6176a5e338 mlx5en: Fix inverted logical assignment.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
694263572f mlx5en: Implement support for internal queues, IQ.
Internal send queues are regular sendqueues which are reserved for WQE commands
towards the hardware and firmware. These queues typically carry resync
information for ongoing TLS RX connections and when changing schedule queues
for rate limited connections.

The internal queue, IQ, code is more or less a stripped down copy
of the existing SQ managing code with exception of:

1) An optional single segment memory buffer which can be read or
   written as a whole by the hardware, may be provided.
2) An optional completion callback for all transmit operations, may
   be provided.
3) Does not support mbufs.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
21228c67ab mlx5en: Implement helper functions to open and close TLS TIR context.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:15 +01:00
Hans Petter Selasky
75767cb889 mlx5en: Share DEK objects with TLS RX.
The TLS RX support also needs to be able to allocate DEK objects.
Share the available objects 1:1.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
fad4b7d1f2 mlx5en: Add missing TLS structure prototype.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
3a1bf85503 mlx5en: Remove unused hardware TLS field.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
33a6a7a72a mlx5en: Make the receive packet indirection table, RQT, static instead of dynamic.
Allocate the RQT once, pointing all initial entries to the drop RQN.
When opening the channels simplify modify the RQT, directing all traffic
to the new RQNs. Similarly when closing the channels point all RQT entries
back to the so-called drop RQN.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
7800af352a mlx5en: Set CQN in RQ parameters for drop RQ.
Else creating the drop RQ fails.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
03567b0dfa mlx5en: Set channel pointer for drop receive queue.
A valid channel pointer is needed to get the priv pointer during init.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
4e40e984da mlx5en: Print error code when opening drop RQ fails.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
27b778ae55 mlx5en: Implement dummy receive queue, RQ, for dropping packets.
What is a drop RQ and why is it needed?

The RSS indirection table, also called the RQT, selects the
destination RQ based on the receive queue number, RQN. The RQT is
frequently referred to by flow steering rules to distribute traffic
among multiple RQs. The problem is that the RQs cannot be destroyed
before the RQT referring them is destroyed too. Further, TLS RX
rules may still be referring to the RQT even if the link went
down. Because there is no magic RQN for dropping packets, we create
a dummy RQ, also called drop RQ, which sole purpose is to drop all
received packets. When the link goes down this RQN is filled in all
RQT entries, of the main RQT, so the real RQs which are about to be
destroyed can be released and the TLS RX rules can be sustained.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
a60f953424 mlx5en: Make the hw_lro parameter read only tunable.
This prevents the so-called TIR context from changing during runtime.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:14 +01:00
Hans Petter Selasky
788e9e7478 mlx5: Remove support for FreeBSD 10 and older.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:13 +01:00
Hans Petter Selasky
2d5e5a0d75 mlx5en: Patch to inhibit transmit doorbell writes during packet reception.
During packet reception the network stack frequently transmit data in
response to TCP window updates. To reduce the number of transmit doorbells
needed, inhibit all transmit doorbells designated for the same channel until
after the reception of packets for the given channel is completed.

While at it slightly refactor the mlx5e_tx_notify_hw() function:

1) The doorbell information is always stored into sq->doorbell.d64 .
No need to pass a separate pointer to this variable.

2) Move checks for skipping doorbell writes inside this function.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 16:21:13 +01:00
Konstantin Belousov
0f7b6e11c0 mlx5en: Use a UMA cache zone for managing TLS send tags
Instead of allocating directly from a normal zone. This way
import and release are guaranteed to process all allocated and then
deallocated items. Also, the release occurs in a sleepable context when
caller of uma_zfree() or uma_zdestroy() can sleep itself.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:58 +02:00
Konstantin Belousov
028130b8e4 mlx5ib: idiomatic use of preprocessor, in particular paths
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:58 +02:00
Konstantin Belousov
7060097908 mlx5ib: normalize use of the opt_*.h files
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
89918a2375 mlx5en: idiomatic use of preprocessor, in particular paths
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
b984b95693 mlx5en: normalize use of the opt_*.h files
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Hans Petter Selasky
12c56d7dc4 mlx5: idiomatic use of preprocessor, in particular paths
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
ee9d634bd3 mlx5: normalize use of the opt_*.h files
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-01 14:45:57 +02:00
Konstantin Belousov
303d3ae7e8 ufs, msdosfs: do not record witness order when creating vnode
When allocating new vnode, we need to lock it exclusively before
making it externally visible.  Since other threads cannot observe the
vnode yet, current lock order cannot create LoR conditions.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34126
2022-02-01 10:51:55 +02:00
Konstantin Belousov
d51b0786a2 msdosfs_denode.c: some style
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D34126
2022-02-01 10:51:48 +02:00
Konstantin Belousov
99aa3b731c ffs: lock buffers after snaplk with LK_NOWITNESS
Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34073
2022-02-01 06:54:50 +02:00
Konstantin Belousov
c02780b78c Add GB_NOWITNESS flag
It prevents WITNESS from recording the lock order for the buffer lock
acquired by getblkx().

Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34073
2022-02-01 06:54:50 +02:00
Konstantin Belousov
e11b2b69c5 ffs_alloc.c: order includes alphabetically
Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34073
2022-02-01 06:54:50 +02:00
Konstantin Belousov
d950c5898a vm/vm_extern.h, vm/vm_page.h: use sys/kassert.h
instead of fatty sys/systm.h.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
f4cdb9d7c3 vm/vm_pager.h: use sys/systm.h header
it is needed for __read_mostly attribute definition, which right now
comes from vm/vm_page.h including sys/systm.h

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:55:35 +02:00
Konstantin Belousov
54d34bfbdf Introduce sys/kassert.h
It contains assert-related definitions previously provided by
sys/systm.h.  The new header is leaner than whole systm.h.
Include kassert.h from systm.h for compatibility.

The copyright assignment to Eivind Eklund was suggested by Kirk McKusick
and is based in the commit 5526d2d920.

Suggested by:	jhb
Reviewed by:	alc, imp, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34089
2022-02-01 05:14:14 +02:00
John Baldwin
53e938e408 hyperv storvsc: Don't abuse struct sglist to hold virtual addresses.
struct sglist is intended for holding S/G lists of physical address
ranges, not virtual address ranges.  GCC 9.x issues several warnings
due to casts between pointers and integers of different sizes as a
result (vm_paddr_t is 64-bits on i386).  Instead, add a local 'struct
hv_sglist' which uses an array of 'struct iovec' to hold the S/G list
of virtual address ranges.

Differential Revision:	https://reviews.freebsd.org/D31933
2022-01-31 17:11:27 -08:00
John Baldwin
d782385e9b tcp_ratelimit: Handle some edge cases with TLS + RL send tags.
- After a connection has fallen back from NIC TLS to SW TLS, any
  pacing rate changes should modify the inpcb send tag even though
  SB_TLS_IFNET is set.

- If a connection tries to modify the pacing rate before the send
  tag has been converted from plain TLS to TLS + RL, don't fail
  the rate request set but let it fall through to setting the rate
  on the non-TLS inpcb RL tag.

Reviewed by:	gallatin, rrs, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D34085
2022-01-31 16:40:04 -08:00
John Baldwin
d958bc7963 ktls: Try to enable TOE TLS after marking existing data not ready.
At the moment this is mostly a no-op but in the future there will be
in-flight encrypted data which requires software decryption.  This
same setup is also needed for NIC TLS RX.

Note that this does break TOE TLS RX for AES-CBC ciphers since there
is no software fallback for AES-CBC receive.  This will be resolved
one way or another before 14.0 is released.

Reviewed by:	hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34082
2022-01-31 16:39:21 -08:00
Mark Johnston
773e3a71b2 pf: Initialize pf_kpool mutexes earlier
There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.

Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there.  This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.

Constify some related functions while here and add a regression test
based on a syzkaller reproducer.

Reported by:	syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by:	kp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34115
2022-01-31 16:14:00 -05:00
Konstantin Belousov
66c5fbca77 insmntque1(): remove useless arguments
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D34071
2022-01-31 16:49:08 +02:00
Kornel Duleba
1a6d987b7f enetc: Wait for pending transmissions before disabling TX queues
According to the RM it's not safe to disable a TX ring while it is busy
transmitting frames.
In order to be safe wait until the ring is empty. (cidx==pidx)
Use this opportunity to remove a set-but-unused variable.

Obtained from: Semihalf
Sponsored by: Alstom Group
2022-01-31 08:57:48 +01:00
Kornel Duleba
a6bda3e1ef enetc: Simply TX ring credits counting logic
According to the RM rings can hold at most ring_size - 1 descriptors at any time.
No additional logic is needed since iflib already respects this constrain.
Thanks to that the pidx == cidx situation is not ambiguous and indicates an
empty ring.
Use that to simplify the logic that calculates the amount of processed frames.

Obtained from: Semihalf
Sponsored by: Alstom Group
2022-01-31 08:57:48 +01:00
Kornel Duleba
f485d733e8 enetc: Disable HW IP packet alignment
The NIC can IP align received packets.
It was observed that it caused some rare stalls, that required full board reset.
Disable this feature for now. It doesn't provide any significant performance
improvement anyway.

Obtained from: Semihalf
Sponsored by: Alstom Group
2022-01-31 08:57:48 +01:00
Konstantin Belousov
8d8589b385 ufs: be more persistent with finishing some operations
when the vnode is doomed after relock.  The mere fact that the vnode is
doomed does not prevent us from doing UFS operations on it while it is
still belongs to UFS, which is determined by non-NULL v_data.  Not
finishing some operations, e.g. not syncing the inode block only because
the vnode started reclamation, is not correct.

Add macro IS_UFS() which incapsulates the v_data != NULL, and use it
instead of VN_IS_DOOMED() for places where the operation completion is
important.

Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
4559700a0a ffs_snapblkfree(): add a comment explaining lockmgr invocation
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
0cdc603308 ufs: Use IS_SNAPSHOT()
Reviewed by:	markj, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00
Konstantin Belousov
3d68c4e175 syncer VOP_FSYNC(): unlock syncer vnode around call to VFS_SYNC()
The lock is unneccessary since the mount point is busied, which prevents
unmount and syncer vnode deallocation.  Having the vnode locked causes
innocent LoRs and complicates debugging.

Also stop starting write accounting around it.  Any caller of
VOP_FSYNC() must do it already, and sync_vnode() does.

Reported and tested by:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34072
2022-01-31 04:46:21 +02:00