Commit Graph

137601 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
fc1d840901 LinuxKPI: add more #defines to pci.h
Add more definitions for various PCI uses to linux/pci.h.  Almost all
are defined to their FreeBSD counterparts which are described there.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30434
2021-05-25 18:01:46 +00:00
Bjoern A. Zeeb
10096cb606 LinuxKPI: add prandom_u32() as used by wireless drivers.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30435
2021-05-25 18:01:46 +00:00
Bjoern A. Zeeb
fa58da02f7 LinuxKPI: add rcu_dereference_check()
Add a define for rcu_dereference_check() to rcu_dereference_protected()
which ignores the check argument.  Our lockdep compat implementation
for use cases found in iwlwifi would return 1 anyway.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30436
2021-05-25 18:01:46 +00:00
Bjoern A. Zeeb
abcac97f82 LinuxKPI: add kfree_sensitive() using zfree().
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30437
2021-05-25 18:01:46 +00:00
Bjoern A. Zeeb
43b4c00643 LinuxKPI: extract stringify() in their own header file
Add linux/stringify.h as directly included by drivers.  Remove the
definitions from compiler.h and include the new header in places
where the stringify macros are already used without linuxkpi.

I have adjusted the Copyright of the new file according to the commit
originally adding the macros (99e690772a).

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30440
2021-05-25 18:01:46 +00:00
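For readers unfamiliar with the stringify macros moved by 43b4c00643, the following is a minimal sketch of the usual two-level stringification idiom. The macro names STRINGIFY_RAW and STRINGIFY and the constant TOY_DRV_VERSION are hypothetical and chosen for illustration; they are not the identifiers from linux/stringify.h.

```c
#include <string.h>

#define TOY_DRV_VERSION	13	/* hypothetical value for illustration */

/* One level: stringifies the token as written, without expanding it. */
#define STRINGIFY_RAW(x)	#x
/* Two levels: the argument is macro-expanded first, then stringified. */
#define STRINGIFY(x)		STRINGIFY_RAW(x)

/* Returns 1 when only the two-level form yields the expanded value. */
static int
stringify_expands(void)
{
	return (strcmp(STRINGIFY(TOY_DRV_VERSION), "13") == 0 &&
	    strcmp(STRINGIFY_RAW(TOY_DRV_VERSION), "TOY_DRV_VERSION") == 0);
}
```

The extra level of indirection is what lets a driver embed a numeric macro value in a string literal.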
Bjoern A. Zeeb
5878c7c7b0 LinuxKPI: add kernel_ulong_t typedef in linux/kernel.h.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30438
2021-05-25 18:01:46 +00:00
Bjoern A. Zeeb
cae1683120 LinuxKPI: add guid_t for ACPI consumers.
Add a placeholder struct for guid_t which is needed by ACPI consumers
in at least one wireless driver.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D30439
2021-05-25 18:01:46 +00:00
Andrew Gallatin
086a35562f tcp: enter network epoch when calling tfb_tcp_fb_fini
We need to enter the network epoch when calling into
tfb_tcp_fb_fini.  I noticed this when I hit an assert
running the latest rack.

Differential Revision: https://reviews.freebsd.org/D30407
Reviewed by: rrs, tuexen
Sponsored by: Netflix
2021-05-25 13:45:37 -04:00
Randall Stewart
13c0e198ca tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is that the left edge gets transmitted before the adjustments are
made to the send_map, which means that right edge bits must be considered for addition only if
the entire RSM is being retransmitted.

Syzkaller also continued to find a crash, for which Michael sent me the reproducer. It turns
out that the reproducer made the default (freebsd) stack get into an ack-war with itself.
After fixing the reference issues in rack, the same ack-war was found in rack (and bbr). Basically,
what happens is that we go into the reassembly code and lose the FIN bit. The trick here is that we
should not be going into the reassembly code if tlen == 0, i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit, but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FINs and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2, where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.

Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30451
2021-05-25 13:23:31 -04:00
Warner Losh
00e7a55367 cam_sim: style: sort includes
Sort and remove sys/systm.h, it's not needed.

Sponsored by:		Netflix
2021-05-25 09:56:56 -06:00
Edward Tomasz Napierala
3b9971c8da Clean up some of the core dumping code.
No functional changes.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30397
2021-05-25 16:30:32 +01:00
Mitchell Horne
6f4bb8ecc2 arm64, riscv: remove reference to fsu_intr_fault
This variable no longer exists.

MFC after:	3 days
2021-05-25 12:26:52 -03:00
Konstantin Belousov
fd3ac06f45 ptrace: add an option to not kill debuggees on debugger exit
Requested by:	markj
Reviewed by:	jhb (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30351
2021-05-25 18:22:34 +03:00
Konstantin Belousov
d7a7ea5be6 sys_process.c: extract ptrace_unsuspend()
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30351
2021-05-25 18:22:27 +03:00
Konstantin Belousov
91aae953cb amd64: clear PSL.AC in the right frame
If the copyin family of routines faults, the kernel does clear PSL.AC on
fault entry, but the AC flag of the faulted frame is kept intact.  Since
the onfault handler is effectively a jump, AC survives until syscall exit.

Reported by:	m00nbsd, via Sony
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
admbugs:	975
2021-05-25 18:20:46 +03:00
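The frame distinction in 91aae953cb can be illustrated with a toy model. This is only a sketch under assumptions: struct toy_frame and all toy_* functions are hypothetical and do not correspond to the amd64 trap code; the only real-world fact used is that AC is bit 18 of the flags register.

```c
#include <stdint.h>

#define TOY_PSL_AC	((uint64_t)0x00040000)	/* AC flag, bit 18 of rflags */

struct toy_frame {
	uint64_t rflags;	/* flags saved when the fault was taken */
};

/* Resuming restores the flags from the saved frame, not the handler's. */
static uint64_t
toy_resume_flags(const struct toy_frame *saved)
{
	return (saved->rflags);
}

/* Clears AC only in the handler's own working copy of the flags. */
static uint64_t
toy_fault_clear_live(struct toy_frame *saved)
{
	uint64_t live = saved->rflags;

	live &= ~TOY_PSL_AC;	/* affects the handler only */
	(void)live;
	return (toy_resume_flags(saved));
}

/* Clears AC in the saved frame, so the resumed code runs without it. */
static uint64_t
toy_fault_clear_frame(struct toy_frame *saved)
{
	saved->rflags &= ~TOY_PSL_AC;
	return (toy_resume_flags(saved));
}
```

In this model, toy_fault_clear_live() still resumes with AC set, while toy_fault_clear_frame() resumes with it cleared.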
Warner Losh
1f348be6f2 cam: remove xpt_polled_action
Since periph_runccb now handles all the polling stuff,
xpt_polled_action is now unused and can be removed.

Sponsored by:		Netflix
Reviewed by:		mav@
Differential Revision:	https://reviews.freebsd.org/D30394
2021-05-25 09:18:08 -06:00
Warner Losh
6c48134275 cam: Remove CAM_SIM_LOCK/UNLOCK macros, they are unused.
Sponsored by:		Netflix
Reviewed by:		mav@
Differential Revision:	https://reviews.freebsd.org/D30384
2021-05-25 09:18:08 -06:00
Warner Losh
28027f28e6 cam: remove sim callout
Nothing is using the sim callout to unfreeze the queue. Remove it to
simplify the SIM. This was introduced in the original CAM commit in 1998
but setting the CAM_SIM_REL_TIMEOUT_PENDING flag was removed in 1999 in
commit 87cfaf0e1f which reworked how bus reset worked. That work was
merged just after 3.2R was released. Remove the unused residuals.

Sponsored by:		Netflix
Reviewed by:		scottl@, mav@
Differential Revision:	https://reviews.freebsd.org/D30383
2021-05-25 09:18:08 -06:00
Colin Percival
27f09959d5 taskqueue: Add missing comma to TASKQUEUE_FAST_DEFINE_THREAD
Add missing comma to TASKQUEUE_FAST_DEFINE_THREAD arguments to prevent
compilation errors.

Submitted by:	ashafer_badland.io
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30449
2021-05-24 20:37:55 -07:00
Bjoern A. Zeeb
fbf75b113e arm64: log vm_fault error for data_abort
Summary:
Log the vm_fault() error in the data_abort panic so it is easier to
find the reason vm_fault() failed (e.g., invalid address).

Reviewed by:	andrew
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D30362
2021-05-24 21:58:11 +00:00
Michael Tuexen
9bbd1a8fcb tcp: fix a RACK socket buffer lock issue
Fix a missing socket buffer unlocking of the socket receive buffer.

Reviewed by:		gallatin, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D30402
2021-05-24 20:31:23 +02:00
Randall Stewart
631449d5d0 tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to
the right edge. The FIN bit was incorrectly on the left edge. We
fix these two issues as well as plumb in the mtu_change for
alternate stacks.

Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30413
2021-05-24 14:42:15 -04:00
Kristof Provost
4483fb4773 pf: fix ioctl() memory leak
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string. If we don't
we're leaking memory on every (nvlist-based) ioctl() call.

While here remove two redundant 'break' statements.

PR:		255971
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-05-24 15:56:24 +02:00
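The ownership rule behind 4483fb4773 can be sketched with a toy container whose add operation copies its argument, as the message says nvlist_add_nvlist() does. This is not the libnv API: every toy_* name below is hypothetical, and the allocation counter exists only to make the leak observable.

```c
#include <stdlib.h>
#include <string.h>

static int live_allocs;	/* outstanding toy allocations */

struct toy_list {
	char *copy;	/* private copy owned by the list */
};

static char *
toy_item_create(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (p == NULL)
		return (NULL);
	strcpy(p, s);
	live_allocs++;
	return (p);
}

static void
toy_item_destroy(char *p)
{
	if (p != NULL) {
		free(p);
		live_allocs--;
	}
}

/* The add copies the item; the caller still owns the original. */
static int
toy_list_add(struct toy_list *l, const char *item)
{
	l->copy = toy_item_create(item);
	return (l->copy != NULL ? 0 : -1);
}

static void
toy_list_destroy(struct toy_list *l)
{
	toy_item_destroy(l->copy);
	l->copy = NULL;
}

/* Returns how many allocations one simulated ioctl left behind. */
static int
toy_ioctl(int destroy_child)
{
	struct toy_list parent = { NULL };
	int start = live_allocs, leaked;
	char *child = toy_item_create("rule");

	if (child == NULL)
		return (-1);
	if (toy_list_add(&parent, child) != 0) {
		toy_item_destroy(child);
		return (-1);
	}
	if (destroy_child)
		toy_item_destroy(child);	/* the step the fix adds */
	toy_list_destroy(&parent);
	leaked = live_allocs - start;
	if (!destroy_child)
		toy_item_destroy(child);	/* keep the demo itself leak-free */
	return (leaked);
}
```

Calling toy_ioctl(0) reports one leftover allocation per call, while toy_ioctl(1) reports none.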
Emmanuel Vadot
996afd401c arm: RPI-B: Add ext_resources driver
mmc_fdt_helpers needs clocks and regulators.
Add all the ext_resources drivers to the RPI-B conf file to fix the build.

Reported by:	mjg
2021-05-24 12:53:00 +02:00
Andrew Turner
e779604f1d Clean up early arm64 pmap code
Early in the arm64 pmap code we need to translate between a virtual
address and a physical address. Rather than manually walking the page
table we can ask the hardware to do it for us.

Reviewed by:	kib, markj
Sponsored by:	Innovate UK
Differential Revision: https://reviews.freebsd.org/D30357
2021-05-24 09:22:19 +00:00
Navdeep Parhar
24b98f288d cxgbe(4): Overhaul CLIP (Compressed Local IPv6) table management.
- Process the list of local IPs once instead of once per adapter.  Add
  addresses from all VNETs to the driver's list but leave hardware
  updates for later when the global VNET/IFADDR list locks have been
  released.

- Add address to the hardware table synchronously when a CLIP entry is
  requested for an address that's not already in there.

- Provide ioctls that allow userspace tools to manage addresses in the
  CLIP table.

- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
  automatically added to the CLIP table or not.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-05-23 16:07:29 -07:00
Vladimir Kondratyev
47791339f0 ums(4): Start USB xfers on opening of evdev node unconditionally.
This fixes the inability to start USB xfers in the case where the FIFO has
already been open()-ed but no read() or poll() calls have been issued yet.

MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30343
2021-05-24 01:41:17 +03:00
Vladimir Kondratyev
05ab03a317 ums(4): Do not stop USB xfers on FIFO close when evdev is still active
This fixes loss of evdev events after moused has been killed.

While here use bitwise operations for UMS_EVDEV_OPENED flag.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30342
2021-05-24 01:38:53 +03:00
Mateusz Guzik
a269183875 vfs: elide vnode locking when it is only needed for audit if possible
2021-05-23 19:37:16 +00:00
Dmitry Chagin
8746bc9187 run(4): add support for DLINK DWA-130 rev F1 wireless adaptor.
PR:		256092
Submitted by:	Francois Briere <purplefiasco at gmail.com>
MFC After:	2 weeks
2021-05-23 21:31:51 +03:00
Mark Johnston
6f6cd1e8e8 ktrace: Remove vrele() at the end of ktr_writerequest()
As of commit fc369a353 we no longer ref the vnode when writing a record.
Drop the corresponding vrele() call in the error case.

Fixes:	fc369a353 ("ktrace: fix a race between writes and close")
Reported by:	syzbot+9b96ea7a5ff8917d3fe4@syzkaller.appspotmail.com
Reported by:	syzbot+6120ebbb354cd52e5107@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	6 days
Differential Revision:	https://reviews.freebsd.org/D30404
2021-05-23 14:13:01 -04:00
Mateusz Guzik
e2ab16b1a6 lockprof: move panic check after inspecting the state
2021-05-23 17:55:27 +00:00
Mateusz Guzik
6a467cc5e1 lockprof: pass lock type as an argument instead of reading the spin flag
2021-05-23 17:55:27 +00:00
Konstantin Belousov
eaf00819bc Add support for Gemini Lake LPSS UARTs.
With this patch:
% dmesg | grep -i uart
uart2: <Intel Gemini Lake SIO/LPSS UART 0> mem 0xa1426000-0xa1426fff,0xa1425000-0xa1425fff irq 4 at device 24.0 on pci0
uart3: <Intel Gemini Lake SIO/LPSS UART 1> mem 0xa1424000-0xa1424fff,0xa1423000-0xa1423fff irq 5 at device 24.1 on pci0
uart4: <Intel Gemini Lake SIO/LPSS UART 2> mem 0xfea10000-0xfea10fff irq 6 at device 24.2 on pci0
uart5: <Intel Gemini Lake SIO/LPSS UART 3> mem 0xa1422000-0xa1422fff,0xa1421000-0xa1421fff irq 7 at device 24.3 on pci0

PR:	256101
Submitted by:	 Daniel Ponte <amigan@gmail.com>
MFC after:	1 week
2021-05-23 20:46:32 +03:00
Hans Petter Selasky
ef0f7ae934 The old thread priority must be stored as part of the EPOCH(9) tracker.
Else recursive use of EPOCH(9) may cause the wrong priority to be restored.

Bump the __FreeBSD_version due to changing the thread and epoch tracker
structure.

Differential Revision:	https://reviews.freebsd.org/D30375
Reviewed by:	markj@
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-23 10:53:25 +02:00
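Why ef0f7ae934 stores the old priority in the tracker rather than in a single per-thread slot can be shown with a toy model of nested sections. This is a sketch under assumptions: the structures and functions below are hypothetical and are not the kernel's struct thread or epoch tracker.

```c
struct toy_thread {
	int prio;	/* current priority */
	int saved_prio;	/* single per-thread slot (old scheme) */
};

struct toy_tracker {
	int saved_prio;	/* one slot per entered section (new scheme) */
};

static void
enter_thread_slot(struct toy_thread *td)
{
	td->saved_prio = td->prio;
}

static void
exit_thread_slot(struct toy_thread *td)
{
	td->prio = td->saved_prio;
}

static void
enter_tracker(struct toy_thread *td, struct toy_tracker *et)
{
	et->saved_prio = td->prio;
}

static void
exit_tracker(struct toy_thread *td, struct toy_tracker *et)
{
	td->prio = et->saved_prio;
}

/* Nested sections sharing one slot: the inner enter overwrites it. */
static int
nested_with_thread_slot(void)
{
	struct toy_thread td = { 100, 0 };

	enter_thread_slot(&td);	/* outer saves 100 */
	td.prio = 50;		/* priority changes while inside */
	enter_thread_slot(&td);	/* inner overwrites the slot with 50 */
	exit_thread_slot(&td);
	exit_thread_slot(&td);	/* restores 50, not 100 */
	return (td.prio);
}

/* Nested sections with one tracker each: the outer value survives. */
static int
nested_with_tracker(void)
{
	struct toy_thread td = { 100, 0 };
	struct toy_tracker outer, inner;

	enter_tracker(&td, &outer);
	td.prio = 50;
	enter_tracker(&td, &inner);
	exit_tracker(&td, &inner);
	exit_tracker(&td, &outer);	/* restores 100 */
	return (td.prio);
}
```

In this model the shared slot ends with the wrong priority after the outer exit, while the per-tracker version restores the original one.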
Adrian Chadd
c50346bcf5 ath: bump the default node queue size to 128 frames, not 64
It turns out that, silly adrian, setting it to 64 means only two
AMPDU frames of 32 subframes each.  Thus, whilst those are in-flight,
any subsequently queued frames to that node get dropped.

This ends up being pretty no bueno for performance if any receive
is also going on at that point.

Instead, set it to 128 for the time being to ensure that SOME
frames get queued in the meantime.  This results in some frames
being immediately available in the software queue for transmit
when the two existing A-MPDU frames have been completely sent,
rather than the queue remaining empty until at least one is sent.

It's not the best solution - I still think I'm scheduling receive
far more often than giving time to schedule transmit work -
but at least now I'm not starving the transmit side.

Before this, a bidirectional iperf would show receive at ~ 150mbit/sec.
but the transmit side at like 10kbit/sec.  With it set to 128 it's
now 150mbit/sec receive, and ~ 10mbit/sec transmit.  It's better than 10kbit/sec,
but still not as far as I'd like it to be.

Tested:

* AR9380/QCA934x (TL-WDR4300 AP), Macbook pro test STA + AR9380 test STA
2021-05-22 21:23:00 -07:00
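The queue arithmetic in c50346bcf5 can be checked with a small sketch. It assumes, as the message implies, that subframes of in-flight aggregates count against the per-node limit; the function and macro names are hypothetical.

```c
#define TOY_SUBFRAMES_PER_AMPDU	32	/* subframes per aggregate, per the message */

/* Number of full aggregates a given per-node limit can hold. */
static int
toy_ampdus_covered(int node_queue_limit)
{
	return (node_queue_limit / TOY_SUBFRAMES_PER_AMPDU);
}

/* Frames that can still be queued while some aggregates are in flight. */
static int
toy_room_while_in_flight(int node_queue_limit, int in_flight_ampdus)
{
	int used = in_flight_ampdus * TOY_SUBFRAMES_PER_AMPDU;

	return (used >= node_queue_limit ? 0 : node_queue_limit - used);
}
```

With a limit of 64 and two aggregates in flight there is no room left, whereas a limit of 128 leaves 64 frames of headroom.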
Adrian Chadd
f858e9281c [ath] Handle STA + AP beacon programming without stomping over HW AP beacon programming
I've been using STA+AP modes at home for a couple years now
and I've been finding and fixing a lot of weird corner cases.
This is the eventual patchset I've landed on.

* Don't force beacon resync in STA mode if we're using sw beacon tracking.
  This stops a variety of stomping issues when the STA VAP is reconfigured;
  the AP hardware beacons were being stomped on!

* Use the first AP VAP to configure beacons on, rather than the first VAP.
  This prevents weird behaviour in ath_beacon_config() when the hardware
  is being reconfigured and the STA VAP was the first one created.

* Ensure the beacon interval / timing programming is within the AR9300
  HAL bounds by masking off any flags that may have been there before
  shifting the value up to 1/8 TUs rather than the 1 TU resolution the
  previous chips used.

Now I don't get weird beacon reprogramming during startup, STA state
changes and hardware recovery which showed up as HI-LARIOUS beacon
configurations and STAs that would just disconnect from the AP very
frequently.

Tested:

* AR9344/AR9380, STA and AP and STA+AP modes
2021-05-22 16:39:16 -07:00
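The last bullet of f858e9281c (mask off flags, then convert from 1 TU to 1/8 TU resolution) can be sketched as follows. The mask and flag values are hypothetical placeholders, not the HAL's definitions; only the factor of eight, a left shift by 3, follows from the message.

```c
#include <stdint.h>

#define TOY_BEACON_PERIOD_MASK	0x0000ffffu	/* hypothetical: interval bits */
#define TOY_BEACON_FLAG_RESET	0x01000000u	/* hypothetical: a flag bit */

/* Strip flag bits first, then scale whole TUs to eighths of a TU. */
static uint32_t
toy_interval_to_eighth_tu(uint32_t intval)
{
	return ((intval & TOY_BEACON_PERIOD_MASK) << 3);
}
```

Shifting without masking would move any flag bits into the interval value, which is the kind of out-of-bounds programming the bullet describes.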
Adrian Chadd
1ca3996828 [ath] Add ast_tsfoor to the sysctl statistics array.
2021-05-22 15:54:16 -07:00
Adrian Chadd
114f4b17d5 [ar71xx] During reset, don't spin, just keep trying
I've seen this fail from time to time and just hang during reset.
Instead of it just hanging, just poke it again.  I've not seen it
fail in hundreds of test resets now.

Tested:

* AR9344 AP/STA configuration
2021-05-22 15:53:00 -07:00
Zhenlei Huang
03b0505b8f ip_forward: Restore RFC reference
Add RFC reference lost in 3d846e4822

PR:		255388
Reviewed By:	rgrimes, donner, karels, marcus, emaste
MFC after:	27 days
Differential Revision: https://reviews.freebsd.org/D30374
2021-05-23 00:01:37 +02:00
Rick Macklem
3f7e14ad93 nfscl: Add hash lists for the NFSv4 opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently.  When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.

This patch adds a table of hash lists for the opens, hashed on
file handle.  This table will be used by future commits to
search for an open based on file handle more efficiently.

MFC after:	2 weeks
2021-05-22 14:53:56 -07:00
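The idea in 3f7e14ad93, replacing one linear list with a table of hash lists keyed on file handle, might look like the sketch below. All names, the bucket count and the FNV-1a hash are assumptions for illustration; the log does not show the NFS client's actual structures or hash function.

```c
#include <stddef.h>
#include <string.h>

#define TOY_HASH_SIZE	64	/* hypothetical bucket count */
#define TOY_FH_MAX	16	/* hypothetical maximum handle length */

struct toy_open {
	struct toy_open *next;		/* next open in the same bucket */
	unsigned char fh[TOY_FH_MAX];	/* file handle bytes */
	size_t fhlen;
};

struct toy_open_table {
	struct toy_open *bucket[TOY_HASH_SIZE];
};

/* FNV-1a over the file handle bytes, reduced to a bucket index. */
static unsigned int
toy_fh_hash(const unsigned char *fh, size_t len)
{
	unsigned int h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= fh[i];
		h *= 16777619u;
	}
	return (h % TOY_HASH_SIZE);
}

static void
toy_open_insert(struct toy_open_table *t, struct toy_open *op)
{
	unsigned int b = toy_fh_hash(op->fh, op->fhlen);

	op->next = t->bucket[b];
	t->bucket[b] = op;
}

/* Searches only the bucket the handle hashes to, not every open. */
static struct toy_open *
toy_open_find(struct toy_open_table *t, const unsigned char *fh, size_t len)
{
	struct toy_open *op;

	for (op = t->bucket[toy_fh_hash(fh, len)]; op != NULL; op = op->next)
		if (op->fhlen == len && memcmp(op->fh, fh, len) == 0)
			return (op);
	return (NULL);
}
```

A lookup by file handle then walks one bucket instead of the full list of opens.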
Mateusz Guzik
138f78e94b umtx: convert umtxq_lock to a macro
Then LOCK_PROFILING starts reporting callers instead of the inline.
2021-05-22 21:01:05 +00:00
Mateusz Guzik
e71d5c7331 Fix limit testing after 1762f674cc ktrace commit.
The previous:

if ((uoff_t)uio->uio_offset + uio->uio_resid > lim)
	signal(....);

was replaced with:

if ((uoff_t)uio->uio_offset + uio->uio_resid < lim)
	return;
signal(....);

Making (uoff_t)uio->uio_offset + uio->uio_resid == lim trip over the
limit, when it did not previously.

Unbreaks running 13.0 buildworld.
2021-05-22 20:18:21 +00:00
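The boundary case described in e71d5c7331 can be reproduced in isolation. The sketch below uses hypothetical function names and plain integers instead of the uio fields; the last variant shows one way to restore the original behaviour (the exact negation of '>' is '<='), which is an assumption about the shape of the fix.

```c
#include <stdint.h>

/* Original test: signal only when the write end strictly exceeds lim. */
static int
toy_over_limit_original(uint64_t offset, uint64_t resid, uint64_t lim)
{
	return (offset + resid > lim);
}

/* Inverted with '<': the equal case no longer returns early. */
static int
toy_over_limit_broken(uint64_t offset, uint64_t resid, uint64_t lim)
{
	if (offset + resid < lim)
		return (0);
	return (1);
}

/* Inverted with '<=': matches the original test in every case. */
static int
toy_over_limit_fixed(uint64_t offset, uint64_t resid, uint64_t lim)
{
	if (offset + resid <= lim)
		return (0);
	return (1);
}
```

The three variants differ only when offset + resid equals the limit exactly.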
Konstantin Belousov
fc369a353b ktrace: fix a race between writes and close
It was possible for a ktrace session to be terminated during a record
write, in which case the write occurred after the close of the vnode.
Use ktr_io_params refcounting to avoid this situation, by taking the
reference on the structure instead of the vnode.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30400
2021-05-22 23:14:13 +03:00
Mateusz Guzik
48235c377f Fix a braino in previous.
Instead of trying to partially ifdef out ktrace handling, define the
missing identifier to 0. Without this fix lack of ktrace in the kernel
also means there is no SIGXFSZ signal delivery.
2021-05-22 19:53:40 +00:00
Mateusz Guzik
154f0ecc10 Fix tinderbox build after 1762f674cc ktrace commit.
2021-05-22 19:41:19 +00:00
Mateusz Guzik
a0842e69aa lockprof: add contested-only profiling
This allows tracking all wait times with much smaller runtime impact.

For example when doing -j 104 buildkernel on tmpfs:

no profiling:	2921.70s user 282.72s system 6598% cpu 48.562 total
all acquires:	2926.87s user 350.53s system 6656% cpu 49.237 total
contested only:	2919.64s user 290.31s system 6583% cpu 48.756 total
2021-05-22 19:28:37 +00:00
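The system-time figures quoted in a0842e69aa can be turned into relative overheads with a short calculation. The helper name is hypothetical; the three constants are copied from the message. They work out to roughly 24% more system time with all acquires profiled and under 3% with contested-only profiling.

```c
/* System-time figures (seconds) copied from the commit message. */
static const double toy_sys_none = 282.72;
static const double toy_sys_all = 350.53;
static const double toy_sys_contested = 290.31;

/* Percentage increase of `measured` over `baseline`. */
static double
toy_overhead_pct(double baseline, double measured)
{
	return ((measured - baseline) / baseline * 100.0);
}
```

For example, toy_overhead_pct(toy_sys_none, toy_sys_contested) gives the contested-only overhead relative to the unprofiled run.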
Mateusz Guzik
fca5cfd584 lockprof: retire lock_prof_skipcount
The implementation uses a global variable for *ALL* calls, defeating the
point of sampling in the first place. Remove it as it clearly remains
unused.
2021-05-22 19:28:37 +00:00
Mateusz Guzik
cf74b2be53 vfs: retire the now unused vnlru_free routine
2021-05-22 18:42:30 +00:00
Mark Johnston
5c7ef43e96 ktls.h: Guard includes behind _KERNEL
These are not needed when including ktls.h to get sockopt definitions.

Reviewed by:	gallatin, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30392
2021-05-22 12:12:19 -04:00
Mark Johnston
e4b16f2fb1 ktrace: Avoid recursion in namei()
sys_ktrace() calls namei(), which may call ktrnamei().  But sys_ktrace()
also calls ktrace_enter() first, so if the caller is itself being
traced, the assertion in ktrace_enter() is triggered.  And, ktrnamei()
does not check for recursion like most other ktrace ops do.

Fix the bug by simply deferring the ktrace_enter() call.

Also make the parameter to ktrnamei() const and convert to ANSI.

Reported by:	syzbot+d0a4de45e58d3c08af4b@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30340
2021-05-22 12:07:32 -04:00
Michael Tuexen
8923ce6304 tcp: Handle stack switch while processing socket options
Handle the case where during socket option processing, the user
switches a stack such that processing the stack specific socket
option does not make sense anymore. Return an error in this case.

MFC after:		1 week
Reviewed by:		markj
Reported by:		syzbot+a6e1d91f240ad5d72cd1@syzkaller.appspotmail.com
Sponsored by:		Netflix, Inc.
Differential revision:	https://reviews.freebsd.org/D30395
2021-05-22 14:39:36 +02:00
Konstantin Belousov
f784da883f Move mnt_maxsymlinklen into appropriate fs mount data structures
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC-Note:	struct mount layout
Differential revision:	https://reviews.freebsd.org/D30325
2021-05-22 15:16:09 +03:00
Konstantin Belousov
ea2b64c241 ktrace: add a kern.ktrace.filesize_limit_signal knob
When enabled, writes to ktrace.out that exceed the max file size limit
cause SIGXFSZ, as they should, but note that the limit is taken from
the process that initiated ktrace.   When disabled, the write is blocked,
but the signal is not sent.

Note that in either case ktrace for the affected process is stopped.

Requested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:09 +03:00
Konstantin Belousov
02645b886b ktrace: use the limit of the trace initiator for file size limit on writes
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:09 +03:00
Konstantin Belousov
1762f674cc ktrace: pack all ktrace parameters into allocated structure ktr_io_params
Ref-count the ktr_io_params structure instead of vnode/cred.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
a6144f713c ktrace: do not stop tracing other processes if ours cannot write to this vnode
Other processes might still be able to write, make the decision to stop
based on the per-process situation.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
9bb84c23e7 accounting: explicitly mark the exiting thread as doing accounting
and use the mark to stop applying file size limits on the write of
the accounting record.  This allows removing the hack that clears process
limits in acct_process(), and avoids the bug with the clearing being
ineffective because limits are also cached in the thread structure.

Reported and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
70c05850e2 kern_descrip.c: Style
Wrap too long lines.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Dmitry Chagin
d6fd321ef6 run(4): add support for ASUS USB-N14 wireless adaptor.
PR:		255759
Submitted by:	john.lmurdoch at gmail.com
MFC After:	1 week
2021-05-22 13:52:12 +03:00
Konstantin Belousov
42881526d4 nullfs: dirty v_object must imply the need for inactivation
Otherwise pages are cleaned some time later when the lower fs decides
that it is time to do it.  This mostly manifests itself as delayed
mtime update, e.g. breaking make-like programs.

Reported by:	mav
Tested by:	mav, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-22 12:30:17 +03:00
Konstantin Belousov
d713bf7927 vn_need_pageq_flush(): simplify
There is no need to own vnode interlock, since v_object is type stable
and can only change to/from NULL, and no other checks in the function
access fields protected by the interlock.  Remove the need variable, the
result of the test is directly usable as return value.

Tested by:	mav, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-22 12:29:44 +03:00
Edward Tomasz Napierala
33621dfc19 Refactor core dumping code a bit
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code.  Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30369
2021-05-22 09:59:00 +01:00
Navdeep Parhar
ffbb373c5a cxgbe(4): Fix build warnings with NOINET kernels.
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26334
2021-05-21 20:42:04 -07:00
Richard Scheffenegger
3975688563 rack: honor prior socket buffer lock when doing the upcall
While partially reverting D24237 with D29690, due to introducing some
unintended effects for in-kernel TCP consumers, the preexisting lock
on the socket send buffer was not considered properly.

Found by: markj
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30390
2021-05-22 00:09:59 +02:00
Mark Johnston
916c61a5ed Fix handling of errors from pru_send(PRUS_NOTREADY)
PRUS_NOTREADY indicates that the caller has not yet populated the chain
with data, and so it is not ready for transmission.  This is used by
sendfile (for async I/O) and KTLS (for encryption).  In particular, if
pru_send returns an error, the caller is responsible for freeing the
chain since other implicit references to the data buffers exist.

For async sendfile, it happens that an error will only be returned if
the connection was dropped, in which case tcp_usr_ready() will handle
freeing the chain.  But since KTLS can be used in conjunction with the
regular socket I/O system calls, many more error cases - which do not
result in the connection being dropped - are reachable.  In these cases,
KTLS was effectively assuming success.

So:
- Change sosend_generic() to free the mbuf chain if
  pru_send(PRUS_NOTREADY) fails.  Nothing else owns a reference to the
  chain at that point.
- Similarly, in vn_sendfile() change the !async I/O && KTLS case to free
  the chain.
- If async I/O is still outstanding when pru_send fails in
  vn_sendfile(), set an error in the sfio structure so that the
  connection is aborted and the mbuf chain is freed.

Reviewed by:	gallatin, tuexen
Discussed with:	jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30349
2021-05-21 17:45:19 -04:00
Mark Johnston
7d2608a5d2 tcp: Make error handling in tcp_usr_send() more consistent
- Free the input mbuf in a single place instead of in every error path.
- Handle PRUS_NOTREADY consistently.
- Flush the socket's send buffer if an implicit connect fails.  At that
  point the mbuf has already been enqueued but we don't want to keep it
  in the send buffer.

Reviewed by:	gallatin, tuexen
Discussed with:	jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30349
2021-05-21 17:45:18 -04:00
Emmanuel Vadot
80e645dcdb mmc: Only build mmc_fdt_helper and mmc_pwrseq for arch that uses ext_resources
This is now a requirement and fixes the powerpc* build.
2021-05-21 19:35:20 +02:00
Emmanuel Vadot
c99d887ca8 dwmmc: Add bus_generic_add_child in the methods
Otherwise sdiob cannot add its children.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30295
2021-05-21 17:40:14 +02:00
Emmanuel Vadot
115e71a457 arm: allwinner: aw_mmc: Check regulators status before enabling/disabling them
Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30294
2021-05-21 17:39:47 +02:00
Emmanuel Vadot
f52072b06d extres: regulator: Fix regulator_status for already enabled regulators
If a regulator hasn't been enabled by a driver but is enabled in hardware
(most likely enabled by U-Boot), regulator_status will return that it
is enabled and so any call to regulator_disable will panic as it wasn't
enabled by one of our drivers.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30293
2021-05-21 17:39:07 +02:00
Emmanuel Vadot
ce41765c21 mmc: dwmmc: Call mmc_fdt_set_power
This allows us to power up/down the card and enable/disable the
regulators, if any.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30292
2021-05-21 17:38:35 +02:00
Emmanuel Vadot
03d4e8bb65 mmc_fdt_helper: Add mmc_fdt_set_power
This helper can be used to enable/disable the regulator and start
the power sequence of sd/sdio/eMMC cards.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30291
2021-05-21 17:38:05 +02:00
Emmanuel Vadot
182717da88 arm64: allwinner: axp81x: Add support for regnode_status
This method is used to know if a regulator is enabled or not.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30290
2021-05-21 17:37:37 +02:00
Emmanuel Vadot
b0387990a7 mmc_fdt_helpers: Parse the optional pwrseq element.
If a sd/emmc node has a pwrseq property, parse it and get the corresponding
driver.
This can later be used to powerup/powerdown the SDIO card or eMMC.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30289
2021-05-21 17:36:58 +02:00
Emmanuel Vadot
5b2a81f58d mmc: Add mmc-pwrseq driver
This driver is used to power up an sdio card or eMMC.
It handles the reset-gpio, clocks and needed delays for powerup/powerdown.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30288
2021-05-21 17:36:20 +02:00
Emmanuel Vadot
bc1bb80564 arm64: rockchip: gpio: Give friendlier name to gpio
By default name the gpio P<bank><bankpin>.
This makes it easier to find the gpio when reading schematics or DTS.

Sponsored by:	Diablotin Systems
Differential Revision:	https://reviews.freebsd.org/D30287
2021-05-21 17:35:43 +02:00
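The P<bank><bankpin> naming from bc1bb80564 could be produced along these lines. This is a sketch under assumptions: banks are taken to be lettered from 'A' with eight pins each, which the message does not state, and the function name is hypothetical.

```c
#include <stdio.h>

#define TOY_PINS_PER_BANK	8	/* assumption: eight pins per lettered bank */

/* Formats a flat pin number as P<bank letter><pin within bank>. */
static void
toy_gpio_name(unsigned int pin, char *buf, size_t len)
{
	unsigned int bank = pin / TOY_PINS_PER_BANK;
	unsigned int bankpin = pin % TOY_PINS_PER_BANK;

	snprintf(buf, len, "P%c%u", (char)('A' + bank), bankpin);
}
```

Under these assumptions pin 0 is named PA0 and pin 13 is named PB5, matching the style used in schematics.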
Emmanuel Vadot
af2253f61c mmccam: Add two new XPT for MMC and use them in mmc_sim and sdhci
For the discovery phase of SD/eMMC we need to do some transactions in an async
way.
The classic CAM XPT_{GET,SET}_TRAN_SETTING cannot be used in an async way.
This also allows us to split the discovery phase into a more complete state
machine, and we no longer mtx_sleep with a random number to wait for completion
of the tasks.
For mmc_sim we now do the SET_TRAN_SETTING in a taskqueue so we can call
the needed functions for regulators/clocks without the cam lock(s). This part
still needs to be done for sdhci.
We also now save the host OCR in the discovery phase, as it wasn't done before and
only worked because the same ccb was reused.

Reviewed by:	imp, kibab, bz
Differential Revision:	https://reviews.freebsd.org/D30038
2021-05-21 17:34:05 +02:00
Hans Petter Selasky
4eac63af23 Fix for use-after-free by if_ioctl() calls from user-space in USB drivers by
detaching the ifnet before the miibus.

PR:		252608
Suggested by:	jhb@
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 14:59:19 +02:00
Hans Petter Selasky
b764a42653 There is a window where threads are removed from the process list and where
the thread destructor is invoked. Catch that window by waiting for all
task_struct allocations to be returned before freeing the UMA zone in the
LinuxKPI. Else UMA may fail to release the zone due to concurrent access
and panic:

panic() - Bad link element prev->next != elm
zone_release()
bucket_drain()
bucket_free()
zone_dtor()
zone_free_item()
uma_zdestroy()
linux_current_uninit()

This failure can be triggered by loading and unloading the LinuxKPI module
in a loop:

while true
do
kldload linuxkpi
kldunload linuxkpi
done

Discussed with:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 13:18:41 +02:00
Hans Petter Selasky
c82c200622 Accessing the epoch structure should happen after the INIT_CHECK().
Else the epoch pointer may be NULL.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Hans Petter Selasky
f33168351b Properly define EPOCH(9) function macro.
No functional change intended.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Hans Petter Selasky
cc9bb7a9b8 Rework for-loop in EPOCH(9) to reduce indentation level.
No functional change intended.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Hans Petter Selasky
209d4919c5 Make sure all tasklets are drained before unloading the LinuxKPI.
Else use-after-free may happen.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Richard Scheffenegger
032bf749fd [tcp] Keep socket buffer locked until upcall
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.

Reported by: rmacklem
Reviewed By: tuexen, #transport
Tested by: rmacklem, otis
MFC after: 2 weeks
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29690
2021-05-21 11:07:51 +02:00
Michael Tuexen
500eb6dd80 tcp: Fix sending of TCP segments with IP level options
When bringing in TCP over UDP support in
https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605,
the length of IP level options was considered when locating the
transport header. This was incorrect and is fixed by this patch.

X-MFC with:		https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605
MFC after:		3 days
Reviewed by:		markj, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D30358
2021-05-21 09:49:45 +02:00
Edward Tomasz Napierala
8dc96b74ed cam: clear on-stack CCBs in last few drivers
This changes ahc(4), ahd(4), hptiop(4), hptnr(4), hptrr(4),
and ps3cdrom(4).

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30305
2021-05-21 08:53:59 +01:00
Edward Tomasz Napierala
45f57ce122 arcmsr: clear CCB allocated on the stack
Reviewed By:	delphij, imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30304
2021-05-21 08:22:13 +01:00
Edward Tomasz Napierala
b9353e0b44 isci: clear CCBs allocated on the stack
Reviewed By:	gallatin, imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30303
2021-05-21 08:10:22 +01:00
Edward Tomasz Napierala
de992eed78 mpt: clear CCBs allocated on the stack
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30302
2021-05-21 07:59:02 +01:00
Edward Tomasz Napierala
7608b98c43 mpr, mps: clear CCBs allocated on the stack
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30301
2021-05-21 07:42:13 +01:00
Edward Tomasz Napierala
d39aac796b pms(4): clear CCBs allocated on the stack
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30300
2021-05-21 07:29:23 +01:00
Edward Tomasz Napierala
95c19e1d65 linux: refactor bsd_to_linux_regset() out of linux_ptrace.c
This will be used for Linux coredump support.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30365
2021-05-21 07:26:07 +01:00
Wojciech Macek
eedbbec3fd ip_mroute: remove unused declarations
fix build for non-x86 targets
2021-05-21 08:01:26 +02:00
Wojciech Macek
741afc6233 ip_mroute: refactor bw_meter API
The API should work as follows:
- periodically report Lower-or-EQual bandwidth (LEQ) connections
  over kernel socket, if user application registered for such
  per-flow notifications
- report Greater-or-EQual (GEQ) bandwidth as soon as it reaches
  specified value in configured time window

Custom implementation of callouts was removed. There is no
point in doing a callout-wheel here as generic callouts are
doing exactly the same. The performance is not critical
for such reporting, so the biggest concern should be
to have code which can be easily maintained.

This is a preparation for a rework of the locking, which is highly inefficient.

Approved by:    mw
Sponsored by:   Stormshield
Obtained from:  Semihalf
Differential Revision:  https://reviews.freebsd.org/D30210
2021-05-21 06:43:41 +02:00
Rick Macklem
d80a903a1c nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open
Commit b3d4c70dc6 added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way.  Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.

This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.

MFC after:	2 weeks
2021-05-20 18:37:40 -07:00
Philippe Michaud-Boudreault
5d698386fb hda: correct comment about Asus laptop digital mics
Reported in review D30333

MFC after:	1 week
2021-05-20 14:58:00 -04:00
John Baldwin
0cc7d64a2a iscsi: Move the maximum data segment limits into 'struct icl_conn'.
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
Differential Revision:	https://reviews.freebsd.org/D30271
2021-05-20 09:59:11 -07:00
John Baldwin
71e3d1b3a0 iscsi: Always free a cdw before its associated ctl_io.
cxgbei stores state about a target transfer in the ctl_private[] array
of a ctl_io that is freed when a target transfer (represented by the
cdw) is freed.  As such, freeing a ctl_io before a cdw that references
it can result in a use after free in cxgbei.  Two of the four places
freed the cdw first, and the other two freed the ctl_io first.  Fix
the latter two places to free the cdw first.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
Differential Revision:	https://reviews.freebsd.org/D30270
2021-05-20 09:58:59 -07:00
Don Morris
f17a590085 ufs: Avoid M_WAITOK allocations when building a dirhash
At this point the directory's vnode lock is held, so blocking while
waiting for free pages makes the system more susceptible to deadlock in
low memory conditions.  This is particularly problematic on NUMA systems
as UMA currently implements a strict first-touch policy.

ufsdirhash_build() already uses M_NOWAIT for other allocations and
already handled failures for the block array allocation, so just convert
to M_NOWAIT.

PR:		253992
Reviewed by:	markj, mckusick, vangyzen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29045
2021-05-20 11:25:45 -04:00
Kristof Provost
b62489cc92 pf: Support killing floating states by interface
Floating states get assigned to interface 'all' (V_pfi_all), so when we
try to flush all states for an interface, states originally created
through this interface are not flushed. Only if-bound states can be
flushed in this way.

Given that we track the original interface we can check if the state's
interface is 'all', and if so compare to the orig_if instead.
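
The comparison described above can be sketched in a few lines of C. This is a minimal illustration, not pf's code: the struct layouts, the orig_kif field name, and state_matches_if() are hypothetical stand-ins for the real state and kif structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for pf's interface and state objects. */
struct kif {
	const char *name;
};

struct state {
	const struct kif *kif;		/* 'all' for floating states */
	const struct kif *orig_kif;	/* interface that created the state */
};

/*
 * Decide whether a state belongs to the interface being flushed.
 * Floating states are bound to 'all', so fall back to the interface
 * that originally created them.
 */
static bool
state_matches_if(const struct state *s, const struct kif *target,
    const struct kif *all)
{
	assert(s != NULL && target != NULL && all != NULL);
	if (s->kif == all)
		return (s->orig_kif == target);
	return (s->kif == target);
}
```

With this rule, an if-bound state matches only its own interface, while a floating state matches the interface it was created through.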

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30246
2021-05-20 12:49:27 +02:00
Kristof Provost
d0fdf2b28f pf: Track the original kif for floating states
Track (and display) the interface that created a state, even if it's a
floating state (and thus uses virtual interface 'all').

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30245
2021-05-20 12:49:27 +02:00
Kristof Provost
0592a4c83d pf: Add DIOCGETSTATESNV
Add DIOCGETSTATESNV, an nvlist-based alternative to DIOCGETSTATES.

MFC after:      1 week
Sponsored by:   Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30243
2021-05-20 12:49:27 +02:00
Kristof Provost
1732afaa0d pf: Add DIOCGETSTATENV
Add DIOCGETSTATENV, an nvlist-based alternative to DIOCGETSTATE.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30242
2021-05-20 12:49:26 +02:00
Wojciech Macek
787845c0e8 Revert "ip_mroute: refactor bw_meter API"
This reverts commit d1cd99b147.
2021-05-20 12:14:58 +02:00
Marcin Wojtas
240429103c Rename ofwpci.c to ofw_pcib.c
It's a class0 driver that implements some pcib methods and creates
a pci bus as its children.
The "ofw_pci" name will be used by a new driver that will be a subclass
of the pci bus.
No functional changes intended.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30226
2021-05-20 11:22:25 +02:00
Marcin Wojtas
b08bf4c35c sdhci_fsl_fdt: Skip vccq reconfiguration without regulator
There is no need to perform any voltage reconfiguration
in case the vccq regulator is not physically attached to the
slot.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30355
2021-05-20 11:21:53 +02:00
Ceri Davies
c1a148873d sys/*/conf/*, docs: fix links to handbook
While here, fix all links to older en_US.ISO8859-1 documentation
in the src/ tree.

PR:             255026
Reported by:    Michael Büker <freebsd@michael-bueker.de>
Reviewed by:    dbaio
Approved by:    blackend (mentor), re (gjb)
MFC after:      10 days
Differential Revision: https://reviews.freebsd.org/D30265
2021-05-20 09:27:10 +01:00
Wojciech Macek
d1cd99b147 ip_mroute: refactor bw_meter API
The API should work as follows:
- periodically report Lower-or-EQual bandwidth (LEQ) connections
  over kernel socket, if user application registered for such
  per-flow notifications
- report Greater-or-EQual (GEQ) bandwidth as soon as it reaches
  specified value in configured time window

Custom implementation of callouts was removed. There is no
point in doing a callout-wheel here as generic callouts are
doing exactly the same. The performance is not critical
for such reporting, so the biggest concern should be
to have code which can be easily maintained.

This is a preparation for a rework of the locking, which is highly inefficient.

Approved by:    mw
Sponsored by:   Stormshield
Obtained from:  Semihalf
Differential Revision:  https://reviews.freebsd.org/D30210
2021-05-20 10:13:55 +02:00
John Baldwin
3bede2908a cxgbei: Add tunable sysctls for the FirstBurstLength and MaxBurstLength.
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30269
2021-05-19 15:56:54 -07:00
John Baldwin
671fd0ec8d cxgbei: Remove unused sysctls.
These were seemingly copied over from icl_soft.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30268
2021-05-19 15:56:45 -07:00
John Baldwin
a9f0cf4838 cxgbe: Fix some merge-o's for the per-rxq iSCSI counters.
I botched a few of the changes when rebasing the changes in
4b6ed0758d across the changes in
43bbae1948.

- Move the counter allocations into alloc_ofld_rxq().

- Free the counters when freeing an ofld rxq.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30267
2021-05-19 15:56:31 -07:00
Konstantin Belousov
77b637338a alc(4): add support for Mikrotik 10/25G NIC
The new Mikrotik 10/25G NIC is mostly compatible with AR8151 hardware,
with few exceptions:

* card supports only 32bit DMA operations
* card does not support write-one-to-clear semantics for interrupt status
  register
* MDIO operations can take longer to complete

This patch adds support for Mikrotik 10/25G NIC to the alc driver
while maintaining support for all earlier HW.

The patch was tested with FreeBSD main branch as of commit
f4b38c360e

This was tested on Intel i7-4790K system with Mikrotik 10/25G NIC.
This was tested on Intel i7-4790K system with RB44Ge (AR8151 based 4-port NIC)
to verify backwards compatibility.

PR:	256000
Submitted by:	 Gatis Peisenieks  <gatis@mikrotik.com>
MFC after:	1 week
2021-05-20 01:30:25 +03:00
Warner Losh
96480d9b33 cam_sim: add doxygen to cam_sim_alloc_dev
cam_sim_alloc_dev was overlooked when cam_sim_alloc was documented.
Add doxygen docs for it, pointing at cam_sim_alloc.

Sponsored by:		Netflix
2021-05-19 15:59:09 -06:00
Rick Macklem
c28cb257dd nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server.  For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs.  The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply.  However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.

This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.

This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.

MFC after:	2 weeks
2021-05-19 14:52:56 -07:00
Mateusz Guzik
4fe925b81e fdescfs: allow shared locking of root vnode
Eliminates fdescfs from lock profile when running poudriere.
2021-05-19 17:58:54 +00:00
Mateusz Guzik
43999a5cba pseudofs: use vget_prep + vget_finish instead of vget + the interlock
2021-05-19 17:58:42 +00:00
Alexander Motin
4a6830761c Fix packet cbs/ebs conversion.
Each packet is counted as 128 bytes by the code, not 125.  Not sure
what I was thinking about here 14 years ago.  Maybe just a typo.

Reported by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2021-05-19 11:04:08 -04:00
Bjoern A. Zeeb
f0a5e81af4 arm64: rockchip, implement the two rk805/808 clocks
While the xin32k clk was implemented in rk3399_cru as a fixed rate
clock, migrate it to rk805 as we will also need the 2nd clock
'rtc_clko_wifi' for SDIO and BT.
Both clocks remain fixed rate, and while the 1st one is always on
(though that is not expressed in the clk framework), the 2nd one
we can toggle on/off.

Reviewed-by:	manu
Tested-by:	manu
MFC-after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26870
2021-05-19 11:48:11 +00:00
Navdeep Parhar
3965469eaa cxgbe(4): Remove some dead code.
MFC after:	3 days
2021-05-18 23:16:03 -07:00
Rick Macklem
fc0dc94029 nfsd: Reduce the callback timeout to 800msec
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4sec.
This patch redefines the constant as being in milliseconds and it
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1second.

This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.

MFC after:	2 weeks
2021-05-18 16:17:58 -07:00
Rick Macklem
b3d4c70dc6 nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file.  This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.

This patch only affects mounts from Linux clients when delegations
are enabled on the server.

MFC after:	2 weeks
2021-05-18 15:53:54 -07:00
Zhenlei Huang
3d846e4822 Do not forward datagrams originated by link-local addresses
The current implementation of ip_input() rejects packets destined for
169.254.0.0/16, but not those originating from 169.254.0.0/16 link-local
addresses.

Fix to fully respect RFC 3927 section 2.7.
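
As a rough sketch of the rule (not the ip_input() code itself), a forwarding check that covers both the source and the destination could look like this in C. The helper names in_linklocal() and may_forward() are hypothetical, and addresses are taken as four bytes in network order to sidestep byte-order handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the IPv4 address (four bytes, network order) is in 169.254.0.0/16. */
static bool
in_linklocal(const uint8_t addr[4])
{
	assert(addr != NULL);
	return (addr[0] == 169 && addr[1] == 254);
}

/*
 * A datagram is eligible for forwarding only if neither its source nor
 * its destination is a link-local address.
 */
static bool
may_forward(const uint8_t src[4], const uint8_t dst[4])
{
	return (!in_linklocal(src) && !in_linklocal(dst));
}
```

Checking only the destination, as the text describes for the old behaviour, would correspond to dropping the in_linklocal(src) term.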

PR:		255388
Reviewed by:	donner, rgrimes, karels
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D29968
2021-05-18 22:59:46 +02:00
Markus Stoff
63b6a08ce2 ng_parse: IP address parsing in netgraph eating too many characters
Once the final component of the IP address has been parsed, the offset
on the input must not be advanced, as this would remove an unparsed
character from the input.
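
To illustrate the offset handling, here is a small stand-alone C sketch of a dotted-quad parser. It is not the ng_parse code; parse_ipaddr() and its signature are assumptions made for the example. The separator is skipped only between components, so the offset ends up on the first unparsed character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse an IPv4 address starting at s + *off.  On success store the four
 * components in out, advance *off past the address only, and return 0.
 * Return -1 on malformed input, leaving *off untouched.
 */
static int
parse_ipaddr(const char *s, size_t *off, uint8_t out[4])
{
	const char *p;
	char *end;
	unsigned long v;
	int i;

	assert(s != NULL && off != NULL && out != NULL);
	p = s + *off;
	for (i = 0; i < 4; i++) {
		if (!isdigit((unsigned char)*p))
			return (-1);
		v = strtoul(p, &end, 10);
		if (v > 255)
			return (-1);
		out[i] = (uint8_t)v;
		p = end;
		if (i < 3) {
			if (*p != '.')
				return (-1);
			p++;	/* consume the '.' between components only */
		}
		/* After the last component, p is left untouched. */
	}
	*off = (size_t)(p - s);
	return (0);
}
```

For an input such as "10.0.0.1/24", the offset stops at the '/' so that a following parser can still see it.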

Submitted by:	Markus Stoff
Reviewed by:	donner
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D26489
2021-05-18 22:36:28 +02:00
Lv Yunlong
b295c5ddce socket: Release cred reference later in sodealloc()
We dereference so->so_cred to update the per-uid socket buffer
accounting, so the crfree() call must be deferred until after that
point.

PR:		255869
MFC after:	1 week
2021-05-18 15:25:40 -04:00
Mark Johnston
c4a6258d70 dummynet: Fix mbuf tag allocation failure handling
PR:		255875, 255878, 255879, 255880
Reviewed by:	donner, kp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30318
2021-05-18 15:25:16 -04:00
Konstantin Belousov
8cf912b017 ttydev_write: prevent stops while terminal is busied
Since busy state is checked by all blocked writes, stopping a process
which waits in ttydisc_write() causes a cascade.  Utilize sigdeferstop()
to avoid the issue.

Submitted by:	Jakub Piecuch <j.piecuch96@gmail.com>
PR:	255816
MFC after:	1 week
2021-05-18 20:52:03 +03:00
Mateusz Guzik
cc6f46ac2f vfs: refactor vdrop
In particular move vunlazy into its own routine.
2021-05-18 15:30:28 +00:00
Mateusz Guzik
715fcc0d34 vfs: change vn_freevnodes_* prefix to idiomatic vfs_freevnodes_*
2021-05-18 15:30:28 +00:00
Hans Petter Selasky
e5ff940a81 Propagate down USB explore error codes, so that failures to enumerate USB HUBs
behind USB HUBs are detected and the USB reset counter logic will kick in
preventing enumeration of continuously failing ports.

Submitted by:	phk@
Tested by:	bz@
PR:		237666
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-18 16:11:35 +02:00
Hans Petter Selasky
70ffaaa69c Update USB_PORT_RESET_RECOVERY to comply with the USB 2.0 specification which
says it should be max 10 milliseconds.

This may fix some USB enumeration issues:
> usbd_req_re_enumerate: addr=3, set address failed! (USB_ERR_IOERROR, ignored)
> usbd_setup_device_desc: getting device descriptor at addr 3 failed,

Found by:	Zhichao1.Li@dell.com
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-18 15:52:41 +02:00
Hans Petter Selasky
00e501d720 Update usb_timings_sysctl_handler() to accept any value for timings between
0 milliseconds and 2 seconds inclusively. Some style fixes while at it.

The USB specification has minimum values and maximum values,
and not only minimum values.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-18 15:52:41 +02:00
Roger Pau Monné
9e14ac116e x86/xen: further PVHv1 removal cleanup
The AP startup extern variable declarations are no longer needed,
since PVHv2 uses the native AP startup path using the lapic. Remove
the declarations and make the variables static to mp_machdep.c.

Sponsored by: Citrix Systems R&D
2021-05-18 10:43:31 +02:00
Colin Percival
b6be9566d2 Fix buffer overflow in preloaded hostuuid cleaning
When a module of type "hostuuid" is provided by the loader,
prison0_init strips any trailing whitespace and ASCII control
characters by (a) adjusting the buffer length, and (b) zeroing out
the characters in question, before storing it as the system's
hostuuid.

The buffer length adjustment was correct, but the zeroing overwrote
one byte higher in memory than intended -- in the typical case,
zeroing one byte past the end of the hostuuid buffer.  Due to the
layout of buffers passed by the boot loader to the kernel, this will
be the first byte of a subsequent buffer.

This was *probably* harmless; prison0_init runs after preloaded kernel
modules have been linked and after the preloaded /boot/entropy cache
has been processed, so in both cases having the first byte overwritten
will not cause problems.  We cannot however rule out the possibility
that other objects which are preloaded by the loader could suffer from
having the first byte overwritten.

Since the zeroing does not in fact serve any purpose, remove it and
trim trailing whitespace and ASCII control characters by adjusting
the buffer length alone.
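
A minimal C sketch of trimming by length alone follows. It is an illustration of the approach, not the prison0_init code; trimmed_len() is a hypothetical helper, and treating every byte at or below 0x20 as trimmable is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of buf once trailing spaces and ASCII control
 * characters (bytes at or below 0x20) are dropped.  The buffer is only
 * read, never written, so nothing past its end can be clobbered.
 */
static size_t
trimmed_len(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	while (len > 0 && (unsigned char)buf[len - 1] <= 0x20)
		len--;
	return (len);
}
```

Because the caller keeps the adjusted length alongside the pointer, no terminator has to be stored, which is what makes the zeroing unnecessary.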

Fixes:		c3188289 Preload hostuuid for early-boot use
Reviewed by:	kevans, markj
MFC after:	3 days
2021-05-17 20:07:49 -07:00
Colin Percival
330f110bf1 Fix 'hostuuid: preload data malformed' warning
If the preloaded hostuuid value is invalid and verbose booting is
enabled, a warning is printed.  This printf had two bugs:

1. It was missing a trailing \n character.
2. The malformed UUID is printed with %s even though it is not known
to be NUL-terminated.

This commit adds the missing \n and uses %.*s with the (already known)
length of the preloaded UUID to ensure that we don't read past the end
of the buffer.
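
The %.*s idiom can be shown with a short userland C sketch. format_malformed() is a hypothetical helper and the exact message text is illustrative; the point is that the int precision argument caps how many bytes are read from a buffer that may lack a terminating NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a warning about 'len' bytes at 'data', which need not be
 * NUL-terminated.  "%.*s" reads at most 'len' bytes from data.
 * Returns what snprintf() returns.
 */
static int
format_malformed(char *out, size_t outsz, const char *data, size_t len)
{
	assert(out != NULL && data != NULL);
	assert(len <= INT_MAX);	/* the precision argument is an int */
	return (snprintf(out, outsz,
	    "hostuuid: preload data malformed: '%.*s'\n", (int)len, data));
}
```

Passing a length of 3 for a four-byte unterminated buffer prints only the first three bytes, followed by the closing quote and the newline.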

Reported by:	kevans
Fixes:		c3188289 Preload hostuuid for early-boot use
MFC after:	3 days
2021-05-17 20:07:49 -07:00
John Baldwin
8d2b4b2e7c cxgbe: Cast pointer arguments to trunc_page() to vm_offset_t.
Reported by:	mjg, jenkins, rmacklem
Fixes:		46bee8043e
Sponsored by:	Chelsio Communications
2021-05-17 17:04:22 -07:00
Mark Johnston
4224dbf4c7 xen: Remove leftover bits missed in commit ac3ede5371
Fixes:		ac3ede5371 ("x86/xen: remove PVHv1 code")
Reviewed by:	royger
Differential Revision:	https://reviews.freebsd.org/D30316
2021-05-17 13:06:44 -04:00
Kristof Provost
02c44f40f9 dummynet: Remove unused code
We never set 'busy' and never dequeue from the pending mq. Remove this
code.

Reviewed by:	ae
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30313
2021-05-17 15:03:55 +02:00
Kristof Provost
d69cc04014 pf: Set the pfik_group for userspace
Userspace relies on this pointer to work out if the kif is a group or
not. It can't use it for anything else, because it's a pointer to a
kernel address. Substitute 0xfeedc0de for 'true', so that we don't leak
kernel memory addresses to userspace.

PR:		255852
Reviewed by:	donner
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30284
2021-05-17 13:48:06 +02:00
Justin Hibbits
b2ee069e8c Fix locking in qoriq_gpio
qoriq_gpio_pin_setflags() locks the device mutex, as does
qoriq_gpio_map_gpios(), causing a recursion on a non-recursive lock.  This
was missed during testing for 16e549ebe.
2021-05-17 08:46:45 -05:00
Justin Hibbits
ffd21bd289 Make ISA_206_ATOMICS a kernel option
Summary:
To make it easier to build a kernel with PowerISA 2.06 atomics (sub-word
atomics), add a kernel config option.  User space still needs to specify
it as a CFLAG but that seems easier to do than for the kernel config.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D29809
2021-05-17 08:46:38 -05:00
Justin Hibbits
7ed09a6778 powerpc: Rework IPI message processing
Summary:
There's no need to use a while loop in the IPI handler, the message list
is cached once and processed.  Instead, since the existing code calls
ffs(), sort the handlers, and use a simple 'if' sequence.

Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D30018
2021-05-17 08:26:40 -05:00
Justin Hibbits
9aad27931e powerpc64/radix mmu: Remove dead variable
Remove dead variable from mmu_radix_extract_and_hold().  Based on
r352408 for amd64.
2021-05-17 08:26:39 -05:00
Roger Pau Monné
ac3ede5371 x86/xen: remove PVHv1 code
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.

Note FreeBSD supports PVHv2, which is the replacement for PVHv1.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
2021-05-17 11:41:21 +02:00
Mitchell Horne
2117a66af5 xen: remove hypervisor_info
This was a source of indirection needed to support PVHv1. Now that that
support has been removed, we can eliminate it.

Reviewed by: royger
2021-05-17 10:56:52 +02:00
Mitchell Horne
c93e6ea344 xen: remove support for PVHv1 bootpath
PVHv1 is a legacy interface supported only by Xen versions 4.4 through
4.9.

Reviewed by: royger
2021-05-17 10:56:52 +02:00
Mark Johnston
60cb98a1bd linux: Fix a mistake in commit fb58045145
The change to futex_andl_smap() should have ordered stac before the
load from a user address, otherwise it does not fix anything.

Fixes:	fb58045145 ("linux: Fix SMAP-enabled futex routines")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-16 22:23:14 -04:00
Mark Johnston
5b81e2e1bc virtio_scsi: Zero stack-allocated CCBs
Fixes:	3394d4239b ("cam: allocate CCBs from UMA for SCSI and ATA IO")
Reported by:	syzbot+2e9ce63919709feb3d1c@syzkaller.appspotmail.com
Reviewed by:	trasz
Sponsored by:	The FreeBSD Foundation
2021-05-16 22:20:39 -04:00
Kirk McKusick
9a2fac6ba6 Fix handling of embedded symbolic links (and history lesson).
The original filesystem release (4.2BSD) had no embedded symlinks.
Historically symbolic links were just a different type of file, so
the content of the symbolic link was contained in a single disk block
fragment. We observed that most symbolic links were short enough that
they could fit in the area of the inode that normally holds the block
pointers. So we created embedded symlinks where the content of the
link was held in the inode's pointer area thus avoiding the need to
seek and read a data fragment and reducing the pressure on the block
cache. At the time we had only UFS1 with 32-bit block pointers,
so the test for a fastlink was:

	di_size < (NDADDR + NIADDR) * sizeof(daddr_t)

(where daddr_t would be ufs1_daddr_t today).

When embedded symlinks were added, a spare field in the superblock
with a known zero value became fs_maxsymlinklen. New filesystems
set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded
symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus
filesystems that preceded this change always read from blocks
(since fs->fs_maxsymlinklen == 0) and newer ones used embedded
symlinks if they fit. Similarly symlinks created on pre-embedded
symlink filesystems always spill into blocks while newer ones will
embed if they fit.

At the same time that the embedded symbolic links were added, the
on-disk directory structure was changed splitting the former
u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen.
Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can
be used to distinguish old directory formats. In retrospect that
should have just been an added flag, but we did not realize we
needed to know about that change until it was already in production.

Code was split into ufs/ffs so that the log structured filesystem could
use ufs functionality while doing its own disk layout. This meant
that no ffs superblock fields could be used in the ufs code. Thus
ffs superblock fields that were needed in ufs code had to be copied
to fields in the mount structure. Since ufs_readlink needed to know
if a link was embedded, fs_maxsymlinklen gets copied to mnt_maxsymlinklen.

The kernel panic that led to making this fix was triggered when a
disk error created an inode of type symlink with no allocated data
blocks but a large size. When readlink was called the uiomove was
attempted, which segment faulted.

static int
ufs_readlink(ap)
	struct vop_readlink_args /* {
		struct vnode *a_vp;
		struct uio *a_uio;
		struct ucred *a_cred;
	} */ *ap;
{
	struct vnode *vp = ap->a_vp;
	struct inode *ip = VTOI(vp);
	doff_t isize;

	isize = ip->i_size;
	if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
	    DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
		return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
	}
	return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
}

The second part of the "if" statement that adds

	DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */

is problematic. It never appeared in BSD released by Berkeley because
as noted above mnt_maxsymlinklen is 0 for old format filesystems, so
will always fall through to the VOP_READ as it should. I had to dig
back through `git blame' to find that Rodney Grimes added it as
part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.''
He must have brought it across from an earlier FreeBSD. Unfortunately
the source-control logs for FreeBSD up to the merger with the
AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the
agreement to let FreeBSD remain unencumbered, so I cannot pin-point
where that line got added on the FreeBSD side.

The one change needed here is that mnt_maxsymlinklen is declared as
an `int' and should be changed to be `u_int64_t'.

This discovery led us to check out the code that deletes symbolic
links. Specifically

	if (vp->v_type == VLNK &&
	    (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
	     datablocks == 0)) {
		if (length != 0)
			panic("ffs_truncate: partial truncate of symlink");
		bzero(SHORTLINK(ip), (u_int)ip->i_size);
		ip->i_size = 0;
		DIP_SET(ip, i_size, 0);
		UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
		if (needextclean)
			goto extclean;
		return (ffs_update(vp, waitforupdate));
	}

Here too our broken symlink inode with no data blocks allocated
and a large size will segment fault as we are incorrectly using the
test that we have no data blocks to decide that it is an embedded
symbolic link and attempting to bzero past the end of the inode.
The test for datablocks == 0 is unnecessary as the test for
ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right
thing in all cases.
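
The difference between the two tests can be made concrete with a reduced C model. This is only a sketch: struct mini_inode and both predicates are hypothetical, and the limit of 120 bytes used in the usage note is an example value, not one taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced inode: just the two fields the tests look at. */
struct mini_inode {
	uint64_t size;		/* claimed length of the link target */
	uint64_t blocks;	/* number of allocated data blocks */
};

/* Old test: also treats "no data blocks" as proof of an embedded link. */
static bool
is_embedded_old(const struct mini_inode *ip, uint64_t maxsymlinklen)
{
	assert(ip != NULL);
	return (ip->size < maxsymlinklen || ip->blocks == 0);
}

/* Test described above: the size comparison alone decides. */
static bool
is_embedded(const struct mini_inode *ip, uint64_t maxsymlinklen)
{
	assert(ip != NULL);
	return (ip->size < maxsymlinklen);
}
```

For a damaged inode with size 4096 and no data blocks, and an example limit of 120, the old predicate reports an embedded link (so 4096 bytes would be copied or zeroed from the inode's pointer area), while the size-only predicate does not.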

The test for datablocks == 0 was added by David Greenman in this commit:

Author: David Greenman <dg@FreeBSD.org>
Date:   Tue Aug 2 13:51:05 1994 +0000

    Completed (hopefully) the kernel support for old style "fastlinks".

    Notes:
	svn path=/head/; revision=1821

I am guessing that he likely earlier added the incorrect test in the
ufs_readlink code.

I asked David if he had any recollection of why he made this change.
Amazingly, he still had a recollection of why he had made a one-line
change more than twenty years ago. And unsurprisingly it was because
he had been stuck between a rock and a hard place.

FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code
base. Prior to that, there were three years of development in all
areas of the kernel, including the filesystem code, from the combined
set of people including Bill Jolitz, Patchkit contributors, and
FreeBSD Project members. The compatibility issue at hand was caused
by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite
changes David had to find a way to provide compatibility with both
the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite.
He felt that these changes would provide compatibility with both systems.

In his words:
``My recollection is that the 'FASTLINKS' symlinks support in
FreeBSD-1.x, as implemented by Curt Mayer, worked differently than
4.4BSD. He used a spare field in the inode to duplicately store the
length. When the 4.4BSD-Lite merge was done, the optimized symlinks
support for existing filesystems (those that were initialized in
FreeBSD-1.x) were broken due to the FFS on-disk structure of
4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to
restore the backward compatibility with FreeBSD-1.x filesystems.
I think it was the best that could be done in the somewhat urgent
circumstances of the post Berkeley-USL settlement. Also, regarding
Rod's massive commit with little explanation, some context: John
Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to
the 386 platform in just 10 days. It was by far the most intense
hacking effort of my life. In addition to the porting of tons of
FreeBSD-1 code, I think we wrote more than 30,000 lines of new code
in that time to deal with the missing pieces and architectural
changes of 4.4BSD-Lite. We didn't make many notes along the way.
There was a lot of pressure to get something out to the rest of the
developer community as fast as possible, so detailed discrete commits
didn't happen - it all came as a giant wad, which is why Rod's
commit message was worded the way it was.''

Reported by:  Chuck Silvers
Tested by:    Chuck Silvers
History by:   David Greenman Lawrence
MFC after:    1 week
Sponsored by: Netflix
2021-05-16 17:04:11 -07:00
Rick Macklem
46269d66ed NFSv4 server: Re-establish the delegation recall timeout
Commit 7a606f280a allowed the server to do retries of CB_RECALL
callbacks every couple of seconds.  This was needed to allow the
Linux client to re-establish the back channel.
However this patch broke the delegation timeout check, such that
it would just keep retrying CB_RECALLS.
If the client has crashed or been network partitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.

This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.

This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.

MFC after:	2 weeks
2021-05-16 16:40:01 -07:00
Edward Tomasz Napierala
75b5caa08e cam: turn KASSERTs into printfs for now
It looks like I've missed a couple of places where we don't clear
stack-allocated CCBs.  Don't panic when that happens, just print
a warning.

This is a temporary measure until I get those cases fixed.

Reviewed By:	markj
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D30296
2021-05-16 20:19:19 +01:00
Mark Johnston
fb58045145 linux: Fix SMAP-enabled futex routines
Some of them were dereferencing the user pointer before disabling SMAP.

PR:		255591
Reviewed by:	kib
Tested by:	pitwuu@gmail.com
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30276
2021-05-16 13:42:08 -04:00
Lutz Donnerhacke
687e510e5c netgraph/ng_checksum: Fix double free error
m_pullup(9) frees the mbuf(9) chain in the case of an allocation error.
The mbuf chain must not be freed again in this case.
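
Illustration only (not part of the commit): a userland sketch of the
ownership rule described above.  All names are hypothetical, and a
"buffer too short" condition stands in for the allocation failure that
makes m_pullup(9) fail.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_buf {
	size_t	len;
	char	*data;
};

int sketch_frees;	/* counts how often a buffer was released */

struct sketch_buf *
sketch_alloc(size_t len)
{
	struct sketch_buf *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return (NULL);
	b->data = calloc(1, len ? len : 1);
	if (b->data == NULL) {
		free(b);
		return (NULL);
	}
	b->len = len;
	return (b);
}

void
sketch_free(struct sketch_buf *b)
{
	assert(b != NULL);
	free(b->data);
	free(b);
	sketch_frees++;
}

/* Like m_pullup(9): on failure the input is freed and NULL is returned. */
struct sketch_buf *
sketch_pullup(struct sketch_buf *b, size_t need)
{
	if (b->len < need) {
		sketch_free(b);
		return (NULL);
	}
	return (b);
}

int
sketch_process(struct sketch_buf *b, size_t need)
{
	if (b == NULL)
		return (-1);
	b = sketch_pullup(b, need);
	if (b == NULL)
		return (-1);	/* already freed: freeing here would be a double free */
	/* ... the buffer would be used here ... */
	sketch_free(b);
	return (0);
}
```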

PR:		255874
Submitted by:	<lylgood@foxmail.com>
Approved by:	markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30273
2021-05-16 19:39:51 +02:00
Edward Tomasz Napierala
8252fe56a0 cam: Fix race condition in dainit()
Previously, daregister() could have been called before dainit()
initialized the UMA zone.  This would trip a KASSERT.

Reported By:	pho
Tested By:	pho
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2021-05-16 13:36:54 +01:00
Edward Tomasz Napierala
0f206cc912 cam: add missing zeroing of a stack-allocated CCB.
This could cause a panic at boot.

Reported By:	Shawn Webb <shawn.webb AT hardenedbsd.org>
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2021-05-16 11:38:26 +01:00
Lutz Donnerhacke
2e6b07866f libalias: Ensure ASSERT behind variable declarations
At some places the ASSERT was inserted before variable declarations are
finished.  This is fixed now.

Reported by:	kib
Reviewed by:	kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30282
2021-05-16 02:28:36 +02:00
Mateusz Guzik
eec2e4ef7f tmpfs: reimplement the mtime scan to use the lazy list
Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D30065
2021-05-15 20:48:45 +00:00
Mateusz Guzik
128e25842e vm: add another pager private flag
Move OBJ_SHADOWLIST around to let pager flags be next to each other.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D30258
2021-05-15 20:47:29 +00:00
Edward Tomasz Napierala
3394d4239b cam: allocate CCBs from UMA for SCSI and ATA IO
This patch makes it possible for CAM to use small CCBs allocated
from a periph-specific UMA zone instead of the usual, huge ones.
The end result is that CCBs issued via da(4) take 544B (size of
ccb_scsiio) instead of the usual 2kB (size of 'union ccb', ~1.5kB,
rounded up by malloc(9)).  For ATA it's 272B.  We waste less
memory, we avoid zeroing the unused 1kB, and it should be easier
to allocate those CCBs in low memory conditions.  It should also
be possible to use uma_zone_reserve(9) to improve behaviour
in low memory conditions even further.

Note that this does not change the size, or the layout, of CCBs
as such.  CCBs get allocated in various different ways, in particular
on the stack, and I don't want to redo all that.  Instead, this
provides an opt-in mechanism for the periph to declare "my start()
callback is fine with receiving a CCB allocated from this UMA zone".
In other words, most of the code works exactly as it used to; the
change only happens to IOs issued by xpt_run_allockq(), which
is - conveniently - pretty much all that matters for performance.

The reason for doing it this way is that it's pretty small, localized
change, and can be implemented gradually and iteratively: take a
periph, make sure its start() callback only casts the CCBs it takes
to a particular type of CCB, for example ccb_scsiio, and that it only
casts CCBs returned by cam_periph_getccb() to that type, then add UMA
zone for that size, and declare it safe to XPT.

This is disabled by default.  Set 'kern.cam.ada.enable_uma_ccbs=1'
and 'kern.cam.da.enable_uma_ccbs=1' tunables to enable it.  Testing
is welcome; I will flip the default to enable in two weeks from now.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D28674
2021-05-15 12:03:49 +01:00
Lutz Donnerhacke
189f8eea13 libalias: replace placeholder with static constant
The field nullAddress in struct libalias is never set and never used.
It exists as a placeholder for an unused argument only.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D30253
2021-05-15 09:05:30 +02:00
Lutz Donnerhacke
effc8e57fb libalias: Style cleanup
libalias is a conglomerate of various coding styles modified by a series
of different editors enforcing interesting conventions on spacing and
comments.

This patch is a baseline to start with a performance rework of
libalias.  Upcoming patches should focus on the code, not on the
style.  That's why the most annoying style errors should be fixed
beforehand.

Reviewed by:	hselasky
Discussed by:	emaste
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D30259
2021-05-15 08:57:55 +02:00
John Baldwin
e73e2ee0ac cxgbei: Handle target transfers with excess unsolicited data.
The CTL frontend might have provided a buffer that is smaller than the
FirstBurstLength and thus smaller than the amount of unsolicited data
included in the request PDU.  Treat these transfers as an empty
transfer.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications

Differential Revision:	https://reviews.freebsd.org/D29940
2021-05-14 12:21:34 -07:00
John Baldwin
e894e3adb2 cxgbei: Explicitly clear the page pod reservation pointer after freeing it.
A single union ctl_io can be reused across multiple transfers (in
particular by the ramdisk backend).  On a reuse, the reservation
pointer would retain its value from the previous transfer tripping an
assertion.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications

Differential Revision:	https://reviews.freebsd.org/D29939
2021-05-14 12:21:34 -07:00
John Baldwin
1ad32ad0be cxgbei: Don't clamp iSCSI PDUs to 8K.
The firmware no longer requires this workaround.

Discussed with:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29912
2021-05-14 12:21:24 -07:00
John Baldwin
4add8e4c89 cxgbei: Don't leak resources for an aborted target transfer.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29911
2021-05-14 12:17:26 -07:00
John Baldwin
a1c687347a cxgbei: Add support for zero-copy iSCSI target transmission/read.
- Switch to allocating the cxgbei version of icl_pdu explicitly
  as a separate refcounted object allocated via malloc/free
  instead of storing it in the bhs mbuf prior to the bhs.

- Support the icl_conn_pdu_queue_cb() method to set a callback
  on a PDU to be invoked when the PDU is freed.

- For ICL_NOCOPY buffers, use an external mbuf to manage the
  storage for the buffer via m_extaddref().  Each external mbuf
  holds a reference on the associated PDU, so the callback is
  invoked once all of the external mbufs have been freed.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29910
2021-05-14 12:17:20 -07:00
John Baldwin
31df8ff73e cxgbei: Rework the pdu_append_data hook to support M_WAITOK.
- Only allocate 16K jumbo mbufs if the region of data to be
  appended is sufficiently large, and use a loop.

- Use m_getm2() to allocate a chain for data less than 16K, or
  if m_getjcl() fails.

- Use ENOMEM as the return value instead of '1' if the hook fails due
  to a memory allocation error.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29909
2021-05-14 12:17:14 -07:00
John Baldwin
46bee8043e cxgbei: Support DDP for target I/O S/G lists with more than one entry.
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer.  This change adds zero-copy receive support
for such requests.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29908
2021-05-14 12:17:06 -07:00
John Baldwin
23b209ee88 cxgbe tom: Account for pre-iSCSI mode data on suspended connections.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29907
2021-05-14 12:17:02 -07:00
John Baldwin
91ca7b0954 cxgbei: Whitespace fixes, comment typo, and rewrap a comment.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29906
2021-05-14 12:16:57 -07:00
John Baldwin
87bb5ed606 cxgbei: Use hardware RX flow control for offloaded iSCSI connections.
Forthcoming T6 iSCSI DDP support requires hardware RX flow control.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29905
2021-05-14 12:16:51 -07:00
John Baldwin
4427ac3675 cxgbe tom: Set the tid in the work requests to program page pods for iSCSI.
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.

Sponsored by:	Chelsio Communications
Discussed with:	np
Differential Revision:	https://reviews.freebsd.org/D29904
2021-05-14 12:16:40 -07:00
John Baldwin
4b6ed0758d cxgbe: Make the TOE ISCSI RX stats per-queue instead of per adapter.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29903
2021-05-14 12:16:33 -07:00
Alexander V. Chernikov
76cfc6fa0d Fix a use after free in update_rtm_from_rc().
update_rtm_from_rc() calls update_rtm_from_info() internally.
The latter one may update provided prtm pointer with a new rtm.
Reassign rtm from prtm after calling update_rtm_from_info() to
 avoid touching the freed rtm.
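
Illustration only (not part of the commit): a userland sketch of the
pattern, where a callee may replace the object behind a pointer-to-pointer
and the caller must reload its local alias afterwards.  The types and
function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_msg {
	size_t	len;
	char	payload[];
};

/* May replace *pmsg with a larger allocation and free the old one. */
int
sketch_grow(struct sketch_msg **pmsg, size_t newlen)
{
	struct sketch_msg *old = *pmsg, *new;

	if (newlen <= old->len)
		return (0);
	new = calloc(1, sizeof(*new) + newlen);
	if (new == NULL)
		return (-1);
	memcpy(new->payload, old->payload, old->len);
	new->len = newlen;
	free(old);
	*pmsg = new;
	return (0);
}

int
sketch_update(struct sketch_msg **pmsg, size_t newlen)
{
	struct sketch_msg *msg;

	assert(pmsg != NULL && *pmsg != NULL);
	msg = *pmsg;			/* local alias, valid only until the call */
	if (sketch_grow(pmsg, newlen) != 0)
		return (-1);
	msg = *pmsg;			/* reload: the old alias may have been freed */
	if (msg->len > 0)
		msg->payload[0] = 'x';
	return (0);
}
```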

PR:		255871
Submitted by:	lylgood@foxmail.com
MFC after:	3 days
2021-05-14 16:06:41 +00:00
Mateusz Guzik
852088f6af vfs: add missing atomic conversion to writecount adjustment
Fixes:	("vfs: lockless writecount adjustment in set/unset text")
2021-05-14 17:42:05 +02:00
Mateusz Guzik
ca1ce50b2b vfs: add more safety against concurrent forced unmount to vn_write
1. stop re-reading ->v_mount (can become NULL)
2. stop re-reading ->v_type (can change to VBAD)
2021-05-14 14:22:22 +00:00
Mateusz Guzik
b5fb9ae687 vfs: lockless writecount adjustment in set/unset text
... for cases where this is not the first/last exec.
2021-05-14 14:22:21 +00:00
Mark Johnston
2cca77ee01 kqueue timer: Remove detached knotes from the process stop queue
There are some scenarios where a timer event may be detached when it is
on the process' kqueue timer stop queue.  If kqtimer_proc_continue() is
called after that point, it will iterate over the queue and access freed
timer structures.

It is also possible, at least in a multithreaded program, for a stopped
timer event to be scheduled without removing it from the process' stop
queue.  Ensure that we do not doubly enqueue the event structure in this
case.

Reported by:	syzbot+cea0931bb4e34cd728bd@syzkaller.appspotmail.com
Reported by:	syzbot+9e1a2f3734652015998c@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30251
2021-05-14 10:08:14 -04:00
Marcin Wojtas
f55bd0e579 qoriq_dw_pci: disable LS1028A support
Enabled driver initialization causes an abort
on the NXP LS1028ARDB platform (without any external
endpoints connected). Temporarily disable qoriq_dw_pci
probe to allow successful booting of the OS.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30229
2021-05-14 10:50:17 +02:00
Marcin Wojtas
1f84b3a247 sdhci_fsl_fdt.c: Read supported voltages from dts.
We shouldn't overwrite capability register. Instead, voltages supported
by the controller have to be read from dts, as the hardware doesn't
report correct values.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30123
2021-05-14 10:34:37 +02:00
Marcin Wojtas
f0a9d7d799 sdhci_fsl_fdt.c: Add a missing call to mmc_fdt_parse.
Add a missing call to mmc_fdt_parse, without it some dts properties
are not parsed.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30122
2021-05-14 10:29:31 +02:00
Marcin Wojtas
ffd61af32c sdhci_fsl_fdt.c: Add support for LS1028a.
Add data specific for SoC, including all necessary quirks.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30121
2021-05-14 10:28:09 +02:00
Lutz Donnerhacke
a56e5ad690 netgraph/ng_bridge: Handle send errors during loop handling
If sending out a packet fails during the loop over all links, the
allocated memory is leaked and not all links receive a copy.  This
patch fixes those problems, clarifies a premature abort of the loop,
and fixes a minor style(9) bug.

PR:		255430
Submitted by:	Dancho Penev
Tested by:	Dancho Penev
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30008
2021-05-13 21:49:20 +02:00
Lutz Donnerhacke
4dfe70fdbd netgraph/ng_bridge: Avoid cache thrashing
Hint to the compiler that this update is needed at most once per second.
Only in this case does the memory line need to be written.  This will
reduce the amount of cache thrashing during forwarding of most frames.

Suggested by:	zec
Approved by:	zec
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28601
2021-05-13 21:14:36 +02:00
Mitchell Horne
f59127dac5 hwpmc: fix PMC_CPU_LAST
It is unused, but incorrect.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-05-13 16:02:59 -03:00
Emmanuel Vadot
0b426a1c2c modules: Only build sdhci_fdt for arm and arm64
Other FDT platforms (like powerpc64* or riscv64) don't have gpio built
by default so just compile the module for those two arches.

Fixes:	9e08f82058 ("modules: Add sdhci_fdt module")
2021-05-13 20:23:59 +02:00
Ed Maste
2c9764f36b regen syscall files after d51198d63b63
2021-05-13 14:09:58 -04:00
Ed Maste
ad385f7b46 makesyscalls.lua: improve generated file style(9) compliance
We generally like to avoid style changes when other changes are not
planned.  In this case there are some makesyscalls.lua changes in the
pipeline, and this cleans up style nits in generated files that were
highlighted by experiments with clang-format.

Reviewed by:	brooks, kevans
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30235
2021-05-13 13:59:25 -04:00
Konstantin Belousov
28bc23ab92 tmpfs: dynamically register tmpfs pager
Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm, most
tmpfs knowledge is shared by non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:13:34 +03:00
Konstantin Belousov
b730fd30b7 vm: Add KPI to dynamically register pagers
Pager is allowed to inherit part of its implementation from the existing
pager, which is done by copying non-NULL virtual method slots.
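
Illustration only (not part of the commit): one plausible reading of the
inheritance described above, in which every slot the new pager leaves
NULL is filled from the base pager.  The method table and all names are
hypothetical and much smaller than the real ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced method table. */
struct sketch_pagerops {
	int	(*getpages)(void);
	int	(*putpages)(void);
	int	(*haspage)(void);
};

int sketch_swap_getpages(void) { return (1); }
int sketch_swap_putpages(void) { return (2); }
int sketch_swap_haspage(void) { return (3); }
int sketch_tmpfs_getpages(void) { return (42); }

/* Fill every slot the new pager left NULL from the base pager. */
void
sketch_inherit(struct sketch_pagerops *ops, const struct sketch_pagerops *base)
{
	assert(ops != NULL && base != NULL);
#define	SKETCH_FIX(slot)				\
	do {						\
		if (ops->slot == NULL)			\
			ops->slot = base->slot;		\
	} while (0)
	SKETCH_FIX(getpages);
	SKETCH_FIX(putpages);
	SKETCH_FIX(haspage);
#undef SKETCH_FIX
}
```

A pager that overrides only getpages keeps its own method and picks up
the other two from the base.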

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:12:29 +03:00
Konstantin Belousov
7079449b0b sys/vm: remove several other uses of OBJT_SWAP_TMPFS
Mostly in cases where OBJ_SWAP flag works as well, or by reversing the
condition so that object types can be listed.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Konstantin Belousov
3e7a11ca21 vm_object_set_memattr(): handle all object types without listing them explicitly
This avoids the need to know all existing object types in advance, at the
cost of losing the assert that an unknown object type is handled in a sane
manner.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Konstantin Belousov
8b99833ac2 procfs_map: switch to use vm_object_kvme_type
to get object type, and stop enumerating OBJT_XXX constants.  This also
properly provides a pointer for the vnode, if the object backs any.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Konstantin Belousov
00a3fe968b vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Emmanuel Vadot
eb09408085 arm64: rockchip: Add some DTSO to disable sd/mmc
This helps during development to reduce the number of mmc controllers.
2021-05-13 18:15:31 +02:00
Emmanuel Vadot
9e08f82058 modules: Add sdhci_fdt module
This is a module for sdhci on fdt systems
2021-05-13 18:15:31 +02:00
Lutz Donnerhacke
9674c2e68c netgraph/ng_bridge: become SMP aware
The node ng_bridge underwent a lot of changes in the last few months.
All those steps were necessary to distinguish between structure
modifying and read-only data transport paths.  Now it's done, the node
can perform frame forwarding on multiple cores in parallel.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28123
2021-05-13 17:53:07 +02:00
Lutz Donnerhacke
f6e0c47169 netgraph/ng_bridge: move MACs via control message
Use the new control message to move ethernet addresses from a link to
a new link in ng_bridge(4).  Sending this message instead of doing the
work directly requires moving the loop detection into the control
message processing.  This will delay the loop detection by a few
frames.

This decouples the read-only activity from the modification under a
more strict writer lock.

Reviewed by:	manpages (gbe)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28559
2021-05-13 17:27:01 +02:00
Mark Johnston
8b3c4231ab posix timers: Check for overflow when converting to ns
Disallow a time or timer period value when the conversion to nanoseconds
would overflow.  Otherwise it is possible to trigger a division by zero
in realtime_expire_l(), where we compute the number of overruns by
dividing by the timer interval.
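
Illustration only (not part of the commit): a userland sketch of an
overflow-checked timespec-to-nanoseconds conversion.  The function name
and the exact bounds are assumptions; the kernel code may check
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define	SKETCH_NS_PER_SEC	INT64_C(1000000000)

/* Returns false instead of producing a wrapped nanosecond value. */
bool
sketch_timespec_to_ns(const struct timespec *ts, int64_t *ns)
{
	assert(ts != NULL && ns != NULL);
	if (ts->tv_sec < 0 || ts->tv_nsec < 0 ||
	    ts->tv_nsec >= SKETCH_NS_PER_SEC)
		return (false);
	/* sec * 1e9 + nsec <= INT64_MAX, rearranged to avoid overflowing. */
	if ((int64_t)ts->tv_sec >
	    (INT64_MAX - ts->tv_nsec) / SKETCH_NS_PER_SEC)
		return (false);
	*ns = (int64_t)ts->tv_sec * SKETCH_NS_PER_SEC + ts->tv_nsec;
	return (true);
}
```

Rejecting such values up front means a later division by the interval
never sees a result that wrapped during the conversion.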

Fixes:	7995dae9 ("posix timers: Improve the overrun calculation")
Reported by:	syzbot+5ab360bd3d3e3c5a6e0e@syzkaller.appspotmail.com
Reported by:	syzbot+157b74ff493140d86eac@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30233
2021-05-13 08:34:03 -04:00
Mark Johnston
9246b3090c fork: Suspend other threads if both RFPROC and RFMEM are not set
Otherwise, a multithreaded parent process may trigger races in
vm_forkproc() if one thread calls rfork() with RFMEM set and another
calls rfork() without RFMEM.

Also simplify vm_forkproc() a bit, vmspace_unshare() already checks to
see if the address space is shared.

Reported by:	syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com
Reported by:	syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30220
2021-05-13 08:33:23 -04:00
Randall Stewart
02cffbc250 tcp: Incorrect KASSERT causes a panic in rack
Skyzall found an interesting panic in rack. When a SYN and FIN are
both sent together a KASSERT gets tripped where it is validating that
a mbuf pointer is in the sendmap. But a SYN and FIN often will not
have a mbuf pointer. So the fix is two fold a) make sure that the
SYN and FIN split the right way when cloning an RSM SYN on left
edge and FIN on right. And also make sure the KASSERT properly
accounts for the case that we have a SYN or FIN so we don't
panic.

Reviewed by: mtuexen
Sponsored by: Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D30241
2021-05-13 07:36:04 -04:00
Mateusz Guzik
cef8a95acb vfs: fix vnode use count leak in O_EMPTY_PATH support
The vnode returned by namei_setup is already referenced.

Reported by:	pho
2021-05-13 09:39:27 +00:00
Konstantin Belousov
6de3cf14c4 vn_open_cred(): disallow O_CREAT | O_EMPTY_PATH
This combination does not make sense, and cannot be satisfied by lookup.
In particular, lookup cannot supply dvp, it only can directly return vp.

Reported and reviewed by:	markj using syzkaller
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-05-13 02:32:04 +03:00
Michael Tuexen
eec6aed5b8 sctp: fix another locking bug in COOKIE handling
Thanks to Tolya Korniltsev for reporting the issue for
the userland stack and testing the fix.

MFC after:	3 days
2021-05-12 23:05:28 +02:00
Mark Johnston
d8acd2681b Fix mbuf leaks in various pru_send implementations
The various protocol implementations are not very consistent about
freeing mbufs in error paths.  In general, all protocols must free both
"m" and "control" upon an error, except if PRUS_NOTREADY is specified
(this is only implemented by TCP and unix(4) and requires further work
not handled in this diff), in which case "control" still must be freed.

This diff plugs various leaks in the pru_send implementations.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30151
2021-05-12 13:00:09 -04:00
Mark Johnston
c1dd4d642f nd6: Avoid using an uninitialized sockaddr in nd6_prefix_offlink()
Commit 81728a538 ("Split rtinit() into multiple functions.") removed
the initialization of sa6, but not one of its uses.  This meant that we
were passing an uninitialized sockaddr as the address to
lltable_prefix_free().  Remove the variable outright to fix the problem.
The caller is expected to hold a reference on pr.

Fixes:		81728a538 ("Split rtinit() into multiple functions.")
Reported by:	KMSAN
Reviewed by:	donner
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30166
2021-05-12 12:52:06 -04:00
Mark Johnston
ad22ba2b9f if: Remove unnecessary validation in the SIOCSIFNAME handler
A successful copyinstr() call guarantees that the returned string is
nul-terminated.  Furthermore, the removed check would harmlessly compare
an uninitialized byte with '\0' if the new name is shorter than
IFNAMESIZ - 1.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-12 12:52:06 -04:00
Mark Johnston
06d1fd9f42 swap_pager: Zero swap info before exporting to userspace
Otherwise padding bytes are leaked.
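
Illustration only (not part of the commit): a userland sketch of zeroing
a structure before filling it, so that padding does not carry stale
memory.  The struct layout and names are hypothetical, and the check
relies on common compiler behaviour of leaving padding untouched on
member stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical export struct; padding typically follows 'flags'. */
struct sketch_swinfo {
	uint8_t		flags;
	uint64_t	nblks;
	uint64_t	used;
};

void
sketch_fill(struct sketch_swinfo *out, uint8_t flags, uint64_t nblks,
    uint64_t used)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));	/* clears padding as well as fields */
	out->flags = flags;
	out->nblks = nblks;
	out->used = used;
}

/* Inspect the raw bytes between 'flags' and 'nblks'. */
bool
sketch_padding_is_zero(const struct sketch_swinfo *info)
{
	unsigned char raw[sizeof(*info)];
	size_t i;

	memcpy(raw, info, sizeof(raw));
	for (i = offsetof(struct sketch_swinfo, flags) + 1;
	    i < offsetof(struct sketch_swinfo, nblks); i++)
		if (raw[i] != 0)
			return (false);
	return (true);
}
```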

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-12 12:52:05 -04:00
Michael Tuexen
251842c639 tcp rack: improve initialisation of retransmit timeout
When the TCP is in the front states, don't take the slop variable
into account. This improves consistency with the base stack.

Reviewed by:		rrs@
Differential Revision:	https://reviews.freebsd.org/D30230
MFC after:		1 week
Sponsored by:		Netflix, Inc.
2021-05-12 18:02:21 +02:00
Michael Tuexen
12dda000ed sctp: fix locking in case of error handling during a restart
Thanks to Taylor Brandstetter for finding the issue and providing
a patch for the userland stack.

MFC after:	3 days
2021-05-12 15:29:06 +02:00
John Baldwin
ed93deba11 Remove a write-only variable.
While refactoring an earlier series of changes during review, the
'saved_data' variable stopped being used at the bottom of if_ioctl().

Suggested by:	brooks
Reviewed by:	brooks, imp, kib
Fixes:		d17e0940f7 Rework compat shims in ifioctl().
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D30197
2021-05-11 14:56:23 -07:00
Mark Johnston
1a04f0156c cryptodev: Fix some input validation bugs
- When we do not have a separate IV, make sure that the IV length
  specified by the session is not larger than the payload size.
- Disallow AEAD requests without a separate IV.  crp_sanity() asserts
  that CRYPTO_F_IV_SEPARATE is set for AEAD requests, and some (but not
  all) drivers require it.
- Return EINVAL for AEAD requests if an IV is specified but the
  transform does not expect one.

Reported by:	syzbot+c9e8f6ff5cb7fa6a1250@syzkaller.appspotmail.com
Reported by:	syzbot+007341439ae295cee74f@syzkaller.appspotmail.com
Reported by:	syzbot+46e0cc42a428b3b0a40d@syzkaller.appspotmail.com
Reported by:	syzbot+2c4d670173b8bdb947df@syzkaller.appspotmail.com
Reported by:	syzbot+220faa5eeb4d47b23877@syzkaller.appspotmail.com
Reported by:	syzbot+e83434b40f05843722f7@syzkaller.appspotmail.com
Reviewed by:	jhb
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30154
2021-05-11 17:36:12 -04:00
Hans Petter Selasky
b8f113cab9 Implement cdev_device_add() and cdev_device_del() in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-11 21:00:23 +02:00
Hans Petter Selasky
67807f5066 cdev_del() should only put its kernel object in the LinuxKPI.
The destructor takes care of the rest.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-11 21:00:23 +02:00
Hans Petter Selasky
904390b478 Implement read-only VM_SHARED flag in the LinuxKPI.
For use by mmap(2) callbacks.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-11 21:00:14 +02:00
Roger Pau Monné
4772e86beb xen/blkback: fix reconnection of backend
The hotplug script will be executed only once for each backend,
regardless of the frontend triggering reconnections. Fix blkback to
deal with the hotplug script being executed only once, so that
reconnections don't stall waiting for a hotplug script execution
that will never happen.

As a result of the fix move the initialization of dev_mode, dev_type
and dev_name to the watch callback, as they should be set only once
the first time the backend connects.

This fix is specially relevant for guests wanting to use UEFI OVMF
firmware, because OVMF will use Xen PV block devices and disconnect
afterwards, thus allowing them to be used by the guest OS. Without
this change the guest OS will stall waiting for the block backend to
attach.

Fixes: de0bad0001 ('blkback: add support for hotplug scripts')
MFC after: 1 week
Sponsored by: Citrix Systems R&D
2021-05-11 15:43:42 +02:00
Randall Stewart
4b86a24a76 tcp: In rack, we must only convert restored rtt when the hostcache does restore them.
Rack now after the previous commit is very careful to translate any
value in the hostcache for srtt/rttvar into its proper format. However,
there is a snafu here: the HC will only restore the srtt when tp->srtt
is 0. We then need to convert the srtt only when it has actually been
restored. We do this by making
sure it was zero before the call to cc_conn_init and it is non-zero
afterwards.

Reviewed by:	Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30213
2021-05-11 08:15:05 -04:00
Wojciech Macek
0b103f7237 mrouter: do not loopback packets unconditionally
Looping back router multicast traffic significantly
stresses the network stack. Add the possibility to disable or enable
loopback based on a sysctl value.

Reported by:    Daniel Deville
Reviewed by:	mw
Differential Revision:	https://reviews.freebsd.org/D29947
2021-05-11 12:36:07 +02:00
Wojciech Macek
65634ae748 mroute: fix race condition during mrouter shutting down
There is a race condition between V_ip_mrouter de-init
    and ip_mforward handling. It might happen that mrouted
    is cleaned up after V_ip_mrouter check and before
    processing packet in ip_mforward.
    Use epoch call approach, similar to IPSec which also handles
    such case.

Reported by:    Damien Deville
Obtained from:	Stormshield
Reviewed by:	mw
Differential Revision:	https://reviews.freebsd.org/D29946
2021-05-11 12:34:20 +02:00
Mateusz Guzik
12288bd999 cache: fix lockless absolute symlink traversal to non-fp mounts
Said lookups would incorrectly fail with EOPNOTSUPP.

Reported by:	kib
2021-05-11 04:30:12 +00:00
Justin Hibbits
a436e66531 powerpc/radix pmap: Convert stat counters from ulongs to counters
This should help performance a hair, for concurrent stat updates, by
reducing contention on cache lines.
2021-05-10 21:26:14 -05:00
Justin Hibbits
31c3770ee5 powerpc/mmu: Actually use the Radix pmap_align_superpage function
This was missed in the conversion to ifuncs.  It might help improve
promotion rates.
2021-05-10 21:26:14 -05:00
Rick Macklem
cb07628d9e nfscl: Delete unneeded redundant MODULE_DEPEND() calls
There are two module declarations in the nfscl.ko module for "nfscl"
and "nfs".  Both of these declarations had MODULE_DEPEND() calls.
This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid
confusion with respect to what modules this module is dependent upon.

The patch also adds comments explaining why there are two module
declarations within the module.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D30102
2021-05-10 17:34:29 -07:00
Mark Johnston
c8bbb1272c vfs: Fix error handling in vn_fullpath_hardlink()
vn_fullpath_any_smr() will return a positive error number if the
caller-supplied buffer isn't big enough.  In this case the error must be
propagated up, otherwise we may copy out uninitialized bytes.

Reported by:	syzkaller+KMSAN
Reviewed by:	mjg, kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30198
2021-05-10 20:22:27 -04:00
Konstantin Belousov
5e7cdf1817 openat(2): add O_EMPTY_PATH
It reopens the passed file descriptor, checking the file backing vnode's
current access rights against the open mode. In particular, this flag allows
converting a file descriptor opened with O_PATH into an operable file
descriptor, assuming permissions allow that.

Reviewed by:	markj
Tested by:	Andrew Walker <awalker@ixsystems.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30148
2021-05-11 02:39:24 +03:00
Richard Scheffenegger
0471a8c734 tcp: SACK Lost Retransmission Detection (LRD)
Recover from excessive losses without reverting to a
retransmission timeout (RTO). Disabled by default, enable
with sysctl net.inet.tcp.do_lrd=1

Reviewed By: #transport, rrs, tuexen, #manpages
Sponsored by: Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D28931
2021-05-10 19:06:20 +02:00
Randall Stewart
9867224bab tcp: Host cache and rack ending up with incorrect values.
The hostcache up to now has been updated in the discard callback
but without checking if we are all done (the race where there is
more than one call and the counter has not yet reached zero). This
means that when the race occurs, we end up calling hc_update
more than once. Also, alternate stacks can keep their srtt/rttvar
in different formats (for example, rack keeps its values in microseconds).
Since we call hc_update *before* the stack fini(), the
values will be in the wrong format.

Rack, on the other hand, needs to convert items pulled from the
hostcache into its internal format, or else it may end up with
badly incorrect values from the hostcache. In the process,
let's commonize the update mechanism for srtt/rttvar, since we
now have more than one place that needs to call it.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30172
2021-05-10 11:25:51 -04:00
Kristof Provost
2ef5d803e3 in6_mcast: Return EADDRINUSE when we've already joined the group
Distinguish between truly invalid requests and those that fail because
we've already joined the group. Both cases fail, but differentiating
them allows userspace to make more informed decisions about what the
error means.

For example, radvd tries to join the all-routers group on every SIGHUP.
This fails, because it's already joined it, but this failure should be
ignored (rather than treated as a sign that the interface's multicast is
broken).

This puts us in line with OpenBSD, NetBSD and Linux.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30111
2021-05-10 09:48:51 +02:00
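A minimal sketch of how a userspace daemon might act on the in6_mcast change above (2ef5d803e3): treat EADDRINUSE from an IPv6 group join as "already a member" rather than as a failure. The helper name `join_group_idempotent` and the `joiner` callback are hypothetical, introduced only for illustration; real code would issue the actual setsockopt() join request in their place.

```python
import errno


def join_group_idempotent(joiner):
    """Call joiner() and report whether group membership is in place.

    joiner is a hypothetical zero-argument callable standing in for the
    real multicast join request (an assumption for this sketch).
    Returns "joined" on success and "already-joined" when the kernel
    reports EADDRINUSE; any other OSError propagates to the caller.
    """
    try:
        joiner()
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            # The socket is already a member of the group: not a failure.
            return "already-joined"
        raise
    return "joined"
```

With this shape, a repeated join on SIGHUP can be ignored while a truly invalid request still surfaces as an error.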
Juraj Lutter
c2c9ef3ced rpi_ft5406: Recognize raspberrypi,firmware-ts touchscreen
- Recognize raspberrypi,firmware-ts touchscreen
- Move the driver from ofwbus to simplebus

Reviewed by:	manu
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D30169
2021-05-09 12:13:19 +02:00
Ruslan Bukin
9146c6240d ofw: support for a single 'port' DTS property.
On rk3399 the VOP-little node has a single 'port' property (not a
collection of 'ports' or indexed ports).

Reviewed by:	manu
Sponsored by:	UKRI
Differential Revision:	https://reviews.freebsd.org/D30165
2021-05-08 15:41:57 +01:00
Fedor Uporov
2a984c2b49 Make encode/decode extra time functions inline.
Mentioned by:   pfg
MFC after:      2 weeks
2021-05-08 06:42:20 +03:00
Rick Macklem
dd02d9d605 nfscl: Add support for va_birthtime to NFSv4
There is an NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute on the server.
It also enables the client to
acquire and set the attribute for an NFSv4
server that supports the attribute.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30156
2021-05-07 17:30:56 -07:00
Randall Stewart
5a4333a537 This takes Warner's suggested approach to making it so that
platforms that for whatever reason cannot include the RATELIMIT option
can still work with rack. It adds two dummy functions that rack will
call and find out that the highest hw supported b/w is 0 (which
kinda makes sense and rack is already prepared to handle).

Reviewed by: Michael Tuexen, Warner Losh
Sponsored by: Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D30163
2021-05-07 17:32:32 -04:00
Alexander V. Chernikov
aad59c79f5 Fix panic when trying to delete non-existent gateway in multipath route.
If a non-existent gateway was specified, the code responsible for calculating
an updated nexthop group returned the same already-used nexthop group.
After the route table update, the operation result contained the same
old and new nexthop groups. Thus, the code responsible for decomposing
the notification into the list of simple nexthop-level notifications
was not able to find any differences. As a result, it did not update any
of the "simple" notification fields, resulting in an empty rtentry pointer.
This empty pointer was the direct cause of the panic.

Fix the problem by returning ESRCH when the new nexthop group is the same
as the old one after applying the gateway filter.

Reported by:	Michael <michael.adm at gmail.com>
PR:		255665
MFC after:	3 days
2021-05-07 20:41:31 +00:00
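The logic of the multipath fix above (aad59c79f5) can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: `remove_gateway` is a hypothetical name, and a tuple of address strings stands in for a nexthop group.

```python
import errno


def remove_gateway(nhgrp, gateway):
    """Return (error, new_group) after filtering gateway out of nhgrp.

    nhgrp is a tuple of gateway address strings, a hypothetical stand-in
    for a kernel nexthop group.
    """
    new_group = tuple(gw for gw in nhgrp if gw != gateway)
    if new_group == nhgrp:
        # Nothing matched the filter: report ESRCH instead of handing
        # back the unchanged group as if the deletion had succeeded.
        return errno.ESRCH, nhgrp
    return 0, new_group
```

Returning an error early means later stages never have to diff two identical groups, which is the situation that left the notification fields unset.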
Kristof Provost
93abcf17e6 pf: Support killing 'matching' states
Optionally also kill states that match (i.e. are the NATed state or
opposite direction state entry for) the state we're killing.

See also https://redmine.pfsense.org/issues/8555

Submitted by:	Steven Brown
Reviewed by:	bcr (man page)
Obtained from:	https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30092
2021-05-07 22:13:31 +02:00
Kristof Provost
abbcba9cf5 pf: Allow states to be killed per 'gateway'
This allows us to kill states created from a rule with route-to/reply-to
set.  This is particularly useful in multi-wan setups, where one of the
WAN links goes down.

Submitted by:	Steven Brown
Obtained from:	https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30058
2021-05-07 22:13:31 +02:00
Kristof Provost
e989530a09 pf: Introduce DIOCKILLSTATESNV
Introduce an nvlist based alternative to DIOCKILLSTATES.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30054
2021-05-07 22:13:30 +02:00
Kristof Provost
7606a45dcc pf: Introduce DIOCCLRSTATESNV
Introduce an nvlist variant of DIOCCLRSTATES.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30052
2021-05-07 22:13:30 +02:00
Mark Johnston
a1fadf7de2 divert: Fix mbuf ownership confusion in div_output()
div_output_outbound() and div_output_inbound() relied on the caller to
free the mbuf if an error occurred.  However, this is contrary to the
semantics of their callees, ip_output(), ip6_output() and
netisr_queue_src(), which always consume the mbuf.  So, if one of these
functions returned an error, that would get propagated up to
div_output(), resulting in a double free.

Fix the problem by making div_output_outbound() and div_output_inbound()
responsible for freeing the mbuf in all cases.

Reported by:	Michael Schmiedgen <schmiedgen@gmx.net>
Tested by:	Michael Schmiedgen
Reviewed by:	donner
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30129
2021-05-07 14:31:08 -04:00
Mark Johnston
831850d8b0 stack(9): Disable KASAN in stack_capture()
When unwinding the stack, we may encounter a stack frame in a poisoned
region of the stack, triggering a false positive.

Reviewed by:	andrew, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30126
2021-05-07 14:31:08 -04:00
Mark Johnston
cfad8bd24f cdefs: Make __nosanitizeaddress work for KASAN as well
Add __nosanitizememory while I'm here.

Reviewed by:	andrew, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30126
2021-05-07 14:31:08 -04:00
Mark Johnston
2d499d5052 linker_set: Disable ASAN only in userspace
KASAN does not insert redzones around global variables and so is not
susceptible to the problem that led to us disabling ASAN for linker set
elements in the first place (see commit fe3d8086fb).

Reviewed by:	andrew, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30126
2021-05-07 14:31:08 -04:00
Randall Stewart
a16cee0218 Fix a UDP tunneling issue with rack. Basically there are two
issues.
A) Not enough hdrlen was being calculated when a UDP tunnel is
   in place.
and
B) Not enough memory is allocated in rack's fsb. We need to
   overbook the fsb to include a udphdr just in case.

Submitted by: Peter Lei
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30157
2021-05-07 14:06:43 -04:00
Konstantin Belousov
d474440ab3 Constify vm_pager-related virtual tables.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
4b8365d752 Add OBJT_SWAP_TMPFS pager
This is the OBJT_SWAP pager, specialized for tmpfs.  Right now, both the swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in special ways.  Replace (almost) all such places with
proper methods.

Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.

This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
0d2dfc6fed pagertab: use designated initializers
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
838adc533f Style enum obj_type
Put each type on a dedicated line, which makes addition of new
types cleaner.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
a7c198a24b Implement vm_object_vnode() using vm_pager_getvp()
Allow vp_heldp argument to be NULL, in which case the returned vnode
is not held for tmpfs swap objects.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
1390a5cbeb Add pgo_freespace method
Makes the code in vm_object collapse/page_remove cleaner

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
192112b74f Add pgo_getvp method
This eliminates the staircase of conditions in vm_map_entry_set_vnode_text().

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00