140087 Commits

Author SHA1 Message Date
kan
66f368ceba Enter net epoch in msk_tick.
One more instance of if_input being called outside of
interrupt, by means of msk_handle_events.

Differential Revision:  https://reviews.freebsd.org/D23379
2020-01-27 00:14:51 +00:00
vmaffione
f1f3c50834 netmap_mem_unmap: fix NULL pointer dereference
MFC after:	3 days
2020-01-26 21:34:46 +00:00
rmacklem
b616c3e0b2 Fix a crash in the NFSv4 server.
The PR reported a crash that occurred when a file was removed while
client(s) were actively doing lock operations on it.
Since nfsvno_getvp() will return NULL when the file does not exist,
the bug was obvious and easy to fix via this patch. It is a little
surprising that this wasn't found sooner, but I guess the above
case rarely occurs.

Tested by:	iron.udjin@gmail.com
PR:		242768
Reported by:	iron.udjin@gmail.com
MFC after:	2 weeks
2020-01-26 17:59:05 +00:00
jhb
7119b7bfe6 Revert accidental change from r357146.
2020-01-26 14:23:27 +00:00
jhb
740be4f83e Fix some misleading indentation warnings reported by recent clang.
There should not be any functional change.  While the change in
emul10kx-pcm.c looks like a real bug fix (as opposed to inconsistent
whitespace), the extra statements were not harmful.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23363
2020-01-26 14:20:57 +00:00
jhb
ce1ef8c58e Compile hack.c with normal CFLAGS + -shared -nostdlib.
Originally, hack.c was compiled into a shared object with just -shared
-nostdlib.  This assumed that ${CC} did not require any additional
flags for ABIs, cross-building, etc.

When kern.post.mk was created in r89509 by reducing duplication in
kernel Makefile.<arch> files, the -shared flag was moved into a
HACK_EXTRA_FLAGS variable so that sparc64 could override it with
-Wl,-shared.  The sparc64 hack was removed in r111650, but
HACK_EXTRA_FLAGS was left in place.  Over time, we have started
supporting toolchains that require flags to support alternate ABIs on
MIPS and PowerPC and started (ab)using HACK_EXTRA_FLAGS to set only
those flags.

I need to fix risc-v to pass -mno-relax to the hack.c build for lld in
llvm 10, and the patches to support cross-build from non-FreeBSD hosts
need to include -target for clang in CFLAGS for hack.c.  Rather than
adding more hacks into HACK_EXTRA_FLAGS, just use the full set of
CFLAGS with hack.c.

Reviewed by:	kib, arichardson
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23362
2020-01-26 14:19:08 +00:00
melifaro
0f2aacbaf5 Fix NOINET6 build after r357038.
Reported by:	AN <andy at neu.net>
2020-01-26 11:54:21 +00:00
mjg
16c3f2767a vfs: do an unlocked check before iterating the lazy list
For most filesystems it is expected to be empty most of the time.
2020-01-26 07:06:18 +00:00
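The idea in 16c3f2767a, peeking at a list without the lock and only locking when there is work, can be illustrated with a small Python model. This is a sketch, not the kernel code: the `Mount` class, its fields, and `process_lazy()` are hypothetical names, and it assumes a stale "empty" answer is harmless because a later call would pick up anything added in the meantime.

```python
import threading


class Mount:
    """Toy mount point with a lock-protected list of lazy vnodes."""

    def __init__(self):
        self.lock = threading.Lock()
        self.lazy = []        # items waiting for lazy processing
        self.lock_taken = 0   # how many times the lock was acquired

    def process_lazy(self, visit):
        # Unlocked peek: in the common case the list is empty and we
        # return without touching the lock at all.
        if not self.lazy:
            return 0
        with self.lock:
            self.lock_taken += 1
            snapshot = list(self.lazy)
            for vp in snapshot:
                visit(vp)
            return len(snapshot)
```

In this model an empty list costs one read and no lock acquisition, which is the case the commit message expects to be the common one.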
mjg
b4ee89fba6 vfs: remove vop loop from vop_sigdefer
All ops are guaranteed to be present since r357131.
2020-01-26 07:05:06 +00:00
mjg
a0c963e766 vfs: stop null checking routines in vop wrappers
Calls to vop_bypass pass the same argument, but type casted to something else.
Thus by replacing NULL routines with vop_bypass we avoid a runtime check.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23357
2020-01-26 00:41:38 +00:00
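The trade described in a0c963e766, filling empty slots once so that callers never test for NULL, can be sketched with a dispatch table in Python. The names `make_vector()` and `call()` are hypothetical, and `vop_bypass` here is only a stand-in that records which operation reached it; the real routine does more.

```python
def vop_bypass(op_name, *args):
    """Stand-in catch-all for any operation a filesystem does not provide."""
    return ("bypass", op_name)


def make_vector(provided, all_ops):
    """Build a dispatch table with no empty slots.

    Missing operations are filled with vop_bypass once, at setup time.
    """
    return {op: provided.get(op, vop_bypass) for op in all_ops}


def call(vector, op_name, *args):
    # No "is this slot empty?" test: every slot is callable by
    # construction, and each routine receives the same arguments.
    return vector[op_name](op_name, *args)
```

The check moves from every call to the one-time table setup, which is the saving the commit message points at.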
mjg
97a4244ffe vfs: fix freevnodes count update race against preemption
vdbatch_process leaves the critical section too early, opening a time
window where another thread can get scheduled and modify vd->freevnodes.
Once the preempted thread gets back it overwrites the value with 0.

Just move critical_exit to the end of the function.
2020-01-26 00:40:27 +00:00
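The lost update described in 97a4244ffe can be replayed deterministically in a toy Python model. Everything here is hypothetical (`Batch`, `process()`, the `other_thread` callback); the callback simply stands in for a thread that gets scheduled the moment the critical section ends.

```python
class Batch:
    """Stand-in for a per-CPU batch with a local free-vnode count."""

    def __init__(self, freevnodes):
        self.freevnodes = freevnodes


def process(vd, totals, other_thread, exit_early):
    """Fold vd.freevnodes into totals["free"], then reset the batch."""
    totals["free"] += vd.freevnodes
    if exit_early:
        other_thread()       # critical section left too early: preempted here
        vd.freevnodes = 0    # overwrites whatever the other thread added
    else:
        vd.freevnodes = 0
        other_thread()       # critical section ends only after the reset


def run(exit_early):
    """Return the total count accounted for after one racy interleaving."""
    vd = Batch(3)
    totals = {"free": 0}

    def bump():
        vd.freevnodes += 1

    process(vd, totals, bump, exit_early)
    return totals["free"] + vd.freevnodes
```

With the early exit the increment made by the other thread disappears; with the reset inside the critical section it survives.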
mjg
51c9b85471 ufs: add vgone calls for unconstructed vnodes in the error path
This mostly eliminates the requirement that vput never unlocks the vnode
before calling VOP_INACTIVE. Note it may still be present for other
filesystems.

See r356126 for an example bug.

Note vput stopped doing early unlock in r357070 thus this change does
not affect correctness as it is.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23215
2020-01-26 00:38:06 +00:00
mjg
f600a862f4 vfs: predict vn_lock failure as unlikely in vget
2020-01-26 00:34:57 +00:00
tuexen
f779ce9ed3 Sending CWR after an RTO is according to RFC 3168 generally required
and not only for the DCTCP congestion control.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23119
2020-01-25 13:45:10 +00:00
tuexen
e6f7ffd056 Don't set the ECT codepoint on retransmitted packets during SACK loss
recovery. This is required by RFC 3168.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23118
2020-01-25 13:34:29 +00:00
tuexen
bc51b4b2b5 As a TCP client only enable ECN when the corresponding sysctl variable
indicates that ECN should be negotiated for the client side.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D23228
2020-01-25 13:11:14 +00:00
jah
8f43a81524 Implement cycle-detecting garbage collector for AF_UNIX sockets
The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS messages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.

This change rewrites the garbage collector to precisely detect cycles:

1. The existing check of msgcount==f_count is still used to determine
   whether the socket is potentially in a cycle.
2. The socket is now placed on a local "dead list", which is used to
   reduce iteration time (and therefore contention on the global
   unp_link_rwlock).
3. The first pass through the dead list removes each potentially-dead
   socket's outgoing references from the graph of potentially-dead
   sockets, using a gc-specific copy of the original reference count.
4. The second series of passes through the dead list removes from the
   list any socket whose remaining gc refcount is non-zero, as this
   indicates the socket is actually accessible outside of any possible
   cycle.  Iteration is repeated until no further sockets are removed
   from the dead list.
5. Sockets remaining in the dead list are destroyed as before.

PR:		227285
Submitted by:	jan.kokemueller@gmail.com (prior version)
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23142
2020-01-25 08:57:26 +00:00
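The five steps in 8f43a81524 can be sketched as a toy model in Python. This is an illustration under stated assumptions, not the kernel code: the `Sock` class, its field names, and `collect()` are hypothetical, and restoring a rescued socket's outgoing references is an assumption made here so that the repeated passes of step 4 have something to propagate.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so sockets fit in a set
class Sock:
    name: str
    f_count: int = 0    # total file references
    msgcount: int = 0   # references held by in-flight SCM_RIGHTS messages
    queued: list = field(default_factory=list)  # sockets in flight in our receive buffer
    gc_refs: int = 0    # gc-specific copy of the reference count


def collect(sockets):
    """Return the sockets that are only reachable through a cycle."""
    # Steps 1-2: candidates whose every reference is in flight go on the dead list.
    dead = [s for s in sockets if s.f_count and s.f_count == s.msgcount]
    dead_set = set(dead)
    for s in dead:
        s.gc_refs = s.f_count
    # Step 3: drop references that come from other potentially-dead sockets.
    for s in dead:
        for t in s.queued:
            if t in dead_set:
                t.gc_refs -= 1
    # Step 4: a non-zero count means someone outside the dead list holds a
    # reference; rescue that socket and give back the references it holds.
    changed = True
    while changed:
        changed = False
        for s in list(dead_set):
            if s.gc_refs != 0:
                dead_set.discard(s)
                for t in s.queued:
                    if t in dead_set:
                        t.gc_refs += 1
                changed = True
    # Step 5: whatever is left would be destroyed.
    return [s for s in dead if s in dead_set]
```

In this model a closed socket that is merely in flight to an open receiver keeps a non-zero gc count and leaves the dead list, while two sockets that hold each other's last reference are collected.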
glebius
15660f43d7 Enter the network epoch in RX processing taskqueue.
2020-01-25 00:06:18 +00:00
tuexen
77b1d6dcde Don't delay the ACK for a TCP segment with the CWR flag set.
This allows the data sender to increase the CWND faster.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, tuexen@, Cheng Cui
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D22670
2020-01-24 22:50:23 +00:00
tuexen
68ae78ed36 The server side of TCP fast open relies on the delayed ACK timer to allow
including user data in the SYN-ACK. When DSACK support was added in
r347382, an immediate ACK was sent even for the received SYN with
user data. This patch fixes that and again allows sending user data with
the SYN-ACK.

Reported by:		Jeremy Harris
Reviewed by:		Richard Scheffenegger, rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D23212
2020-01-24 22:37:53 +00:00
glebius
9aa8e6fbcd Enter the network epoch when rack_output() is called in setsockopt(2).
2020-01-24 21:56:10 +00:00
glebius
d93435dbab Enter the network epoch in USB WiFi drivers when processing input
mbuf queues.

Submitted by:	Idwer Vollering <vidwer gmail.com>
2020-01-24 21:04:33 +00:00
melifaro
20aa310e22 Add support for RFC 6598/Carrier Grade NAT subnets to libalias and ipfw.
In libalias, a new flag PKT_ALIAS_UNREGISTERED_RFC6598 is added.
 This is like PKT_ALIAS_UNREGISTERED_ONLY, but also is RFC 6598 aware.
Also, we add a new NAT option to ipfw called unreg_cgn, which is like
 unreg_only, but also is RFC 6598-aware.  The reason for the new
 flags/options is to avoid breaking existing networks, especially those
 which rely on RFC 6598 as an external address.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22877
2020-01-24 20:35:41 +00:00
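The difference between the two modes in 20aa310e22 comes down to which address ranges count as "unregistered". A minimal Python sketch, assuming the existing mode covers the three RFC 1918 private ranges and the new one adds the RFC 6598 shared range 100.64.0.0/10; `is_unregistered()` and its `cgn_aware` parameter are hypothetical names, not the libalias API.

```python
import ipaddress

# RFC 1918 private address space.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
# RFC 6598 shared address space used for carrier-grade NAT.
RFC6598 = ipaddress.ip_network("100.64.0.0/10")


def is_unregistered(addr, cgn_aware=False):
    """Return True if addr falls in a range treated as unregistered.

    cgn_aware=False models the older behaviour (RFC 1918 only);
    cgn_aware=True additionally accepts the RFC 6598 range.
    """
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in RFC1918):
        return True
    return cgn_aware and ip in RFC6598
```

Keeping the RFC 6598 check behind a separate switch mirrors the commit's stated reason for a new flag: networks that use 100.64.0.0/10 as an external address keep their current behaviour unless they opt in.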
kib
a5a52555d0 Handle a race of collapse with a retrying fault.
Both vm_object_scan_all_shadowed() and vm_object_collapse_scan() might
observe an invalid page left in the default backing object by the
fault handler that retried.  Check for the condition and refuse to collapse.

Reported and tested by:	pho
Reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D23331
2020-01-24 19:42:53 +00:00
glebius
4b1034fea9 re(4) uses taskqueue to process input packets. Enter network epoch
in there.
2020-01-24 17:24:02 +00:00
glebius
73a359f8ff ath(4) processes input packets in a taskqueue. Enter network epoch
before calling ieee80211_input_mimo().
2020-01-24 17:11:54 +00:00
br
8bc3d9ba4c Include the PCI stack in the riscv GENERIC kernel.
It will be used by an upcoming PCI root complex driver.

Sponsored by:	DARPA, AFRL
2020-01-24 17:10:21 +00:00
br
72a8c7765d Enable NEW_PCIB on riscv.
Sponsored by:	DARPA, AFRL
2020-01-24 16:50:51 +00:00
br
796933a470 o Move the software context struct to a header file.
o Make the pci_host_generic_acpi_attach() globally visible.
o Declare a new driver class.

These will be used by a new PCI root complex driver.

Sponsored by:	DARPA, AFRL
2020-01-24 16:43:49 +00:00
br
408b11df09 Move the ECAM macros to the header file.
These will be used by other PCI root complex drivers.

Sponsored by:	DARPA, AFRL
2020-01-24 16:08:06 +00:00
markj
1acbd8ef64 Revert r357050.
It seems to have introduced a couple of regressions.

Reported by:	cy, pho
2020-01-24 14:58:02 +00:00
hselasky
d22d1cb947 Implement mmget_not_zero() in the LinuxKPI.
Submitted by:	Austin Shafer <ashafer@badland.io>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-24 13:05:53 +00:00
trasz
c384fe0936 Make linux(4) handle MAP_32BIT.
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously it would crash on the "amd64_is_imm32" assert.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23306
2020-01-24 12:08:23 +00:00
trasz
286137aef5 Add kern_unmount() and use it in the Linuxulator. No functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22646
2020-01-24 11:57:55 +00:00
dougm
c89a1dc405 Most uses of vm_map_clip_start follow a call to vm_map_lookup. Define
an inline function vm_map_lookup_clip_start that invokes them both and
use it in places that invoke both. Drop a couple of local variables
made unnecessary by this function.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22987
2020-01-24 07:48:11 +00:00
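The refactoring in c89a1dc405 folds a lookup and a clip into one helper. The Python sketch below models a map as a sorted list of half-open `(start, end)` ranges; the function names echo the commit message, but the signatures, return values, and list-based representation are assumptions made for illustration only.

```python
import bisect


def lookup(entries, addr):
    """Return the index of the entry containing addr, or None."""
    starts = [s for s, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and entries[i][0] <= addr < entries[i][1]:
        return i
    return None


def clip_start(entries, i, addr):
    """Split entry i so that an entry begins exactly at addr.

    Returns the index of the entry that starts at addr.
    """
    s, e = entries[i]
    if s < addr < e:
        entries[i:i + 1] = [(s, addr), (addr, e)]
        return i + 1
    return i


def lookup_clip_start(entries, addr):
    """Combined helper: look up addr, then clip the entry found there."""
    i = lookup(entries, addr)
    return None if i is None else clip_start(entries, i, addr)
```

Callers that previously kept the lookup result in a local variable just to pass it to the clip can use the combined helper directly.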
mjg
ec983c9c6f vfs: allow v_usecount to transition 0->1 without the interlock
There is nothing to do but to bump the count even during said transition.
There are 2 places which can do it:
- vget only does this after locking the vnode, meaning there is no change in
  contract versus inactive or reclamation
- vref only ever did it with the interlock held which did not protect against
  either (that is, it would always succeed)

VCHR vnodes retain special casing due to the need to maintain dev use count.

Reviewed by:	jeff, kib
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D23185
2020-01-24 07:47:44 +00:00
mjg
1c80afbb99 vfs: stop handling VI_OWEINACT in vget
vget is almost always called with LK_SHARED, meaning the flag (if present) is
almost guaranteed to get cleared. Stop handling it in the first place and
instead let the thread which wanted to do inactive handle the bumped usecount.

Reviewed by:	jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23184
2020-01-24 07:45:59 +00:00
mjg
837eba9f6f vfs: stop unlocking the vnode upfront in vput
Doing so runs into races with filesystems which make half-constructed vnodes
visible to other users, while depending on the chain vput -> vinactive ->
vrecycle to be executed without dropping the vnode lock.

Impediments for making this work got cleared up (notably vop_unlock_post now
does not do anything and lockmgr stops touching the lock after the final
write). Stacked filesystems keep vhold/vdrop across unlock, which arguably can
now be eliminated.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23344
2020-01-24 07:44:25 +00:00
mjg
1b5fd03218 lockmgr: don't touch the lock past unlock
This evens it up with other locking primitives.

Note lock profiling still touches the lock, which again is in line with the
rest.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23343
2020-01-24 07:42:57 +00:00
cem
1c59e40a69 cpufreq(4): Fix missing MODULE_DEPEND on hwpstate_intel
DRIVER_MODULE does not actually define a MODULE_VERSION, which is required
to satisfy a MODULE_DEPENDency.  Declare one explicitly in
hwpstate_intel(4).

Reported by:	flo
X-MFC-With:	r357002
2020-01-23 23:52:57 +00:00
kp
95e3476bfd pf: Apply kif flags to new group members
If we have a 'set skip on <ifgroup>' rule this flag is set on the group
kif, but must also be set on all members. pfctl does this when the rules
are set, but if groups are added afterwards we must also apply the flags
to the new member. If not, new group members will not be skipped until
the rules are reloaded.

Reported by:	dvl@
Reviewed by:	glebius@
Differential Revision:	https://reviews.freebsd.org/D23254
2020-01-23 22:13:41 +00:00
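The behaviour fixed in 95e3476bfd can be modelled in a few lines of Python. This is a sketch with hypothetical names (`Kif`, `IFLAG_SKIP`, `set_skip()`, `add_member()`); the flag value is arbitrary and does not correspond to the real pf constant.

```python
IFLAG_SKIP = 0x1  # hypothetical flag bit standing in for "set skip on"


class Kif:
    """Toy interface or interface-group record."""

    def __init__(self, name):
        self.name = name
        self.flags = 0
        self.members = []


def set_skip(group):
    """Model of loading the rules: flag the group and its current members."""
    group.flags |= IFLAG_SKIP
    for member in group.members:
        member.flags |= IFLAG_SKIP


def add_member(group, member):
    """Join a group; copying the group's flags is the step the commit adds."""
    group.members.append(member)
    member.flags |= group.flags
```

Without the last line of `add_member()`, an interface that joins after the rules were loaded would stay unflagged until the next reload, which is the symptom the commit message describes.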
emaste
1b03f50a3e add MIPS-specific PT header ELF definitions
Submitted by:	David Carlier
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19428
2020-01-23 17:38:17 +00:00
markj
a562905fca sparc64: Busy the TSB page before freeing it in pmap_release().
This is now required by vm_page_free().

PR:	243534
Reported and tested by:	Michael Reim <kraileth@elderlinux.org>
2020-01-23 17:18:58 +00:00
kib
8d9c802d64 Fix r356919.
Instead of waiting for pc_curthread which is overwritten by
init_secondary_tail(), wait for non-NULL pc_curpcb, to be set by the
first context switch.
Assert that pc_curpcb is not set too early.

Reported and tested by:	rlibby
Reviewed by:	markj, rlibby
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23330
2020-01-23 17:08:33 +00:00
markj
bd75319a9b ng_nat: Pass IPv6 packets through.
ng_nat implements NAT for IPv4 traffic only.  When connected to an
ng_ether node it erroneously handled IPv6 packets as well.

This change is not sufficient: ng_nat does not do any validation of IP
packets in this mode, even though they have not yet passed through
ip_input().

PR:		243096
Reported by:	Robert James Hernandez <rob@sarcasticadmin.com>
Reviewed by:	julian
Differential Revision:	https://reviews.freebsd.org/D23080
2020-01-23 16:45:48 +00:00
markj
425bc748d2 vm_map_submap(): Avoid unnecessary clipping.
A submap can only be created from an entry spanning the entire request
range.  In particular, if vm_map_lookup_entry() returns false or the
returned entry contains "end".

Since the only use of submaps in FreeBSD is for the static pipe and
execve argument KVA maps, this has no functional effect.

Github PR:	https://github.com/freebsd/freebsd/pull/420
Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com> (original)
Reviewed by:	dougm, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23299
2020-01-23 16:45:10 +00:00
glebius
fa046782d1 With MSI interrupts bge(4) just schedules taskqueue. Enter the network
epoch in the taskqueue handler.

Reported by:	kib
2020-01-23 16:36:58 +00:00
markj
00e6826462 Set td_oncpu before dropping the thread lock during a switch.
After r355784 we no longer hold a thread's thread lock when switching it
out.  Preserve the previous synchronization protocol for td_oncpu by
setting it together with td_state, before dropping the thread lock
during a switch.

Reported and tested by:	pho
Reviewed by:	kib
Discussed with:	jeff
Differential Revision:	https://reviews.freebsd.org/D23270
2020-01-23 16:24:51 +00:00
markj
9838560238 Print missing ID_AA64PFR{0,1}_EL1 register fields.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23213
2020-01-23 16:10:38 +00:00
markj
50a3e6955b arm64: Don't enable interrupts in init_secondary().
Doing so can cause deadlocks or panics during boot, if an interrupt
handler accesses uninitialized per-CPU scheduler structures.  This seems
to occur frequently when running under QEMU or AWS.  The idle threads
are set up to release a spinlock section and enable interrupts in
fork_exit(), so there is no need to enable interrupts earlier.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23328
2020-01-23 16:07:27 +00:00