136316 Commits

Author SHA1 Message Date
Richard Scheffenegger
31d7a27c6e PRR: Avoid accounting left-edge twice in partial ACK.
Reviewed By:	#transport, kbowling
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28819
2021-02-25 18:37:47 +01:00
Richard Scheffenegger
48396dc779 Address two incorrect calculations and enhance readability of PRR code
- address second instance of cwnd potentially becoming zero
- fix subtle bug due to implicit int to uint typecast in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)
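
The two arithmetic pitfalls listed above can be sketched in standalone userland C. This is not the PRR code itself: HOWMANY has the same shape as the howmany() macro in <sys/param.h>, while max_unsigned(), imax_sketch() and negative_wins_unsigned_max() are hypothetical names used only for illustration.

```c
#include <limits.h>

/* Same shape as howmany() in <sys/param.h>: ceiling of x / y for positive y. */
#define HOWMANY(x, y) (((x) + ((y) - 1)) / (y))

/* Hypothetical helper: a max() that works on unsigned operands. */
static unsigned int
max_unsigned(unsigned int a, unsigned int b)
{
	return (a > b ? a : b);
}

/* Signed max, comparable in spirit to the kernel's imax(). */
static int
imax_sketch(int a, int b)
{
	return (a > b ? a : b);
}

/* Returns 1 when a negative int "wins" an unsigned max() comparison. */
static int
negative_wins_unsigned_max(void)
{
	int delta = -1;

	/* The cast is spelled out here; in the buggy pattern it happens implicitly. */
	return (max_unsigned((unsigned int)delta, 1u) == UINT_MAX);
}
```

With signed operands, imax_sketch(-1, 1) yields 1 as intended, and HOWMANY(10, 4) rounds up to 3 without a hand-written ceiling expression.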

Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
2021-02-25 18:32:04 +01:00
Mark Johnston
369706a6f8 buf: Fix the dirtybufthresh check
dirtybufthresh is a watermark, slightly below the high watermark for
dirty buffers.  When a delayed write is issued, the dirtying thread will
start flushing buffers if the dirtybufthresh watermark is reached.  This
helps ensure that the high watermark is not reached, otherwise
performance will degrade as clustering and other optimizations are
disabled (see buf_dirty_count_severe()).

When the buffer cache was partitioned into "domains", the dirtybufthresh
threshold checks were not updated.  Fix this.

Reported by:	Shrikanth R Kamath <kshrikanth@juniper.net>
Reviewed by:	rlibby, mckusick, kib, bdrewery
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Fixes:		3cec5c77d6
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28901
2021-02-25 10:04:44 -05:00
Mark Johnston
faa998f6ff sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size.  When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file.  An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager.  In other words, both the file size and pager
size were used interchangeably.

With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES.  In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache.  The on-disk copy is unaffected.

Fix this by consistently using the pager size when available.

Reported by:	dumbbell
Reviewed by:	chs, kib
Tested by:	dumbbell, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28811
2021-02-25 10:04:44 -05:00
Andrew Turner
3fd63ddfdf Limit when we call DELAY from KCSAN on amd64
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.

Reviewed by:	kib
Tested by:	arichardson
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28895
2021-02-25 12:38:05 +00:00
Andrew Turner
59f6ddb2bc Use pmap_qenter in the N1SDP PCIe driver
In the Neoverse N1 SDP PCIe driver we need to map a page shared
between the firmware and the kernel. Previously we would use
pmap_kenter for this, however as this is not standardised between
architectures switch to the common pmap_qenter.

While here fix the error handling code to clean up on failure.

Reviewed by:	br
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28890
2021-02-25 12:38:05 +00:00
Kristof Provost
f5537cd069 bridgestp: Ensure we send STP on VLAN interfaces
Reviewed by:	donner@
MFC after:	1 week
X-MFC-with:	711ed156b94562c3dcb2ee9c1b3f240f960a75d2
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28916
2021-02-25 10:16:25 +01:00
Kristof Provost
f3245be349 net: remove legacy in_addmulti()
Despite the comment to the contrary neither pf nor carp use
in_addmulti(). Nothing does, so get rid of it.

Carp stopped using it in 08b68b0e4c6b132127919cfbaf7275c727ca7843
(2011). It's unclear when pf stopped using it, but before
d6d3f01e0a3395c1fae34a3c4be7b051cb2d7581 (2012).

Reviewed by:	bz@, melifaro@
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D28918
2021-02-25 10:13:52 +01:00
Jamie Gritton
c861373bdf jail: re-commit 811e27fa3c44 with fixes
Make sure PD_KILL isn't passed to do_jail_attach, where it might end
up trying to kill the caller's prison (even prison0).

Fix the child jail loop in prison_deref_kill, which was doing the
post-order part during the pre-order part.  That's not a system-killer,
but makes jails not always die correctly.
2021-02-24 21:54:49 -08:00
Jamie Gritton
ddfffb41a2 jail: back out 811e27fa3c44 until it doesn't break Jenkins
Reported by:	arichardson
2021-02-24 21:10:47 -08:00
Marcin Wojtas
ef567155d3 Fix powerpc build after 6dd69f0064f1
Commit 6dd69f0064f1 ("iflib: introduce isc_dma_width")
failed to build on powerpc due to implicit type conversion
error. Fix that.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2021-02-25 02:35:41 +01:00
Ryan Libby
bf667f282a ofed: quiet gcc -Wint-in-bool-context
The int in the argument to the ternary triggered -Wint-in-bool-context
from gcc.  Upstream linux has a larger and more entangled patch,
12f727721eee61b3d19dedb95cb893b2baa9fe41, which doesn't apply cleanly.
When we eventually sync that, we can just drop this change.

Reviewed by:	hselasky, imp, kib
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28762
2021-02-24 15:56:16 -08:00
Ryan Libby
d8404b7ec3 ddb: just move cursor when the lexer backs up
Get rid of db_look_char because it's not compatible with db_get_line().
This fixes the following issue:

db> script lockinfo=show alllocks
db> run lockinfo
db:0:lockinfo> how alllocks
No such command; use "help" to list available commands

Reported by:	markj
Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28725
2021-02-24 15:56:16 -08:00
Ryan Libby
d85c9cef13 ddb: reliably fail with ambiguous commands
db_cmd_match had an even/odd bug, where if a third command was partially
matched (or any odd number greater than one) the search result would be
set back from CMD_AMBIGUOUS to CMD_FOUND, causing the last command in
the list to be executed instead of failing the match.
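
A minimal sketch of a prefix search in which the ambiguous state is sticky, so a third (or fifth) partial match cannot flip the result back to "found". This is not db_cmd_match(): the enum values, match_prefix() and the flat string table are hypothetical stand-ins.

```c
#include <string.h>

/* Hypothetical result codes standing in for CMD_FOUND, CMD_AMBIGUOUS, etc. */
enum match_result { MATCH_NONE, MATCH_FOUND, MATCH_AMBIGUOUS };

static enum match_result
match_prefix(const char *name, const char *const *table, size_t n,
    size_t *index)
{
	enum match_result result = MATCH_NONE;
	size_t len = strlen(name);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(name, table[i]) == 0) {
			/* An exact match always wins. */
			*index = i;
			return (MATCH_FOUND);
		}
		if (strncmp(name, table[i], len) != 0)
			continue;
		if (result == MATCH_NONE) {
			/* First partial match: remember it. */
			result = MATCH_FOUND;
			*index = i;
		} else {
			/* Second or later partial match: stays ambiguous. */
			result = MATCH_AMBIGUOUS;
		}
	}
	return (result);
}
```

With the table { "show", "step", "set", "run" }, the prefix "s" partially matches three entries and is reported as ambiguous rather than resolving to one of them.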

Reported by:	mlaier
Reviewed by:	markj, mlaier, vangyzen
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28659
2021-02-24 15:56:16 -08:00
Max Laier
14b5a3c7d5 vm pqbatch: move unmanaged page assert under pagequeue lock
This KASSERT is overzealous because of the following race condition:
 1) A managed page which is currently in PQ_LAUNDRY is freed.
    vm_page_free_prep calls vm_page_dequeue_deferred()

    The page state is:
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED

 2) The laundry worker comes around, picks up the page, and calls
    vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the
    queue.  We do a vm_page_astate_load and get
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
    as per above.

 3) The laundry worker is pre-empted and another thread allocates our page
    from the free pool.  For example vm_page_alloc_domain_after calls
    vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for
    an OBJT_UNMANAGED object.

    The page state is:
       PQ_NONE, 0 - VPO_UNMANAGED

 4) The laundry worker resumes, and processes vm_pageout_defer based on the
    stale astate which leads to a call to vm_page_pqbatch_submit, which will
    trip on the KASSERT.

Submitted by:	mlaier
Reviewed by:	markj, rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28563
2021-02-24 15:56:16 -08:00
Marcin Wojtas
6dd69f0064 iflib: introduce isc_dma_width
Some DMA controllers are unable to address the full host memory space
and are instead limited to a subset of address range (e.g. 48-bit).

Allow the driver to specify the maximum allowed DMA addressing width
(in bits) for the NIC hardware, by introducing a new field in
if_softc_ctx.

If said field is omitted (set to 0), the lowaddr of DMA window bounds
defaults to BUS_SPACE_MAXADDR.
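
One way to picture the mapping from an addressing width to a DMA lowaddr bound is sketched below. It assumes the bound is the highest address reachable with the given number of bits; SKETCH_MAXADDR stands in for BUS_SPACE_MAXADDR (whose real value is platform-specific) and dma_lowaddr_from_width() is a hypothetical helper, not the iflib function.

```c
#include <stdint.h>

/* Stand-in for BUS_SPACE_MAXADDR; the real value depends on the platform. */
#define SKETCH_MAXADDR UINT64_MAX

static uint64_t
dma_lowaddr_from_width(unsigned int width_bits)
{
	/* 0 means "not specified"; 64 or more bits cannot narrow the range. */
	if (width_bits == 0 || width_bits >= 64)
		return (SKETCH_MAXADDR);
	/* Highest address representable in width_bits bits. */
	return ((UINT64_C(1) << width_bits) - 1);
}
```

Under these assumptions a 48-bit controller gets a bound of 0xFFFFFFFFFFFF, and a driver that leaves the field at 0 keeps the unrestricted default.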

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28706
2021-02-25 00:25:39 +01:00
Alexander V. Chernikov
cc3fa1e29f Fix crash with rtadv-originated multipath IPv6 routes.
PR:		253800
Reported by:	Frederic Denis <freebsdml at hecian.net>
MFC after:	immediately
2021-02-24 16:44:10 +00:00
Konstantin Belousov
e2494f7561 atomic: add atomic_interrupt_fence()
with semantics following C11 signal_fence, that is, it establishes
ordering between its place and any interrupt handler executing on the
same CPU.
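
The userland analogue can be sketched with C11 atomic_signal_fence(), which orders accesses between a thread and a signal handler running in that same thread. The names below (payload, ready, seen, on_signal(), publish_and_interrupt()) are hypothetical, and the sketch says nothing about how the kernel primitive is implemented.

```c
#include <signal.h>
#include <stdatomic.h>

static int payload;
static volatile sig_atomic_t ready;
static volatile sig_atomic_t seen = -1;

/* Runs in the same thread that raised the signal. */
static void
on_signal(int signo)
{
	(void)signo;
	if (ready) {
		/* Pairs with the release fence in publish_and_interrupt(). */
		atomic_signal_fence(memory_order_acquire);
		seen = payload;
	}
}

static int
publish_and_interrupt(int value)
{
	signal(SIGINT, on_signal);
	payload = value;
	/* Keep the compiler from moving the payload store below the flag store. */
	atomic_signal_fence(memory_order_release);
	ready = 1;
	/* raise() returns only after the handler has run. */
	raise(SIGINT);
	return ((int)seen);
}
```

The fence constrains only compiler reordering with respect to a handler on the same thread; it is not a substitute for a fence that orders accesses between CPUs.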

Reviewed by:	markj, mjg, rlibby
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28909
2021-02-24 22:45:24 +02:00
Brett Mastbergen
43d4dfac96 pwm_backlight: Add MODULE_DEPEND on backlight
Make the pwm_backlight module depend on backlight, so it
has access to the backlight interface symbols.  Otherwise you'll
get an error like:

link_elf: symbol backlight_get_info_desc undefined

Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
MFC after:	3 days
PR: 		253765
2021-02-24 17:56:26 +01:00
Mark Johnston
b6999635b1 iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice.  This was introduced when
pfil hooks were added to the iflib receive path.  We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.

PR:		253583
Reviewed by:	gallatin, erj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28900
2021-02-24 10:08:53 -05:00
Konstantin Belousov
6f30ac9995 Call softdep_prealloc() before taking ffs_lock_ea(), if unlock is committing
softdep_prealloc() must be called to ensure enough journal space is
available, before ffs_extwrite(). Also it must be done before taking
ffs_lock_ea(), because it calls ffs_syncvnode(), potentially dropping
the vnode lock.

Reviewed by:	mckusick
Tested by:	pho
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
2021-02-24 09:55:21 +02:00
Konstantin Belousov
5e198e7646 ffs_close_ea: do not relock vnode under lock_ea
ffs_lock_ea is after the vnode lock, so vnode must not be relocked under
lock_ea. Move ffs_truncate() call in ffs_close_ea() after the lock_ea is
dropped, and only truncate to length zero, since this is the only mode
supported by ffs_truncate() for EAs. Previously code did truncation and
then write.

Zero the part of the ext area that is unused, if truncation is due but not
done because ea area is not zero-length.

Reviewed by:	mckusick
Tested by:	pho
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
2021-02-24 09:55:04 +02:00
Konstantin Belousov
c6d68ca842 ffs_vnops.c: style
Use local var to shorten ap->a_vp expression.

Reviewed by:	mckusick
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-02-24 09:54:53 +02:00
Konstantin Belousov
4983146279 ffs: do not call softdep_prealloc() from UFS_BALLOC()
Do it in ffs_write(), where we can gracefully handle relock and its
consequences. In particular, recheck the v_data to see if the vnode
reclamation ended, and return EBADF when we cannot proceed with the
write.

Reviewed by:	mckusick
Reported by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-02-24 09:54:50 +02:00
Konstantin Belousov
cc9958bf22 ffs_reallocblks: change the guard for softdep_prealloc() call to DOINGSUJ()
instead of DOINGSOFTDEP().  The softdep_prealloc() function does nothing
in SU case.

Note that the call should be safe with regard to the vnode relock,
because it is called with MNT_NOWAIT, which does not descend into fsync.

Reviewed by:	mckusick
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-02-24 09:54:30 +02:00
Mark Johnston
1d44514fcd rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue.  Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other.  This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.

Reviewed by:	rlibby, scottl
Discussed with:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28821
2021-02-23 21:17:12 -05:00
Allan Jude
6d67af5f8e Revert "ipmi_smbios: Deduplicate smbios entry point discovery logic"
This depends on another commit that has not landed yet, and broke the build

This reverts commit ba6e37e47f41484fc61cc034619267b82ddd056c.
2021-02-23 22:49:13 +00:00
Allan Jude
4a5dfded17 Revert "ipmi_smbios: remove unused smbios_cksum function"
This reverts commit d2589dc3d56ce063b28b54df11c950c3758d9578.
2021-02-23 22:48:59 +00:00
Alexander V. Chernikov
9c4a8d24f0 Fix nd6 rib_action() handling.
rib_action() guarantees valid rc filling IFF it returns without error.
Check rib_action() return code instead of checking rc fields.

PR:		253800
Reported by:	Frederic Denis <freebsdml@hecian.net>
MFC after:	immediately
2021-02-23 22:40:01 +00:00
Vladimir Kondratyev
bbacb7ce72 ig4: Add PCI IDs for Intel Gemini Lake I2C controller.
Submitted by:	Dmitry Luhtionov
MFC after:	2 weeks
2021-02-24 01:23:43 +03:00
Allan Jude
d2589dc3d5 ipmi_smbios: remove unused smbios_cksum function
Sponsored By:	Ampere Computing LLC
Submitted By:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D28751
2021-02-23 21:24:47 +00:00
Allan Jude
ba6e37e47f ipmi_smbios: Deduplicate smbios entry point discovery logic
Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D28743
2021-02-23 21:17:37 +00:00
Allan Jude
d0673fe160 smbios: Move smbios driver out from x86 machdep code
Add it to the x86 GENERIC and MINIMAL kernels

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	rpokala
Differential Revision:	https://reviews.freebsd.org/D28738
2021-02-23 21:17:09 +00:00
Allan Jude
11ba8488b8 iicsmb: Request the bus recursively in bread()
ipmi_ssif will `smbus_request_bus()` to do multiple smbus requests
(which requests the iicbus), and then here in `bread()` we also need to
request the bus because `bread()` takes multiple transactions.
This causes deadlock as it's waiting for the bus it already has without
`IIC_RECURSIVE`.

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D28742
2021-02-23 20:06:16 +00:00
Konstantin Belousov
3ae8d83d04 Remove __NO_TLS.
All supported platforms support thread-local vars and __thread.

Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28796
2021-02-23 20:08:10 +02:00
Alex Richardson
fa32350347 close_range: add audit support
This fixes the closefrom test in sys/audit.

Includes cherry-picks of the following commits from openbsm:

4dfc628aaf
99ff6fe32a
da48a0399e

Reviewed By:	kevans
Differential Revision: https://reviews.freebsd.org/D28388
2021-02-23 17:47:07 +00:00
Alexander Motin
7d4c444374 Bump CTL block backend threads from 14 to 32 per LUN.
This makes random read benchmarks look better on wide ZFS pools.
I am not sure where the original value came from, but it has been there
for too long now.

MFC after:	1 week
2021-02-23 11:03:32 -05:00
Kristof Provost
c139b3c19b arp/nd: Cope with late calls to iflladdr_event
When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.

In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.

Reviewed by:	donner@, melifaro@
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28860
2021-02-23 13:54:07 +01:00
Kristof Provost
38c0951386 bridge: Remove members when assigned to a new vnet
When the bridge is moved to a different vnet we must remove all of its
member interfaces (and span interfaces), because we don't know if those
will be moved along with it. We don't want to hold references to
interfaces not in our vnet.

Reviewed by:	donner@
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28859
2021-02-23 13:54:07 +01:00
Kristof Provost
89fa9c34d7 bridge/stp: Ensure we enter NET_EPOCH whenever we can send traffic
Reviewed by:	donner@
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28858
2021-02-23 13:54:07 +01:00
Kristof Provost
711ed156b9 bridge: Support STP on VLAN devices
VLAN devices have type IFT_L2VLAN, so the STP code mistakenly believed
they couldn't be used for STP. That's not the case, so add
IFT_L2VLAN to the check.

Reviewed by:	donner@
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28857
2021-02-23 13:54:06 +01:00
Eric Joyner
a7ac518bff ice_ddp: Update package file to 1.3.19.0
This package is intended to be used with ice(4) version 0.28.1-k.
That update will happen in a forthcoming commit.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Sponsored by: Intel Corporation
2021-02-22 18:02:19 -08:00
Jamie Gritton
0a2a96f35a jail: Don't allow jails under dying parents
If a jail is created with jail_set(...JAIL_DYING), and it has a parent
currently in a dying state, that will bring the parent jail back to
life.  Restrict that to require that the parent itself be explicitly
brought back first, and not implicitly created along with the new
child jail.

Differential Revision:	https://reviews.freebsd.org/D28515
2021-02-22 17:04:06 -08:00
Jamie Gritton
701d6b50ae jail: Fix a LOR introduced in 1158508a8086
2021-02-22 15:51:10 -08:00
Alexander V. Chernikov
5964172837 Simplify ifa/ifp refcounting in the routing stack.
The routing stack control depends on quite a tree of functions to
 determine the proper attributes of a route such as a source address (ifa)
 or transmit ifp of a route.

When actually inserting a route, the stack needs to ensure that ifa and ifp
 point to entities that are still valid.
Validity means slightly more than just pointer validity - the stack needs to guarantee
 that the provided objects are not scheduled for deletion.

Currently, callers either ignore it (most ifp parts, historically) or try to
 use refcounting (ifa parts). Even in the case of ifa refcounting it's not always
 implemented in a fully safe manner. For example, some codepaths inside
 rt_getifa_fib() are referencing ifa while not holding any locks, resulting in
 the possibility of referencing a scheduled-for-deletion ifa.

Instead of trying to fix all of the callers by enforcing proper refcounting,
 switch to a different model.
As the rib_action() already requires epoch, do not require any stability guarantees
 other than the epoch-provided one.
Use newly-added conditional versions of the refcounting functions
 (ifa_try_ref(), if_try_ref()) and fail if any of these fails.

Reviewed by:	donner
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28837
2021-02-22 23:37:59 +00:00
Alexander V. Chernikov
7563019bc6 Add if_try_ref() to simplify refcount handling inside epoch.
When we have an ifp pointer and the code is running inside epoch,
 epoch guarantees the pointer will not be freed.
However, the following case can still happen:

* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug

This can happen with the current implementation of ifnet_byindex_ref(),
 as we're not holding any locks preventing ifnet deletion by a parallel thread.

To address it, add if_try_ref(), which returns failure when
 referencing an ifp with refcount=0.
Additionally, guard the existing if_ref() with a KASSERT to provide a
 cleaner error in such scenarios.

Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
 if the latter fails.
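
The "take a reference only if the count is still nonzero" pattern can be sketched in userland with C11 atomics. struct obj, obj_try_ref() and obj_rele() are hypothetical names; this is an illustration of the idea, not the if_try_ref() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint refcount;
};

/* Take a reference unless the count has already dropped to zero. */
static bool
obj_try_ref(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return (true);
	}
	/* The object is already scheduled for destruction. */
	return (false);
}

/* Drop a reference; returns true if it was the last one. */
static bool
obj_rele(struct obj *o)
{
	return (atomic_fetch_sub(&o->refcount, 1) == 1);
}
```

A lookup that finds the object inside an epoch section would call obj_try_ref() and treat a false return the same way as "not found".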

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28836
2021-02-22 23:37:59 +00:00
Mark Johnston
537f92cd35 uma: Update the comment above startup_alloc() to reflect reality
The scheme used for early slab allocations changed in commit a81c400e75.

Reported by:	alc
Reviewed by:	alc
MFC after:	1 week
2021-02-22 18:22:51 -05:00
Alexander Motin
d510bf133d cxgb(4): Rework my commit 9dc7c250.
The previous implementation was reported to try to coalesce packets
in situations when it should not, which resulted in an assertion later.
This implementation better checks the first packet of the chain for
coalescing eligibility.

MFC after:	3 days
2021-02-22 17:33:43 -05:00
Mark Johnston
23e875fd97 vm_kern: Avoid sign extension in the KVA_QUANTUM definition
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import().  This
results in a boot-time hang on such platforms.
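
The sign-extension effect itself is easy to reproduce in isolation. The sketch below does not use the real KVA_QUANTUM definition or the vmem_set_import() prototype; widen_signed() and widen_unsigned() are hypothetical helpers, and the 32-bit and 64-bit widths are assumptions chosen for illustration.

```c
#include <stdint.h>

/* A negative value in a narrow signed type, widened to an unsigned size. */
static uint64_t
widen_signed(int32_t quantum)
{
	/* Negative inputs come out with all upper bits set. */
	return ((uint64_t)quantum);
}

/* The same bit pattern kept unsigned throughout: upper bits stay zero. */
static uint64_t
widen_unsigned(uint32_t quantum)
{
	return ((uint64_t)quantum);
}
```

Here widen_signed(INT32_MIN) gives 0xFFFFFFFF80000000, while widen_unsigned(0x80000000) gives 0x80000000, which is why doing the arithmetic in an unsigned (or wide enough) type avoids the problem.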

Reported by:	bdragon
MFC after:	3 days
2021-02-22 15:50:09 -05:00
Jamie Gritton
811e27fa3c jail: Add PD_KILL to remove a prison in prison_deref().
Add the PD_KILL flag that instructs prison_deref() to take steps
to actively kill a prison and its descendents, namely marking it
PRISON_STATE_DYING, clearing its PR_PERSIST flag, and killing any
attached processes.

This replaces a similar loop in sys_jail_remove(), bringing the
operation under the same single hold on allprison_lock that it already
has. It is also used to clean up failed jail (re-)creations in
kern_jail_set(), which didn't generally take all the proper steps.

Differential Revision:  https://reviews.freebsd.org/D28473
2021-02-22 12:27:44 -08:00