Commit Graph

268916 Commits

Author SHA1 Message Date
Colin Percival
04b9b7c507 loader bcache: Track unconsumed readahead
The loader bcache attempts to determine whether readahead is useful,
increasing or decreasing its readahead length based on whether a
read could be serviced out of the cache.  This resulted in two
unfortunate behaviours:

1. A series of consecutive 32 kB reads are requested and bcache
performs 16 kB readaheads.  For each read, bcache determines that,
since only the first 16 kB is already in the cache, the readahead
was not useful, and keeps the readahead at the minimum (16 kB) level.

2. A series of consecutive 32 kB reads are requested and bcache
starts with a 32 kB readahead resulting in a 64 kB being read on
the first request.  The second 32 kB request can be serviced out of
the cache, and bcache responds by doubling its readahead length to
64 kB.  The third 32 kB request cannot be serviced out of the cache,
and bcache reduces its readahead length back down to 32 kB.

The first syndrome converts a series of 32 kB reads into a series of
(misaligned) 32 kB reads, while the second syndrome converts a series
of 32 kB reads into a series of 64 kB reads; in both cases we do not
increase the readahead length to its limit (currently 128 kB) no
matter how many consecutive read requests are made.

This change avoids this problem by tracking the "unconsumed
readahead" length; readahead is deemed to be useful (and the
read-ahead length is potentially increased) not only if a request was
completely serviced out of the cache, but also if *any* of the request
was serviced out of the cache and that length matches the amount of
unconsumed readahead.  Conversely, we now only reduce the readahead
length in cases where there was unconsumed readahead data.

In my testing on an EC2 c5.xlarge instance, this change reduces the
boot time by roughly 120 ms.

Reviewed by:	imp, tsoome
MFC after:	1 week
Sponsored by:	https://patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D32250
2021-10-03 14:54:09 -07:00
Colin Percival
b841148bbb loader: Refactor readahead adjustment in bcache
While I'm here, add an explanatory comment.

No functional change intended.

Reviewed by:	imp, tsoome (previous version)
MFC after:	1 week
Sponsored by:	https://patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D32249
2021-10-03 12:10:36 -07:00
Jessica Clarke
31776afdc7 pci_pci: Support growing bus ranges in bus_adjust_resource for NEW_PCIB
This is the same underlying problem as 2624598064, just for bus ranges
rather than windows. SiFive's HiFive Unmatched has the following
topology:

  Root Port <---> Bridge <---> Bridge <-+-> Bridge <---> (Unused)
   (pcib0)        (pcib1)      (pcib2)  |   (pcib3)
                                        +-> Bridge <---> xHCI
                                        |   (pcib4)
                                        +-> Bridge <---> M.2 E-key
                                        |   (pcib5)
                                        +-> Bridge <---> M.2 M-key
                                        |   (pcib6)
                                        +-> Bridge <---> x16 slot
                                            (pcib7)

If a device is plugged into the x16 slot that itself has a bridge, such
as many graphics cards, we currently fail to allocate a bus number for
its child bus (and so pcib_attach_child skips adding a child bus for
further enumeration) as, when the new child bridge attaches, it attempts
to allocate a bus number from its parent (pcib7) which in turn attempts
to grow its own bus range by calling bus_adjust_resource on its own
parent (pcib2) whose bus rman cannot accommodate the request and needs
to itself be extended by calling its own parent (pcib1). Note that
pcib3-7 do not face the same issue when they attach since pcib1 ends up
managing bus numbers 1-255 from the beginning and so never needs to grow
its own range.

Reviewed by:	jhb, mav
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32011
2021-10-03 19:35:26 +01:00
Jessica Clarke
2404f03fca riscv: Add vt and kbdmux to GENERIC for video console support
No in-tree drivers are supported for RISC-V (given it supports UEFI we
could enable the EFI framebuffer, but U-Boot has very limited hardware
support and EDK2 remains a work in progress), but drm-kmod exists with
drivers for video cards that can be used with the HiFive Unmatched.

Reviewed by:	imp, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32001
2021-10-03 19:34:53 +01:00
Jessica Clarke
8167c92f65 LinuxKPI: Add more #ifdef VM_MEMATTR_WRITE_COMBINING guards
One of the three uses is already guarded; this guards the remaining ones
to support architectures like riscv that do not provide write-combining,
and is needed to build drm-kmod on riscv.

Reviewed by:	hselasky, manu
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31999
2021-10-03 19:34:40 +01:00
Jessica Clarke
7047568821 libgcc_s: Export 64-bit int to 128-bit float functions
The corresponding 32-bit int and 128-bit int functions were added in
790a6be5a1, as were all combinations of the float to int functions,
but these two were overlooked. __floatditf is needed to build curl for
riscv as there's a signed 64-bit int to 128-bit float conversion in
lib/progress.c's trspeed as of 7.77.0.

Reviewed by:	dim
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31997
2021-10-03 19:34:25 +01:00
Jessica Clarke
1be2e16df1 riscv: Add a stub pmap_change_attr implementation
pmap_change_attr is required by drm-kmod so we need the function to
exist. Since the Svpbmt extension is on the horizon we will likely end
up with a real implementation of it, so this stub implementation does
all the necessary page table walking to validate the input, ensuring
that no new errors are returned once it's implemented fully (other than
due to out of memory conditions when demoting L2 entries) and providing
a skeleton for that future implementation.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31996
2021-10-03 19:33:47 +01:00
Kyle Evans
6b88668f0b vfs: remove dead fifoop VOP_KQFILTER implementations
These began to become obsolete in d6d64f0f2c (r137739) and the deal
was later sealed in 003e18aef4 (r137801) when vfs.fifofs.fops was
dropped and vop-bypass for pipes became mandatory.

PR:		225934
Suggested by:	markj
Reviewe by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D32270
2021-10-03 01:02:51 -05:00
Alexander Motin
6df1359e55 sleepqueue(9): Remove sbinuptime() from sleepq_timeout().
Callout c_time is always bigger or equal than the scheduled time.  It
is also smaller than sbinuptime() and can't change while the callback
is running.  So we reliably can use it instead of sbinuptime() here.
In case there was a race and the callout was rescheduled to the later
time, the callback will be called again.

According to profiles it saves ~5% of the timer interrupt time even
with fast TSC timecounter.

MFC after:	1 month
2021-10-02 21:08:41 -04:00
Robert Wing
9acea16404 ffs: retire unused fsckpid mount option
The fsckpid mount option was introduced in 927a12ae16 along with a
couple sysctl's to support SU+J with snapshots. However, those sysctl's
were never used and eventually removed in f2620e9ceb.

There are no in-tree consumers of this mount option.

Reviewed by:	mckusick, kib
Differential Revision:	https://reviews.freebsd.org/D32015
2021-10-02 15:11:40 -08:00
Rick Macklem
93a32050ab nfsd: Fix pNFS handling of Deallocate
For a pNFS server configuration, an NFSv4.2 Deallocate operation
is proxied to the DS(s).  The code that parsed the reply for the
proxy RPC is broken and did not process the pre-operation attributes.

This patch fixes this problem.

This bug would only affect pNFS servers built from recent main/FreeBSD14
sources.
2021-10-02 14:11:15 -07:00
Elyes HAOUAS
3fe686f25a src/bin/ps: Fix spelling error
Spell interruptible correctly.

Pull Request: https://github.com/freebsd/freebsd-src/pull/544
Signed-off-by: Elyes HAOUAS <ehaouas@noos.fr>
2021-10-02 10:39:37 -06:00
Elyes HAOUAS
8db446034e src/bin/pax: Fix spelling error
"whats" -> "what's"

Pull Request: https://github.com/freebsd/freebsd-src/pull/544
Signed-off-by: Elyes HAOUAS <ehaouas@noos.fr>
2021-10-02 10:39:34 -06:00
Elyes HAOUAS
28c3c137f6 src/bin/mkdir: Spell occur correctly.
Pull Request: https://github.com/freebsd/freebsd-src/pull/544
Signed-off-by: Elyes HAOUAS <ehaouas@noos.fr>
2021-10-02 10:39:31 -06:00
Elyes HAOUAS
48556dff3d src/bin/sh: Fix spelling errors
Pull Request: https://github.com/freebsd/freebsd-src/pull/544
Signed-off-by: Elyes HAOUAS <ehaouas@noos.fr>
2021-10-02 10:39:24 -06:00
Tom Hukins
202692b1a7 arm64: Spell BeagleBone correctly in config file.
Pull Request:		https://github.com/freebsd/freebsd-src/pull/545
2021-10-02 10:30:14 -06:00
Gordon Bergling
42dfad2ef1 ti(4): Fix a typo in an error message
- s/chanels/channels/

MFC after:	1 week
2021-10-02 10:51:29 +02:00
Gordon Bergling
957d9ba0c3 qlnxe: Fix typos in two error messages
- s/erorr/error/

MFC after:	1 week
2021-10-02 10:49:51 +02:00
Gordon Bergling
15c5f657a0 cam: Fix a typo in a comment
- s/perorming/performing/

MFC after:	3 days
2021-10-02 10:48:43 +02:00
Gordon Bergling
fafb1c574d vnic: Fix a typo in a comment
- s/setings/settings/

MFC after:	3 days
2021-10-02 10:47:21 +02:00
Gordon Bergling
9599d8141f smsc(4): Fix a typo in a comment
- s/setings/settings/

MFC after:	3 days
2021-10-02 10:45:58 +02:00
Gordon Bergling
efd8749fe5 evdev: Fix a typo in a commit
- s/prefered/preferred/

MFC after:	3 days
2021-10-02 10:43:41 +02:00
Gordon Bergling
9ebd651b58 netvsc: Fix a typo in a comment
- s/prefered/preferred/

MFC after:	3 days
2021-10-02 10:42:18 +02:00
Alexander Motin
1c119e173d sched_ule(4): Fix possible significance loss.
Before this change kern.sched.interact sysctl setting above 32 gave
all interactive threads identical priority of PRI_MIN_INTERACT due to
((PRI_MAX_INTERACT - PRI_MIN_INTERACT + 1) / sched_interact) turning
zero.  Setting the sysctl lower reduced the range of used priority
levels up to half, that is not great either.

Change of the operations order should fix the issue, always using full
range of priorities, while overflow is impossible there since both
score and priority values are small.  While there, make the variables
unsigned as they really are.

MFC after:	1 month
2021-10-02 00:09:45 -04:00
Philip Paeps
93d120dbc0 contrib/tzdata: import tzdata 2021c
Merge commit '9530c11c35707c2ed4a95aa90097b30f8a230563'

Changes: https://github.com/eggert/tz/blob/2021c/NEWS

MFC after:	3 days
2021-10-02 10:52:02 +08:00
Philip Paeps
9530c11c35 Import tzdata 2021c 2021-10-02 10:47:29 +08:00
Warner Losh
83581511d9 nvme: Use adaptive spinning when polling for completion or state change
We only use nvme_completion_poll in the initialization path. The
commands they queue and wait for finish quickly as they involve no I/O
to the drive's media. These command take about 20-200 microsecnds
each. Set the wait time to 1us and then increase it by 1.5 each
successive iteration (max 1ms). This reduces initialization time by
80ms in cpervica's tests.

Use this same technique waiting for RDY state transitions. This saves
another 20ms. In total we're down from ~330ms to ~2ms.

Tested by:		cperciva
Sponsored by:		Netflix
Reviewed by:		mav
Differential Review:	https://reviews.freebsd.org/D32259
2021-10-01 19:17:55 -06:00
Mateusz Guzik
ef7d2c1fc1 nfs: eliminate thread argument from nfsvno_namei
This is a step towards retiring struct componentname cn_thread

Reviewed by:	rmacklem
Differential Revision:	https://reviews.freebsd.org/D32267
2021-10-02 00:57:20 +00:00
Michael Tuexen
3ff3733991 sctp: don't keep being locked on a stream which is removed
Reported by:	syzbot+f5f551e8a3a0302a4914@syzkaller.appspotmail.com
MFC after:	1 week
2021-10-02 00:48:01 +02:00
Mateusz Guzik
c9536389d7 vfs: hoist cn_thread assert in namei
Making it condtional on whether ktrace happens to be enabled makes no
sense.
2021-10-01 21:56:29 +00:00
Li-Wen Hsu
fbece76095
Update Azure release bits
Imports the changes for building official images on Azure Marketplace,
which fulfill the requirements of Azure and FreeBSD cloud images like
disk layout and UEFI for Gen2 VM, along with some minor improvements like
configurations to speed up booting.

"CLOUDWARE" list will be updated after some more collaborations with re
completed.

Reviewed by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Technical assistance from:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D23804
2021-10-02 04:59:10 +08:00
Gleb Smirnoff
a37e4fd1ea Re-style dfcef87714 to keep the code and variables related to
listening sockets separated from code for generic sockets.

No objection:	markj
2021-10-01 13:38:24 -07:00
Justin Hibbits
63cb9308a7 Fix segment size in compressing core dumps
A core segment is bounded in size only by memory size.  On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core.  Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.

This dates back to the original refactor back in 2015 (r279801 /
aa14e9b7).

MFC after:	1 week
Sponsored by:	Juniper Networks, Inc.
2021-10-01 14:16:33 -05:00
John Baldwin
0177102173 arm64, riscv: Fix TRAF_PC() to return the PC, not the return address.
Reviewed by:	mhorne
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31969
2021-10-01 11:53:12 -07:00
Faraz Vahedi
c76da1f010 freebsd-update(8): Add -j flag to support jails
Make freebsd-update(8) support jails by adding the -j flag which takes
a jail jid or name as an argument. This takes advantage of the recently
added -j support to freebsd-version(8) in order to get the version of
the installed userland.

Reviewed by:	dteske, kevans
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25711
2021-10-01 13:51:03 -05:00
Faraz Vahedi
f54b18fc4d freebsd-version(1): Add -j flag to support jails
Make freebsd-version(1) support jails by adding the -j flag which takes
a jail jid or name as an argument. As with other options, -j
flags stack and display in the order requested.

Reviewed by:	bcr (manpages), kevans
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25705
2021-10-01 13:50:56 -05:00
Kyle Evans
2f4dbe279f kqueue: fix recent assertion
NOTE_ABSTIME may also have a zero timeout, which indicates that we
should still fire immediately as an absolute time in the past.  A test
has been added for this one as well.

Fixes:	9c999a259f ("kqueue: don't arbitrarily restrict long-past...")
Point hat:	kevans
Reported by:	syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com
2021-10-01 13:17:30 -05:00
Warner Losh
4aed5c3c9d time_t is pathological: use %j + cast to print it.
Sponsored by:		Netflix
2021-10-01 12:16:10 -06:00
Gleb Smirnoff
b984d153e0 Don't set GELI UMA zone as UMA_ZONE_NOFREE.
That fixes memory leak on last GELI provider destroyed, introduced
in 2dbc9a388e. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a the UMA is fixed not to drain into reserves.

Discussed with:	jtl, markj
Fixes:		2dbc9a388e
PR:		258787
2021-10-01 10:31:17 -07:00
Warner Losh
4b3da659bf nvme: Only reset once on attach.
The FreeBSD nvme driver has reset the nvme controller twice on attach to
address a theoretical issue assuring the hardware is in a known
state. However, exierence has shown the second reset is unnecessary and
increases the time to boot. Eliminate the second reset. Should there be
a situation when you need a second reset (for buggy or at least somewhat
out of the mainstream hardware), the hardware option NVME_2X_RESET will
restore the old behavior. Document this in nvme(4).

If there's any trouble at all with this, I'll add a sysctl tunable to
control it.

Sponsored by:		Netflix
Reviewed by:		cperciva, mav
Differential Revision:	https://reviews.freebsd.org/D32241
2021-10-01 11:09:34 -06:00
Warner Losh
e5e26e4a24 nvme: Remove pause while resetting
After some study of the code and the standard, I think we can just drop
the pause(), unconditionally.  If we're not initialized, then there's
nothing to wait for from a software perspective.  If we are initialized,
then there might be outstanding I/O. If so, then the qpair 'recovery
state' will transition to WAITING in nvme_ctrlr_disable_qpairs, which
will ignore any interrupts for items that complete before we complete
the reset by setting cc.en=0.

If we go on to fail the controller, we'll cancel the outstanding I/O
transactions.  If we reset the controller, the hardware throws away
pending transactions and we retry all the pending I/O transactions. Any
transactions that happend to complete before cc.en=0 will have the same
effect in the end (doing the same transaction twice is just inefficient,
it won't affect the state of the device any differently than having done
it once).

The standard imposes no wait times here, so it isn't needed from that
perspective.

Unanswered Question: Do we may need to disable interrupts while we
disable in legacy mode since those are level-sensitive.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32248
2021-10-01 11:09:05 -06:00
Warner Losh
77054a897f nvme: Explain a workaround a little better
The don't touch the mmio of the drive after we do a EN 1->0 transition
is only for a tiny number of dirves that have this unforunate issue.

Sponsored by:		Netflix
2021-10-01 10:56:10 -06:00
Warner Losh
a245627a4e nvme_ctrlr_enable: Small style nits
Rewrite the nested if's using the preferred FreeBSD style for branches
of ifs that return. NFC. Minor tweaks to the comments to better fit new
code layout.

Sponsored by:		Netflix
Reviewed by:		mav, chuck (prior rev, but comments rolled in)
Differential Revision:	https://reviews.freebsd.org/D32245
2021-10-01 10:56:10 -06:00
Warner Losh
26259f6ab9 nvme: Use MS_2_TICKS rather than rolling our own
Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32246
2021-10-01 10:56:10 -06:00
Warner Losh
d5fca1dc1d nvme_ctrlr_enable: Remove unnecessary 5ms delays
Remove the 5ms delays after writing the administrative queue
registers. These delays are from the very earliest days of the driver
(they are in the first commit) and were most likely vestiges of the
Chatham NVMe prototype card that was used to create this driver. Many of
the workarounds necessary for it aren't necessary for standards
compliant cards. The original driver had other areas marked for Chatham,
but these were not. They are unneeded. There's three lines of supporting
evidence.

First, the NVMe standards make no mention of a delay time after these
registers are written. Second, the Linux driver doesn't have them, even
as an option. Third, all my nvme cards work w/o them.

To be safe, add a write barrier between setting up the admin queue and
enabling the controller.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32247
2021-10-01 10:56:10 -06:00
Eric van Gyzen
35e4527e88 sem_clockwait_np test: fix usage of ATF API
ATF_REQUIRE_ERRNO requires the given errno iff the given expression is
true.  These test cases used it incorrectly, potentially allowing
sem_clockwait_np to succeed when it was expected to fail.  Use separate
ATF calls to require failure and the expected errno.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2021-10-01 06:39:34 -05:00
Eric van Gyzen
2334abfd01 sem test: move sem_clockwait_np tests into individual cases
Move these tests into individual test cases for all the usual reasons.
No functional change intended.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2021-10-01 06:39:34 -05:00
Eric van Gyzen
31466594cd sem_clockwait_np test: relax time constraint on VMs
In a guest on a busy hypervisor, the time remaining after an
interrupted sleep could be much lower than other environments.
Relax the lower bound on VMs.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2021-10-01 06:39:30 -05:00
Randall Stewart
a36230f75e tcp: Make dsack stats available in netstat and also make sure its aware of TLP's.
DSACK accounting has been for quite some time under a NETFLIX_STATS ifdef. Statistics
on DSACKs however are very useful in figuring out how much bad retransmissions you
are doing. This is further complicated, however, by stacks that do TLP. A TLP
when discovering a lost ack in the reverse path will cause the generation
of a DSACK. For this situation we introduce a new dsack-tlp-bytes as well
as the more traditional dsack-bytes and dsack-packets. These will now
all display in netstat -p tcp -s. This also updates all stacks that
are currently built to keep track of these stats.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32158
2021-10-01 10:36:27 -04:00
Hans Petter Selasky
5b40c0aa73 mixer(3): Fix buildworld after 38c857d956 .
s/default_unit/dunit/g

Differential Revision:	https://reviews.freebsd.org/D32254
Sponsored by:	NVIDIA Networking
2021-10-01 16:34:10 +02:00