Commit Graph

266749 Commits

Author SHA1 Message Date
Robert Wing
dc35484536 dumpfs(8): add option to only print superblock information
Add an option to dumpfs, `-s`, that only prints the super block information.

Reviewed by:	chs, imp
Differential Revision:	https://reviews.freebsd.org/D30881
2021-07-02 14:18:17 -08:00
Warner Losh
aa0ab681ae nvme: coherently read status of completion records
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.

To work around this problem, we read the status and make sure it is in
phase, we then re-read the entire completion record guaranteeing it's
complete, valid, and consistent . In addition we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.

Reviewed by:		jrtc27@, chuck@
Found by:		jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31002
2021-07-02 16:05:19 -06:00
Warner Losh
fea3cf1d6d nvme: Fix alignment on nvme structures
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.

Reviewed by:		jrtc27@, mav@, chuck@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31001
2021-07-02 16:05:19 -06:00
Warner Losh
80a75155e1 nvme: style nit
Put the { on the same line as the struct nvme_foo when we define these
structures. It's FreeBSD standard and these were inconsistent.

Sponsored by:		Netflix
2021-07-02 16:05:19 -06:00
Justin Gottula
f24c7c359e Use substantially more robust program exit status logic in zvol_id
Currently, there are several places in zvol_id where the program logic
returns particular errno values, or even particular ioctl return values,
as the program exit status, rather than a straightforward system of
explicit zero on success and explicit nonzero value(s) on failure.

This is problematic for multiple reasons. One particularly interesting
problem that can arise, is that if any of these values happens to have
all 8 least significant bits unset (i.e., it is a positive or negative
multiple of 256), then although the C program sees a nonzero int value
(presumed to be a failure exit status), the actual exit status as seen
by the system is only the bottom 8 bits of that integer: zero.

This can happen in practice, and I have encountered it myself. In a
particularly weird situation, the zvol_open code in the zfs kernel
module was behaving in such a manner that it caused the open() syscall
to fail and for errno to be set to a kernel-private value (ERESTARTSYS,
which happens to be defined as 512). It turns out that 512 is evenly
divisible by 256; or, in other words, its least significant 8 bits are
all-zero. So even though zvol_id believed it was returning a nonzero
(failure) exit status of 512, the system modulo'd that value by 256,
resulting in the actual exit status visible by other programs being 0!
This actually-zero (non-failure) exit status caused problems: udev
believed that the program was operating successfully, when in fact it
was attempting to indicate failure via a nonzero exit status integer.
Combined with another problem, this led to the creation of nonsense
symlinks for zvol dev nodes by udev.

Let's get rid of all this problematic logic, and simply return
EXIT_SUCCESS (0) is everything went fine, and EXIT_FAILURE (1) if
anything went wrong.

Additionally, let's clarify some of the variable names (error is similar
to errno, etc) and clean up the overall program flow a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:10:36 -07:00
Justin Gottula
b19e2bdfb5 Print zvol_id error messages to stderr rather than stdout
The zvol_id program is invoked by udev, via a PROGRAM key in the
60-zvol.rules.in rule file, to determine the "pretty" /dev/zvol/*
symlink paths paths that should be generated for each opaquely named
/dev/zd* dev node.

The udev rule uses the PROGRAM key, followed by a SYMLINK+= assignment
containing the %c substitution, to collect the program's stdout and then
"paste" it directly into the name of the symlink(s) to be created.

Unfortunately, as currently written, zvol_id outputs both its intended
output (a single string representing the symlink path that should be
created to refer to the name of the dataset whose /dev/zd* path is
given) AND its error messages (if any) to stdout.

When processing PROGRAM keys (and others, such as IMPORT{program}), udev
uses only the data written to stdout for functional purposes. Any data
written to stderr is used solely for the purposes of logging (if udev's
log_level is set to debug).

The unintended consequence of this is as follows: if zvol_id encounters
an error condition; and then udev fails to halt processing of the
current rule (either because zvol_id didn't return a nonzero exit
status, or because the PROGRAM key in the rule wasn't written properly
to result in a "non-match" condition that would stop the current rule on
a nonzero exit); then udev will create a space-delimited list of symlink
names derived directly from the words of the error message string!

I've observed this exact behavior on my own system, in a situation where
the open() syscall on /dev/zd* dev nodes was failing sporadically (for
reasons that aren't especially relevant here). Because the open() call
failed, zvol_id printed "Unable to open device file: /dev/zd736\n" to
stdout and then exited.

The udev rule finished with SYMLINK+="zvol/%c %c". Assuming a volume
name like pool/foo/bar, this would ordinarily expand to
   SYMLINK+="zvol/pool/foo/bar pool/foo/bar"
and would cause symlinks to be created like this:
   /dev/zvol/pool/foo/bar -> /dev/zd736
   /dev/pool/foo/bar      -> /dev/zd736

But because of the combination of error messages being printed to
stdout, and the udev syntax freely accepting a space-delimited sequence
of names in this context, the error message string
   "Unable to open device file: /dev/zd736\n"
in reality expanded to
   SYMLINK+="zvol/Unable to open device file: /dev/zd736"
which caused the following symlinks to actually be created:
   /dev/zvol/Unable -> /dev/zd736
   /dev/to          -> /dev/zd736
   /dev/open        -> /dev/zd736
   /dev/device      -> /dev/zd736
   /dev/file:       -> /dev/zd736
   /dev//dev/zd736  -> /dev/zd736

(And, because multiple zvols had open() syscall errors, multiple zvols
attempted to claim several of those symlink names, resulting in numerous
udev errors and timeouts and general chaos.)

This commit rectifies all this silliness by simply printing error
messages to stderr, as Dennis Ritchie originally intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:10:06 -07:00
Justin Gottula
f1aef8d4df Udev rules: use match (==) rather than assign (=) for PROGRAM
Assignment syntax (=) can be used for the PROGRAM key. But the PROGRAM
key is really a match key, not an assign key. The internal logic used by
udev to decide whether a PROGRAM key "matched" or not (which determines
whether the remainder of the rule is evaluated) depends on whether the
operator was OP_MATCH (==) or OP_NOMATCH (!=). [1]

The man page claims that '"=", ":=", and "+=" have the same effect as
"=="' for PROGRAM keys. And, after a brief perusal, the udev source code
does seem to confirm that operators other than OP_MATCH (==) or
OP_NOMATCH (!=) are implicitly converted to OP_MATCH (==). [2] But it's
not entirely clear that this is definitely the case: anecdotal testing
seems to indicate that when OP_ASSIGN (=) is used, the program's exit
status is disregarded and the remainder of the rule is processed
regardless of whether it was, in fact, a successful exit.

The bottom line here is that, if zvol_id hits some snag and returns a
nonzero exit status, then we almost certainly do NOT want to continue on
with the rule and use whatever the stdout contents may have been to
mindlessly create /dev/zvol/* symlinks. Therefore, let's be extra-sure
and use the match (==) operator explicitly, to eliminate any possibility
that udev might do the wrong thing, and ensure that a nonzero exit
status will definitely short-circuit the rest of the rule, bypassing the
SYMLINK+= assignments.

[1]
udev,
 file src/udev/udev-rules.c,
  func udev_rule_apply_token_to_event,
   switch case TK_M_PROGRAM if r != 0 (nonzero exit status):
      return token->op == OP_NOMATCH;
   switch case TK_M_PROGRAM if r == 0 (zero exit status):
      return token->op == OP_MATCH;
   func retval 0 => key is considered to have matched
   func retval 1 => key is considered to have NOT matched

[2]
udev,
 file src/udev/udev-rules.c,
  func parse_token,
   at func start:
      bool is_match = IN_SET(op, OP_MATCH, OP_NOMATCH);
   in else-if case streq(key, "PROGRAM"):
      if (!is_match) op = OP_MATCH;

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:09:39 -07:00
Justin Gottula
17c794e7b0 Udev rules: replace deprecated $tempnode with $devnode
The $tempnode substitution is so old that it's not even mentioned in the
man page anymore. It is still technically supported by udev, but with
plenty of "deprecated" comments surrounding it.

The preferred modern equivalent of $tempnode is $devnode (or
alternatively, %N).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:09:09 -07:00
Justin Gottula
a5398f8782 Udev rules: use non-ancient comma syntax
This file is old as dirt. It's entirely possible that commas were
optional in udev back at that time. But they're definitely supposed to
be there nowadays.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:08:42 -07:00
Ceri Davies
202edfe357 committers-doc.dot: move myself out of alumni 2021-07-02 19:43:02 +01:00
Ceri Davies
db6eac6821 nvmem(9): install the manpage
This is being installed on all architectures in line with the OF_*
pages.

Discussed with:	fernape, manu
2021-07-02 19:43:02 +01:00
Kristof Provost
0e9f1892ec libpfctl: memory leak fix
We must remember to free the nvlist we create from the kernel's response
to DIOCGETSTATESNV, on every iteration.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30957
2021-07-02 14:48:25 +02:00
Kristof Provost
a19ff8ce9b pf: getstates: avoid taking the hashrow lock if the row is empty
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30946
2021-07-02 14:47:54 +02:00
Kristof Provost
34285eefdd pf: Reduce the data returned in DIOCGETSTATESNV
This call is particularly slow due to the large amount of data it
returns. Remove all fields pfctl does not use. There is no functional
impact to pfctl, but it somewhat speeds up the call.

It might affect other (i.e. non-FreeBSD) code that uses the new
interface, but this call is very new, so there's unlikely to be any. No
releases contained the previous version, so we choose to live with the
ABI modification.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30944
2021-07-02 14:47:23 +02:00
Kristof Provost
d8d43b2de1 pf tests: Stress state retrieval
Create and retrieve 20.000 states. There have been issues with nvlists
causing very slow state retrieval. We don't impose a specific limit on
the time required to retrieve the states, but do log it. In excessive
cases the Kyua timeout will fail this test.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30943
2021-07-02 14:46:32 +02:00
Alex Richardson
c951566915 Remove lib/kyua from the build
I forgot to include this line in 2eb9ad4274.

Reported by:    Jenkins CI
MFC after:      1 week
Fixes:          2eb9ad427475190ei ("Simplify and speed up the kyua build")
2021-07-02 10:18:00 +01:00
Alex Richardson
89da04fcaa Revert "Remove lib/kyua from the build"
Accidentally removed it from the wrong file...

This reverts commit 8ec4ba8a76.
2021-07-02 10:17:03 +01:00
Alex Richardson
8ec4ba8a76 Remove lib/kyua from the build
I forgot to include this line in 2eb9ad4274.

Reported by:	Jenkins CI
MFC after:	1 week
Fixes:		2eb9ad427475190ei ("Simplify and speed up the kyua build")
2021-07-02 09:56:05 +01:00
Mateusz Guzik
48d5b86364 pf: make DIOCGETSTATESNV iterations killable
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-02 08:30:22 +00:00
Mateusz Guzik
904a08f342 ktls: switch bare zone_mbuf use to m_free_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30955
2021-07-02 08:30:22 +00:00
Mateusz Guzik
bad5f0b6c2 iflib: switch bare zone_mbuf use to m_free_raw
Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30961
2021-07-02 08:30:22 +00:00
Mateusz Guzik
05462babd4 mbuf: add m_free_raw to be used instead of directly calling uma_zfree
The intent is to remove all direct zone_mbuf consumers so that ctor/dtor
from that zone can be reimplemented as wrappers around uma, avoiding an
indirect function call.

Reviewed by:	kbowling
Discussed with:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30959
2021-07-02 08:30:22 +00:00
Mateusz Guzik
fb32c8dbeb iflib: retire MB_DTOR_SKIP
The flag was added in 2016 but remains unused.

Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30958
2021-07-02 08:30:22 +00:00
Alex Richardson
2eb9ad4274 Simplify and speed up the kyua build
Instead of having multiple kyua libraries, just include the files as part
of usr.bin/kyua. Previously, we would build each kyua source up to four
times: once as a .o file and once as a .pieo. Additionally, the kyua
libraries might be built again for compat32. As all the kyua libraries
amount to 102 C++ sources the build time is significant (especially when
using an assertions enabled compiler). This change ensures that we build
306 fewer .cpp source files as part of buildworld.

Reviewed By:	brooks
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D30967
2021-07-02 09:21:05 +01:00
Edward Tomasz Napierala
acb1f1269c proccontrol(1): implement 'nonewprivs'
This adds the 'nonewprivs' mode, corresponding to newly added
procctl(2) commands PROC_NO_NEW_PRIVS_CTL and PROC_NO_NEW_PRIVS_STATUS.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30940
2021-07-02 08:50:36 +01:00
Ganbold Tsagaankhuu
3416513a41 dtb: rockchip: Add NanoPI-R4S and RockPI E to the build 2021-07-02 15:20:25 +08:00
Stefan Eßer
c5b8d7b7c1 netstat: Fix typo
Correct spelling of "received packers" to "received packets".

PR:		256926
Reported by:	ghuckriede@blackberry.com
MFC after:	3 days
2021-07-02 08:42:34 +02:00
Peter Holm
fe5d22f8f4 stress2: Added a test scenario from Bug 227041 2021-07-02 07:24:38 +02:00
Peter Holm
e02046f747 stress2: Update the list of test not to run 2021-07-02 07:23:05 +02:00
Peter Holm
5bd70f0500 stress2: Improve cleanup code 2021-07-02 07:22:18 +02:00
Ceri Davies
2ce5851295 rc.conf.5: -Tlint fixes. 2021-07-01 22:54:26 +01:00
Ceri Davies
40944510db rc.conf.5: add .Xr to firewall(7), growfs(7), and tuning(7) 2021-07-01 22:54:26 +01:00
Ceri Davies
cd4346a0e4 cd(9): correct minor typo in manpage. 2021-07-01 22:54:25 +01:00
Mateusz Guzik
858937bea4 pfctl: cache getprotobynumber results
As for example pfctl -ss keeps calling it, it saves a lot of overhead
from elided parsing of /etc/nsswitch.conf and /etc/protocols.

Sample result when running a pre-nvlist binary with nfs root and dumping
7 mln states:
before: 24.817u 62.993s 1:28.52 99.1%
after:	8.064u 1.117s 0:18.87 48.5%

Idea by Jim Thompson

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-01 21:31:45 +00:00
Alexander Motin
fa3d57c256 mrsas(4): Report more correct maximum I/O size.
Subtract one SGE for the case of misaligned address.  Also take into
account maximum number of sectors reported by firmware, that gives
nicer 256KB limit instead of 276KB calculated from the SGE limit.

While there, remove number of I/O size checks, duplicating what is
already checked by CAM and busdma(9).

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2021-07-01 15:37:01 -04:00
Kristof Provost
dd82fd3543 pf tests: ftp-proxy test
Basic test case for ftp-proxy

PR:		256917
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-01 21:34:40 +02:00
Kristof Provost
8923ea6c86 ftp-proxy: Revert incorrect migration to libpfctl
libpfctl supports creating rules, but not (yet) adding addresses to a
pool. Adding addresses certainly does not work through adding a rule.

PR:		256917
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-01 21:34:40 +02:00
Kristof Provost
8f76eebce4 dummynet: fix sysctls
The sysctl nodes which use V_dn_cfg must be marked as CTLFLAG_VNET so
that we use the correct per-vnet offset

PR:		256819
Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30974
2021-07-01 21:34:08 +02:00
Kornel Duleba
9428765626 ofw_pci: fix probing for non-DT cases
phandle_t is a uint32_t type, <= 0 comparison doesn't work with it as intended.
This caused the ofw_pci code to attach to PCI bus on ACPI based systems.
Since 3eae4e106a ("Fix error value returned by ofw_bus_gen_get_node().")
ofw subsystem can only return -1 for invalid nodes. Use that.

MFC after: 4 weeks
Reviewed by: mw
Differential revision: https://reviews.freebsd.org/D30953
2021-07-01 20:35:23 +02:00
Kornel Duleba
ddb928096b dts: fsl-ls1028a: Correct ECAM PCIE window ranges
Currently all PCIE windows point to bus address 0x0, which does not match
the values obtained from hardware during EA.
Replace those values with CPU addresses, since in reality we
have a 1:1 mapping between the two.

This patch is queued for Linux v5.14 in linux-next tree:
6bee93d93111 "arm64: dts: fsl-ls1028a: Correct ECAM PCIE window ranges"
2021-07-01 20:23:40 +02:00
Ferhat Gecdogan
8df71ea1aa tegra_pcie: use switch instead of if in tegra_pcib_pex_ctrl
Simplify obtaining per-port data in tegra_pcib_pex_ctrl() routine.

Reviewed by:    imp, mw
Pull Request:   https://github.com/freebsd/freebsd-src/pull/481
2021-07-01 20:09:46 +02:00
Emmanuel Vadot
2ca21223c5 dts: Bump the freebsd branding version to 5.13
Sponsored by:	Diablotin Systems
2021-07-01 18:48:56 +02:00
Emmanuel Vadot
993e8236c3 arm64: allwinner: Add r_intc driver
The r intc interrupt controller seems to do a lot of things :
- It can handle the NMI interrupt
- It have local interrupts for some device that also can be muxed with GIC
- It can serve as an forwarder for the GIC

It's mostly used for deepsleep/wakeup if I understood correctly and we do not
support this on arm64.

For now just forward everything to the GIC so interrupts works again for device
which now have this interrupts controller set since dts v5.12

Sponsored by:	Diablotin Systems
2021-07-01 18:46:38 +02:00
Ceri Davies
1f7d11e636 build.7: remove documentation of "make update"
update target was removed in e290182bcf
2021-07-01 16:52:32 +01:00
Emmanuel Vadot
2eb4d8dc72 Import device-tree files from Linux 5.13
Sponsored by:	Diablotin Systems
2021-07-01 17:51:01 +02:00
Emmanuel Vadot
71ca10f8bb Import device-tree files from Linux 5.13 2021-07-01 17:50:17 +02:00
Emmanuel Vadot
82ea1a07b4 Import device-tree files from Linux 5.12
Sponsored by:	Diablotin Systems
2021-07-01 17:41:57 +02:00
Emmanuel Vadot
c7a3c7298f Import device-tree files from Linux 5.12 2021-07-01 17:39:42 +02:00
Warner Losh
1426907f4d hz.9: update stathz for current usage
Update the stathz description to reflect reality. profhz is the only
thing we should deprecate. Add some implementation notes that describe
the optimizations made to date.

Discusssed with:	emaste
Reviewed by:		kib (prior), jhb (prior), gbe
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30815
2021-07-01 09:33:03 -06:00
Alexander Motin
b192a2c0a1
Remove avl_size field from struct avl_tree
This field is used only by illumos mdb.  On other platforms it only
increases the struct size from 32 to 40 bytes.  For struct vdev_queue
including 13 instances of avl_tree_t size means active cache lines.

Keep the padding in user-space for now to not break the ABI.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12290
2021-07-01 09:32:31 -06:00