254491 Commits

Author SHA1 Message Date
Ruslan Bukin
4cc8701067 Introduce IOMMU support for arm64 platform.
This adds an arm64 iommu interface and a driver for Arm System Memory
Management Unit version 3.2 (ARM SMMU v3.2) specified in ARM IHI 0070C
document.

Hardware overview is provided in the header of smmu.c file.

The support is disabled by default. To enable add 'options IOMMU' to your
kernel configuration file.

The support was developed on Arm Neoverse N1 System Development Platform
(ARM N1SDP), kindly provided by ARM Ltd.

Currently, PCI-based devices and ACPI platforms are supported only.
The support was tested on IOMMU-enabled Marvell SATA controller,
Realtek Ethernet controller and a TI xHCI USB controller with a low to
medium load only.

Many thanks to Konstantin Belousov for help forming the generic IOMMU
framework that is vital for this project; to Andrew Turner for adding
IOMMU support to MSI interrupt code; to Mark Johnston for help with SMMU
page management; to John Baldwin for explaining various IOMMU bits.

Reviewed by:	mmel
Relnotes:	yes
Sponsored by:	DARPA / AFRL
Sponsored by:	Innovate UK (Digital Security by Design programme)
Differential Revision:	https://reviews.freebsd.org/D24618
2020-11-16 21:55:52 +00:00
Brooks Davis
02e65672b4 Add a guard for broken SUBDIR.${MK_FOO} use
Check for the variable SUBDIR. and error as it usually means someone
forgot to include src.opts.mk.

This guard from CheriBSD found the bugs in r367655 and r367728.

Reviewed by:	bdrewery, arichardson
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27211
2020-11-16 19:15:11 +00:00
Mitchell Horne
a521f21164 bsdiff: fix off-by-one error
The program reads oldsize bytes from oldfile, and proceeds to initialize
a suffix array of oldsize elements using divsufsort(). As per the
function's API [1], array indices 0 through n-1 are initialized.

Later, search() is called, but with index bounds [0, n]. Depending on
the contents of the malloc'd buffer, accessing this uninitialized index
at the end of can result in a segmentation fault. Fix this by passing
oldsize-1 to search(), limiting the search bounds to [0, n-1].

This bug is a result of r303285, which introduced divsufsort() as an
alternate suffix sorting function to the existing qsufsort(). It seems
that qsufsort() did initialize the final empty element, meaning it could
be safely accessed. This difference in the implementations was missed at
the time.

[1] https://github.com/y-256/libdivsufsort

Discussed with:	cperciva
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26911
2020-11-16 18:41:49 +00:00
Mateusz Guzik
89deca0a33 malloc: make malloc_large closer to standalone
This moves entire large alloc handling out of all consumers, apart from
deciding to go there.

This is a step towards creating a fast path.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27198
2020-11-16 17:56:58 +00:00
Brooks Davis
73734d6eb1 Add missing includes of src.opts.mk
Without this "SUBDIR.${MK_TESTS}=tests" would always expand to
"SUBDIR.=tests" resulting in the tests not being built.

Sponsored by:	DARPA
2020-11-16 17:20:35 +00:00
Ruslan Bukin
dea8594f19 Fix a bug in assertion: entry flags also includes IOMMU_MAP_ENTRY_UNMAPPED.
The entry->flags field is initialized in iommu_gas_init_domain().

Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D27235
2020-11-16 15:37:09 +00:00
Ruslan Bukin
f593116991 Add device_t member to struct iommu.
This is needed on arm64 for the interface between iommu framework
and iommu controller drivers.

Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D27229
2020-11-16 15:29:52 +00:00
Emmanuel Vadot
b212bf30ee imx7gpc: Remove unused functions 2020-11-16 11:54:38 +00:00
Emmanuel Vadot
f4e5e045e2 dwmmc: dwmmc_switch_vccq is only used in MMCCAM kernel
Silence the build for non MMCCAM kernel
2020-11-16 11:53:36 +00:00
Alex Richardson
d34f599cd2 Revert "When building on Ubuntu bootstrap bmake with bash as the default shell"
This reverts r365950 since the latest bmake update includes fixes for the test
failures that prompted the change.
2020-11-16 11:38:51 +00:00
Hans Petter Selasky
7eefcb5eea Make mlx5_cmd_exec_cb() a safe API in mlx5core.
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.

The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.

Linux commit:
e355477ed9e4f401e3931043df97325d38552d54

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:15:03 +00:00
Hans Petter Selasky
f34f0a65b2 Report EQE data upon CQ completion in mlx5core.
Report EQE data upon CQ completion to let upper layers use this data.

Linux commit:
4e0e2ea1886afe8c001971ff767f6670312a9b04

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:10:53 +00:00
Hans Petter Selasky
ffdb195f31 Enhance the mlx5_core_create_cq() function in mlx5core.
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Linux commit:
38164b771947be9baf06e78ffdfb650f8f3e908e

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:06:10 +00:00
Hans Petter Selasky
4a64b690f1 Use mlx5core to create/destroy all Dynamically Connected Targets, DCTs.
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling drain DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.

Linux commit:
c5ae1954c47d3fd8815bd5a592aba18702c93f33

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:03:18 +00:00
Hans Petter Selasky
8114aeea44 Fix error handling order in create_kernel_qp in mlx5ib.
Make sure order of cleanup is exactly the opposite of initialization.

Linux commit:
f4044dac63e952ac1137b6df02b233d37696e2f5

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:00:21 +00:00
Mateusz Guzik
19d3e47dca select: call seltdfini on process and thread exit
Since thread_zone is marked NOFREE the thread_fini callback is never
executed, meaning memory allocated by seltdinit is never released.

Adding the call to thread_dtor is not sufficient as exiting processes
cache the main thread.
2020-11-16 03:12:21 +00:00
Mateusz Guzik
31b2ac4b5a select: replace reference counting with memory barriers in selfd
Refcounting was added to combat a race between selfdfree and doselwakup,
but it adds avoidable overhead.

selfdfree detects it can free the object by ->sf_si == NULL, thus we can
ensure that the condition only holds after all accesses are completed.
2020-11-16 03:09:18 +00:00
Dimitry Andric
2406f943a8 Ensure make delete-old does not unlink the llvm-cxxfilt and its manpage,
after r367304 and r367324, when WITH_LLVM_CXXFILT is enabled.

Noticed by:	"Herbert J. Skuhra" <herbert@gojira.at>
MFC after:	3 days
X-MFC-With:	r367304
2020-11-15 22:49:28 +00:00
Scott Long
8e1031086d Revert the whole getlocalbase() set of changes while a different design is
hashed out.
2020-11-15 20:24:59 +00:00
Toomas Soome
fc7cf7241f zfsboot: add prototype for main()
Some compilers are complaining about missing prototype.

PR:		251150
Reported by:	markiyan.kushnir@gmail.com
2020-11-15 14:04:27 +00:00
Peter Grehan
cd5b6d16ca Fix regression in AHCI controller settings.
When the AHCI code was reworked to use FreeBSD struct
definitions, the valid element was mis-transcribed resulting
in the UMDA capability being hidden. This prevented Illumos
from using AHCI disk/cdrom drives.

Fix by using definitions that match the code pre-rework.

PR:	250924
Submitted by:	Rolf Stalder
Reported by:	Rolf Stalder
MFC after:	3 days
2020-11-15 12:59:24 +00:00
Scott Long
1b249101df Fix the previous revision, it suffered from an incomplete change to the
getlocalbase API.  Also don't erroneously subtract the lenth from the
buffer a second time.
2020-11-15 07:50:29 +00:00
Scott Long
85a5fe290b Because getlocalbase() returns -1 on error, it needs to use a signed type
internally.  Do that, and make sure that conversations between signed and
unsigned don't overflow
2020-11-15 07:48:52 +00:00
Mateusz Guzik
b77594bbbf sched: fix an incorrect comparison in sched_lend_user_prio_cond
Compare with sched_lend_user_prio.
2020-11-15 01:54:44 +00:00
Mateusz Guzik
6b45dbe56c cred: annotate credbatch_process argument as unused
Fixes libprocstat compilation as zfs defines _KERNEL.
2020-11-14 19:56:11 +00:00
Mateusz Guzik
5596f836e7 zfs: disable periodic arc updates
They are only there to provide less innacurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
2020-11-14 19:23:07 +00:00
Mateusz Guzik
f34a2f56c3 thread: batch credential freeing 2020-11-14 19:22:02 +00:00
Mateusz Guzik
fb8ab68084 thread: batch resource limit free calls 2020-11-14 19:21:46 +00:00
Mateusz Guzik
5ef7b7a0f3 thread: rework tid batch to use helpers 2020-11-14 19:20:58 +00:00
Mateusz Guzik
25600ad516 cred: reorder cr_audit to be closer to the lock
This makes cr_uid avoid sharing.
2020-11-14 19:20:37 +00:00
Mateusz Guzik
d1ca25be49 thread: pad tid lock
On a kernel with other changes this bumps 104-way thread creation/destruction
from 0.96 mln ops/s to 1.1 mln ops/s.
2020-11-14 19:19:27 +00:00
Baptiste Daroussin
09ef995baf Change the default locale to C.UTF-8
The C.UTF-8 locales is the same as the actual C locale except it does support
the unicode character set. But the collation etc are still the same as the C
locale one.

Reviewed by:	many
Approved by:	many
Differential Revision:	https://reviews.freebsd.org/D26973
2020-11-14 19:16:39 +00:00
Scott Long
bcf9ae2751 Fix a problem with r367686 related to the use of ssize_t. Not sure how this
escaped prior testing, but it should be better now.

Reported by:	lots
2020-11-14 19:04:36 +00:00
Kyle Evans
cf82304d7d Makefile: re-wordsmith the blurb about xtoolchain ports
The new version only includes a specific version once, and uses the one
that's currently advised by tinderbox: -gcc6.

It also advises just installing the pkg, but mentions in a side-note at the
end where to find the source in the ports tree.

Reviewed by:	jrtc27
Suggested by:	jhb (use default from tinderbox)
Differential Revision:	https://reviews.freebsd.org/D26820
2020-11-14 18:06:35 +00:00
Scott Long
7ca0d5403e Replace hardcoded references to _PATH_LOCALBASE with calls to getlocalbase.3
Reviewed by:	imp, se
2020-11-14 18:01:14 +00:00
Scott Long
98b76d2227 Add the library function getlocalbase and its manual page. This helps to
unify the retrieval of the various ways that the local software base directory,
typically "/usr/local", is expressed in the system.

Reviewed by:	se
Differential Revision:	https://reviews.freebsd.org/D27022
2020-11-14 17:57:50 +00:00
Jonathan T. Looney
36c52a52ee Add a regression test for the port-selection behavior fixed in r367680.
Reviewed by:	markj, olivier, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27173
2020-11-14 15:44:28 +00:00
Jonathan T. Looney
440598dd9e Fix implicit automatic local port selection for IPv6 during connect calls.
When a user creates a TCP socket and tries to connect to the socket without
explicitly binding the socket to a local address, the connect call
implicitly chooses an appropriate local port. When evaluating candidate
local ports, the algorithm checks for conflicts with existing ports by
doing a lookup in the connection hash table.

In this circumstance, both the IPv4 and IPv6 code look for exact matches
in the hash table. However, the IPv4 code goes a step further and checks
whether the proposed 4-tuple will match wildcard (e.g. TCP "listen")
entries. The IPv6 code has no such check.

The missing wildcard check can cause problems when connecting to a local
server. It is possible that the algorithm will choose the same value for
the local port as the foreign port uses. This results in a connection with
identical source and destination addresses and ports. Changing the IPv6
code to align with the IPv4 code's behavior fixes this problem.

Reviewed by:	tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27164
2020-11-14 14:50:34 +00:00
Mateusz Piotrowski
32f4592764 Document the PAGER environment variable
Sometimes users want to use freebsd-update(8) in a non-interactive way and
what they often miss is that they have to set PAGER to cat(1) in order to
avoid interactive prompts from less(1).

MFC after:	4 weeks
2020-11-14 13:07:41 +00:00
Toomas Soome
3e9f0f1d29 loader: cstyle cleanup of console.c
cstyle cleanup only, no functional changes intended.
2020-11-14 10:56:40 +00:00
Vladimir Kondratyev
146e176df7 LinuxKPI: Exclude linux/acpi.h content on non-ACPI archs.
LinuxKPI ACPI support is based on FreeBSD import of ACPICA which can be
compiled only on aarch64, amd64 and i386. Ifdef-out broken parts on our
side to avoid patching of vendor code.

This fixes drm-devel-kmod build on powerpc64(le).

Reported by:	pkubaj
2020-11-14 10:34:18 +00:00
Konstantin Belousov
8a1509e442 Handle LoR in flush_pagedep_deps().
When operating in SU or SU+J mode, ffs_syncvnode() might need to
instantiate other vnode by inode number while owning syncing vnode
lock.  Typically this other vnode is the parent of our vnode, but due
to renames occuring right before fsync (or during fsync when we drop
the syncing vnode lock, see below) it might be no longer parent.

More, the called function flush_pagedep_deps() needs to lock other
vnode while owning the lock for vnode which owns the buffer, for which
the dependencies are flushed.  This creates another instance of the
same LoR as was fixed in softdep_sync().

Put the generic code for safe relocking into new SU helper
get_parent_vp() and use it in flush_pagedep_deps().  The case for safe
relocking of two vnodes with undefined lock order was extracted into
vn helper vn_lock_pair().

Due to call sequence
     ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(),
ffs_syncvnode() indicates with ERELOOKUP that passed vnode was
unlocked in process, and can return ENOENT if the passed vnode
reclaimed.  All callers of the function were inspected.

Because UFS namei lookups store auxiliary information about directory
entry in in-memory directory inode, and this information is then used
by UFS code that creates/removed directory entry in the actual
mutating VOPs, it is critical that directory vnode lock is not dropped
between lookup and VOP.  For softdep_prelink(), which ensures that
later link/unlink operation can proceed without overflowing the
journal, calls were moved to the place where it is safe to drop
processing VOP because mutations are not yet applied.  Then, ERELOOKUP
causes restart of the whole VFS operation (typically VFS syscall) at
top level, including the re-lookup of the involved pathes.  [Note that
we already do the same restart for failing calls to vn_start_write(),
so formally this patch does not introduce new behavior.]

Similarly, unsafe calls to fsync in snapshot creation code were
plugged.  A possible view on these failures is that it does not make
sense to continue creating snapshot if the snapshot vnode was
reclaimed due to forced unmount.

It is possible that relock/ERELOOKUP situation occurs in
ffs_truncate() called from ufs_inactive().  In this case, dropping the
vnode lock is not safe.  Detect the situation with VI_DOINGINACT and
reschedule inactivation by setting VI_OWEINACT.  ufs_inactive()
rechecks VI_OWEINACT and avoids reclaiming vnode is truncation failed
this way.

In ffs_truncate(), allocation of the EOF block for partial truncation
is re-done after vnode is synced, since we cannot leave the buffer
locked through ffs_syncvnode().

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:30:10 +00:00
Konstantin Belousov
738ea0010b Add ffs_inode_bwrite() helper.
In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:19:59 +00:00
Konstantin Belousov
7b795aa3c0 Revert r367669 to re-commit with proper message 2020-11-14 05:19:44 +00:00
Konstantin Belousov
c0d2077f41 Add a framework that tracks exclusive vnode lock generation count for UFS.
This count is memoized together with the lookup metadata in directory
inode, and we assert that accesses to lookup metadata are done under
the same lock generation as they were stored.  Enabled under DIAGNOSTICS.

UFS saves additional data for parent dirent when doing lookup
(i_offset, i_count, i_endoff), and this data is used later by VOPs
operating on dirents.  If parent vnode exclusive lock is dropped and
re-acquired between lookup and the VOP call, we corrupt directories.

Framework asserts that corruption cannot occur that way, by tracking
vnode lock generation counter.  Updates to inode dirent members also
save the counter, while users compare current and saved counters
values.

Also, fix a case in ufs_lookup_ino() where i_offset and i_count could
be updated under shared lock.  It is not a bug on its own since dvp
i_offset results from such lookup cannot be used, but it causes false
positive in the checker.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:17:04 +00:00
Konstantin Belousov
61846fc4dc Add a framework that tracks exclusive vnode lock generation count for UFS.
This count is memoized together with the lookup metadata in directory
inode, and we assert that accesses to lookup metadata are done under
the same lock generation as they were stored.  Enabled under DIAGNOSTICS.

UFS saves additional data for parent dirent when doing lookup
(i_offset, i_count, i_endoff), and this data is used later by VOPs
operating on dirents.  If parent vnode exclusive lock is dropped and
re-acquired between lookup and the VOP call, we corrupt directories.

Framework asserts that corruption cannot occur that way, by tracking
vnode lock generation counter.  Updates to inode dirent members also
save the counter, while users compare current and saved counters
values.

Also, fix a case in ufs_lookup_ino() where i_offset and i_count could
be updated under shared lock.  It is not a bug on its own since dvp
i_offset results from such lookup cannot be used, but it causes false
positive in the checker.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-14 05:10:39 +00:00
Rick Macklem
3ba86a63e9 Add a entry for r367660. 2020-11-14 01:55:02 +00:00
Rick Macklem
01b139f212 Fix startup of gssd when /usr is a separately mounted local file system.
meowthink@gmail.com reported that the gssd daemon was not
starting, because /etc/rc.d/gssd was executed before his local
/usr file system was mounted.
He fixed the problem by adding mountcritlocal to the REQUIRED
line.

This fix seems safe and works for a separately mounted /usr file
system on a local disk.
The case of a separately mounted remote /usr file system (such as
NFS) is still broken, but there is no obvious solution for that.
Adding mountcritremote would fix the problem, but it would
cause a POLA violation, because all kerberized NFS mounts
in /etc/fstab would need the "late" option specified to work.

Submitted by:	meowthink@gmail.com
Reported by:	meowthink@gmail.com
Reviewed by:	0mp
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D27203
2020-11-14 01:49:49 +00:00
Alexander Motin
0bed3eabc5 Add PMRCAP printing and fix earlier CAP_HI.
MFC after:	3 days
2020-11-14 01:45:34 +00:00
Rick Macklem
1b57066e8b Add an entry for r367026, r367423. 2020-11-14 01:39:27 +00:00