The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Discussed with: markj, trasz (earlier version)
Differential Revision: https://reviews.freebsd.org/D27016
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
This is used by some Linux programs using filehandles (r367773) to locate
the mountpoint for a given fsid.
Differential Revision: https://reviews.freebsd.org/D27136
This tripped up in llvm compilation on amd64 noting that lz4_init/lz4_fini
were lacking in being previously defined.
Reviewed by: emaste, freqlabs, brooks
Differential Revision: https://reviews.freebsd.org/D27240
In case u-boot was compiled without video support set the PLL
to 432Mhz (which allow us to use most of the HDMI resolution for
tcon) and set it as the parent for the DE clock.
r367416 should have called save_fpu() before kern_sigprocmask to avoid
race condition
Thanks jhibbits and bdragon for pointing it out
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27241
After r367417, both mmu_oea64 and mmu_radix were defining the vm.pmap
sysctl node, resulting in the later definition hiding the properties of
the previous one. Avoid this issue by defining vm.pmap in a common
source file and declaring it where needed.
This change also standardizes the tunable name used to enable superpages
and change its default to disabled on radix MMU, because it still has some
issues with superpages.
Reviewed by: bdragon, jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27156
The former tries to dereference memory allocated by the latter. If counting
the redistributor fails it may try to dereference memory that was never
allocated.
Sponsored by: Innovate UK
The same driver works on both, allow the driver to attach to a GICv4
controller with the ACPI attachment.
Reported by: Andrey Fesenko <f0andrey_gmail.com>
Sponsored by: Innovate UK
One of the last shifts inadvertently moved these static assertions out of a
COMPAT_FREEBSD32 block, which the relevant definitions are limited to.
Fix it.
Pointy hat: kevans
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed- size of timespec and
_umtx_time are used for determining if copyout is needed in the sem2_wait
case.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27222
Specifically, if we're waking up some value n > BATCH_SIZE, then the
copyin(9) is wrong on the second iteration due to upp being the wrong type.
upp is currently a uint32_t**, so upp + pos advances it by twice as many
elements as it should (host pointer size vs. compat32 pointer size).
Fix it by just making upp a uint32_t*; it's still technically a double
pointer, but the distinction doesn't matter all that much here since we're
just doing arithmetic on it.
Add a test case that demonstrates the problem, placed with the libthr tests
since one messing with _umtx_op should be running these tests. Running under
compat32, the new test case will hang as threads after the first 128 get
missed in the wake. it's not immediately clear how to hit it in practice,
since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed
to be around ~50 -- I did not spend much time digging into it.
The uintptr_t change makes no functional difference, but i've tossed it in
since it's more accurate (semantically).
Reported by: Andrew Gierth (andrew_tao173.riddles.org.uk, inspection)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27231
Add __unused to some args.
Change type of the iterator variables to match loop control.
Remove excessive {}.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27220
This adds an arm64 iommu interface and a driver for Arm System Memory
Management Unit version 3.2 (ARM SMMU v3.2) specified in ARM IHI 0070C
document.
Hardware overview is provided in the header of smmu.c file.
The support is disabled by default. To enable add 'options IOMMU' to your
kernel configuration file.
The support was developed on Arm Neoverse N1 System Development Platform
(ARM N1SDP), kindly provided by ARM Ltd.
Currently, PCI-based devices and ACPI platforms are supported only.
The support was tested on IOMMU-enabled Marvell SATA controller,
Realtek Ethernet controller and a TI xHCI USB controller with a low to
medium load only.
Many thanks to Konstantin Belousov for help forming the generic IOMMU
framework that is vital for this project; to Andrew Turner for adding
IOMMU support to MSI interrupt code; to Mark Johnston for help with SMMU
page management; to John Baldwin for explaining various IOMMU bits.
Reviewed by: mmel
Relnotes: yes
Sponsored by: DARPA / AFRL
Sponsored by: Innovate UK (Digital Security by Design programme)
Differential Revision: https://reviews.freebsd.org/D24618
This moves entire large alloc handling out of all consumers, apart from
deciding to go there.
This is a step towards creating a fast path.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27198
The entry->flags field is initialized in iommu_gas_init_domain().
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D27235
This is needed on arm64 for the interface between iommu framework
and iommu controller drivers.
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D27229
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.
The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.
Linux commit:
e355477ed9e4f401e3931043df97325d38552d54
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Report EQE data upon CQ completion to let upper layers use this data.
Linux commit:
4e0e2ea1886afe8c001971ff767f6670312a9b04
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.
Linux commit:
38164b771947be9baf06e78ffdfb650f8f3e908e
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling drain DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.
Linux commit:
c5ae1954c47d3fd8815bd5a592aba18702c93f33
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make sure order of cleanup is exactly the opposite of initialization.
Linux commit:
f4044dac63e952ac1137b6df02b233d37696e2f5
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Since thread_zone is marked NOFREE the thread_fini callback is never
executed, meaning memory allocated by seltdinit is never released.
Adding the call to thread_dtor is not sufficient as exiting processes
cache the main thread.
Refcounting was added to combat a race between selfdfree and doselwakup,
but it adds avoidable overhead.
selfdfree detects it can free the object by ->sf_si == NULL, thus we can
ensure that the condition only holds after all accesses are completed.
They are only there to provide less innacurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
When a user creates a TCP socket and tries to connect to the socket without
explicitly binding the socket to a local address, the connect call
implicitly chooses an appropriate local port. When evaluating candidate
local ports, the algorithm checks for conflicts with existing ports by
doing a lookup in the connection hash table.
In this circumstance, both the IPv4 and IPv6 code look for exact matches
in the hash table. However, the IPv4 code goes a step further and checks
whether the proposed 4-tuple will match wildcard (e.g. TCP "listen")
entries. The IPv6 code has no such check.
The missing wildcard check can cause problems when connecting to a local
server. It is possible that the algorithm will choose the same value for
the local port as the foreign port uses. This results in a connection with
identical source and destination addresses and ports. Changing the IPv6
code to align with the IPv4 code's behavior fixes this problem.
Reviewed by: tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27164
LinuxKPI ACPI support is based on FreeBSD import of ACPICA which can be
compiled only on aarch64, amd64 and i386. Ifdef-out broken parts on our
side to avoid patching of vendor code.
This fixes drm-devel-kmod build on powerpc64(le).
Reported by: pkubaj
When operating in SU or SU+J mode, ffs_syncvnode() might need to
instantiate other vnode by inode number while owning syncing vnode
lock. Typically this other vnode is the parent of our vnode, but due
to renames occuring right before fsync (or during fsync when we drop
the syncing vnode lock, see below) it might be no longer parent.
More, the called function flush_pagedep_deps() needs to lock other
vnode while owning the lock for vnode which owns the buffer, for which
the dependencies are flushed. This creates another instance of the
same LoR as was fixed in softdep_sync().
Put the generic code for safe relocking into new SU helper
get_parent_vp() and use it in flush_pagedep_deps(). The case for safe
relocking of two vnodes with undefined lock order was extracted into
vn helper vn_lock_pair().
Due to call sequence
ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(),
ffs_syncvnode() indicates with ERELOOKUP that passed vnode was
unlocked in process, and can return ENOENT if the passed vnode
reclaimed. All callers of the function were inspected.
Because UFS namei lookups store auxiliary information about directory
entry in in-memory directory inode, and this information is then used
by UFS code that creates/removed directory entry in the actual
mutating VOPs, it is critical that directory vnode lock is not dropped
between lookup and VOP. For softdep_prelink(), which ensures that
later link/unlink operation can proceed without overflowing the
journal, calls were moved to the place where it is safe to drop
processing VOP because mutations are not yet applied. Then, ERELOOKUP
causes restart of the whole VFS operation (typically VFS syscall) at
top level, including the re-lookup of the involved pathes. [Note that
we already do the same restart for failing calls to vn_start_write(),
so formally this patch does not introduce new behavior.]
Similarly, unsafe calls to fsync in snapshot creation code were
plugged. A possible view on these failures is that it does not make
sense to continue creating snapshot if the snapshot vnode was
reclaimed due to forced unmount.
It is possible that relock/ERELOOKUP situation occurs in
ffs_truncate() called from ufs_inactive(). In this case, dropping the
vnode lock is not safe. Detect the situation with VI_DOINGINACT and
reschedule inactivation by setting VI_OWEINACT. ufs_inactive()
rechecks VI_OWEINACT and avoids reclaiming vnode is truncation failed
this way.
In ffs_truncate(), allocation of the EOF block for partial truncation
is re-done after vnode is synced, since we cannot leave the buffer
locked through ffs_syncvnode().
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136