This existing helper function is preferable to the hand-rolled
calculation of the kstack bounds.
Make some small style improvements while here. Notably, rename every
instance of "r", the return address, to "ra". Tidy the includes in the
affected files.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39909
Use the unwind_frame() function, which properly validates the frame
pointer and uses ADDR_MAKE_CANONICAL() for the pc, required when PAC is
enabled.
Reviewed by: andrew, markj, jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39934
The end result is much more legible in both cases.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39906
MIPS is gone, and this is the last remaining bit in the pmc code.
Reviewed by: jkoshy, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39905
Improve the legibility of the list. Bump overall indentation, fix some
whitespace, and sort the IAF block.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39903
This comment is no longer in sync with the contents of __PMC_EVENTS().
Update to reflect the removal of various Intel event definitions from
this list; these event definitions now come from Linux and live in
lib/libpmc/pmu-events/.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39902
Although this block has remained in __PMC_EVENTS(), there is no handling
of UCP in libpmc/libpmc.c, so it is not possible to select one of these
events. It should therefore be impossible to trigger the code removed
from ucp_start_pmc(). Note that the GQ_SNOOP_MSF MSR exists only for
Nehalem and Westmere architectures, and the related events do not exist
for later generations.
The Uncore support in hwpmc has severely atrophied in general. We have
uncore event definitions in pmu-events, but the kernel support was
written against Intel Performance Measurement Architecture version 2,
and is disabled for processor generations later than Westmere. Nehalem
and Westmere lack uncore event definitions in pmu-events. I'd be
surprised if Uncore support is usable on any machine in its current
state.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39901
These are maintained elsewhere. No functional change.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39900
This is the MINIMAL config with SMP/NUMA options turned off.
Useful to ensure that UP configuration still builds, until it is removed
finally.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
On amd64 ACPI is required to boot, it cannot work as a module, and we do
not build the ACPI module for long time.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
When the TCP_LOG option is used to enable logging on a listening
socket, inherit this if the listener is not auto selected and does
not have a log id set.
Reviewed by: cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D38436
Log all errors for PRUs, except when INP_DROPPED is set. In that case,
don't log it.
Reviewed by: glebius, rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D39591
If the i-node number (d_fileno) for a file on the server did
not fit in 32bits, it would be truncated to the low order 32bits
for the NFSv3 Readdir and ReaddirPlus RPC replies.
This is no longer correct, given that ino_t is now 64bits.
This patch fixes this by sending the full 64bits of d_fileno
on the wire in the NFSv3 Readdir/ReaddirPlus RPC reply.
PR: 271174
Reported by: bmueller@panasas.com
Tested by: bmueller@panasas.com
MFC after: 2 weeks
If vmx or svm is disabled in BIOS or the device isn't supported by vmm,
modinit won't allocate these state save areas. As kmem_free panics when
passing a NULL pointer to it, loading the vmm kernel module causes a
panic too.
PR: 271251
Reviewed by: markj
Fixes: 74ac712f72 ("vmm: Dynamically allocate a couple of per-CPU state save areas")
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39974
Fix the number of targets we inquiry to be one less than the maximum
number of targets adapter reports. This gets rid of the errors reported
on VMware Workstation:
(probe36:pvscsi0:0:65:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe36:pvscsi0:0:65:0): CAM status: CCB request completed with an error
While here, print the maximum number of targets.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D39867
Use al_bool for interfaces and structures defined in the
OS-independent HAL in sys/contrib, and use plain bool for
FreeBSD-specific APIs and structures in sys/dev/al_eth.
Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D39923
- Use an enum for the button type (it is not really a boolean value).
- Use bool for fixed.
Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D39922
uma_zfree() can be called on a NULL pointer. Simplify the pf code a
little by removing the redundant checks.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Do not preallocate pcpu area backing pages on early startup, only
allocate enough of KVA for pcpu[MAXCPU] and the page for BSP. Other
pages are allocated after we know the number of cpus and their
assignments to the domains.
PCPUs are not accessed until they are initialized, which happens on AP
startup.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945
The APs pcpu area is zeroed in init_secondary() by pcpu_init(), so the
early initialization in pmap_bootstrap() is nop.
Fixes: 42f722e721cd010ae5759a4b0d3b7b93c2b9cad2ESC
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945
After the locking protocol changed in commit 75a67bf3d0 ("AF_UNIX:
make unix socket locking finer grained"), uipc_peeraddr() was not
updated accordingly.
The link lock now only protects global socket lists. The PCB lock is
used to protect the link between connected PCBs, so use that. Remove an
old comment which appears to be noting that unp_conn is not set for
connected SOCK_DGRAM sockets (in one direction anyway).
Reviewed by: glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39855
When shutdown(..., SHUT_WR) is called in the front states, send a
SHUTDOWN chunk when a COOKIE ACK chunk is received and there is
no outstanding data.
MFC after: 1 week
Notable upstream pull request merges:
#11680 Add support for zpool user properties
#14145 Storage device expansion "silently" fails on degraded vdev
#14405 Create zap for root vdev
#14659 Allow MMP to bypass waiting for other threads
#14674 Miscellaneous FreBSD compilation bugfixes
#14692 Fix some signedness issues in arc_evict()
#14702 Fix typo in check_clones()
#14715 module: small fixes for FreeBSD/aarch64
#14716 Trim needless zeroes from checksum events
#14719 vdev: expose zfs_vdev_max_ms_shift as a module parameter
#14722 Fix "Detach spare vdev in case if resilvering does not happen"
#14723 freebsd clone range fixes
#14728 Fix BLAKE3 aarch64 assembly for FreeBSD and macOS
#14735 Fix in check_filesystem()
#14739 Fix data corruption when cloning embedded blocks
#14758 Fix VERIFY(!zil_replaying(zilog, tx)) panic
#14761 Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"
#14774 FreeBSD .zfs fixups
#14776 FreeBSD: make zfs_vfs_held() definition consistent with declaration
#14779 powerpc64: Support ELFv2 asm on Big Endian
#14788 FreeBSD: add missing vop_fplookup assignments
#14789 PAM: support the authentication facility
#14790 Revert "Fix data race between zil_commit() and zil_suspend()"
#14795 Fix positive ABD size assertion in abd_verify()
#14798 Mark TX_COMMIT transaction with TXG_NOTHROTTLE
#14804 Correct ABD size for split block ZIOs
#14806 Use correct block pointer in block cloning case.
#14808 blake3: fix up bogus checksums in face of cpu migration
Obtained from: OpenZFS
OpenZFS commit: d96e29576c
Functions manipulating source nodes can fail due to various reasons like
memory allocation errors, hitting configured limits or lack of
redirection targets. Ensure those errors are properly caught and
propagated in the code. Increase the error counters not only when
parsing the main ruleset but the NAT ruleset too.
Cherry-picked from development of D39880
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39940
Whilst we don't have ahci(4) currently, we do have umass(4), and need
pass(4) for smartctl(8) to be able to talk to such devices.
Reported by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Use the new IfAPI interface and address iterators so the nfs driver
doesn't need direct access to the interface structures.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38962
if_llmaddr_count() only counts link-level multicast addresses.
hv_netvsc(4) needs to know if there are any multicast addresses. Since
hv_netvsc(4) is the only instance where this would be used, make it a
simple boolean. If others need a if_maddr_count(), that can be added in
the future.
Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39493
We always build the kernel floating point support. Now that the
riscv64sf userspace variant has been removed the option is required for
correct operation.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39851
Early versions of the VT-d spec mentioned 6-level paging support as a
possible value for the SAGAW capability, but later versions removed it
and SAGAW=0x10 is currently listed as a reserved value.
The 6-level (agaw=64) entry in sagaw_bits is furthermore problematic
with clang15 because the attempted comparison against 1ULL << 64 in
dmar_maxaddr2mgaw() causes the compiler to elide the last iteration
of the initial loop, which bypasses the subsequent logic to find the
greatest HW-supported address width. This results in 5-level paging
always being selected regardless of whether the hardware supports it,
which can result address translation failure due to invalid context-
entry programming.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39896
This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase. Also
it removes false sharing of cache lines, since the array elements are
mostly locally accessed by corresponding CPUs.
Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
Cpuid is used to index the pmap->pm_pcids array only.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
to initialize pm_pcids array for a new user pmap
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
and rename the structure to pmap_pcid.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
Add one ifdef to upstrem code and get rid of compiling the horrible
checked-in aarch64 assembler for the boot loader that the loader will
never use. I'll attempt to upstream this and adjust as needed.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D39897
Reduce number of hashing operations when handling source nodes by always
having a pointer to the hash row mutex in the source node. Provide
macros for handling and asserting the mutex. Calculate the hash only
once in pf_find_src_node() and then use this hash in subsequent
operations.
Cherry-picked from development of D39880
Reviewed by: kp, mjg
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39888
TI support is in a sad state for years.
We haven't been able to keep up with all the breaking changes that
upstream do in the DTS. This requires a lot of new drivers to handle the
new buses that they create and all the new clocks that they expose.
Keep the code for now in case somebody is interested in reviving this
platform but stop bloating GENERIC with code that don't work.
Reviewed by: imp, mmel
MFC after: never
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39843
As documented in listen.2 manual page, the kernel emits a LOG_DEBUG
syslog message if a socket listen queue overflows. For some appliances,
it may be desirable to change the priority to some higher value
like LOG_INFO while keeping other debugging suppressed.
OTOH there are cases when such overflows are normal and expected.
Then it may be desirable to suppress overflow logging altogether,
so that dmesg buffer is not flooded over long run.
In addition to existing sysctl kern.ipc.sooverinterval,
introduce new sysctl kern.ipc.sooverprio that defaults to 7 (LOG_DEBUG)
to preserve current behavior. It may be changed to any value
in a range of 0..7 for corresponding priority or to -1 to suppress logging.
Document it in the listen.2 manual page.
MFC after: 1 month
There were two issues with the carp key configuration in the new netlink
code.
The first is that userspace failed to actually pass the CARP_NL_KEY
attribute to the kernel, so a key was never set.
The second issue is that snl_attr_get_string() returns a pointer to the
string inside the netlink message. It does not copy the string to the
target buffer. That's somewhat inconvenient to work with in libifconfig
where we have a static buffer for the key.
Introduce snl_attr_copy_string() which can copy a string to a target
buffer and uses the 'arg' parameter to pass the buffer size, so it
doesn't accidentally exceed the available space.
Reviewed by: melifaro
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39874
* If a measure of staleness of 0 is reported, use the RTT instead.
* Ensure that we always send a cookie preservative parameter by
rounding up during the calculation.
* If allowed, perform a round trip time measurement.
* Clear the overall error counter, since the error cause also
acts like an ACK.
MFC after: 1 week
Check for an uninitialed (zero valued) fs_maxbsize and set it
to its minimum valid size (fs_bsize). Uninitialed fs_maxbsize
were left by older versions of makefs(8) and the superblock
integrity checks fail when they are found.
No legitimate superblocks should fail as a result of these changes.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Other virtual interface drivers (e.g. if_gif, if_stf, if_ovpn) all start
with if_. The wireguard file is also named if_wg, but the module name
was 'wg'.
Fix this inconsistency.
Reported by: Christian McDonald <cmcdonald@netgate.com>
Reviewed by: zlei, kevans
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39853
There is now assertion which requires all memory allocations of positive size.
Negative and zero-sized allocations lead to panic, so plug them off.
Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D39846
Offsets greater than 255 bytes reside on page1 of the SPD device.
Accessing them requires switching to page1, and adjusting the absolute
offset to be relative to the start of page1. After the access, the page
must be set back to page0. These operations are performed in several
places, so break them out into their own functions.
Also, replace a pair of default cases, which should be impossible due to
earlier checks, with __assert_unreachable().
Reviewed by: imp
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D39842
DDR3 and DDR4 encode the week and year that the DIMM was manufactured,
as a pair of two-digit binary-coded decimal values. Read the values, and
report them as (uint8_t)s.
Reviewed by: imp, jhb
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D39795
The hardware supported by mfi(4) and mrsas(4) use the same dcmd's.
mfiutil(8) in theory could run on controlled attached to mrsas(4).
It can't since mrsas(4) doesn't have support for the FreeBSD mfi(4)
ioctl. Porting the ioctl from mfi(4) to mrsas(4) would be the first
step in making mrsasutil(8) which is an additional name for mfiutil(8)
but opens /dev/mrsasX instead of /dev/mfiX
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265794
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D36342
Tested by: Dan Mahoney <freebsd@gushi.org>
Otherwise POSTREAD syncs may re-invalidate the shadow of the data buffer
when copying from bounce pages, resulting in false-positive KMSAN
reports.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Replacing rtsock with netlink also means providing similar tracing facilities,
rtsock provides `route -n monitor` interface, where each message can be traced
to the originating PID.
This diff closes the feature gap between rtsock and netlink in that regard.
Netlink works slightly differently from rtsock, as it is a generic message
"broker". It calls some kernel KPIs and returns the result to the caller.
Other Netlink consumers gets notified on the changed kernel state using the
relevant subsystem callbacks. Typically, it is close to impossible to pass
some data through these KPIs to enhance the notification.
This diff approaches the problem by using osd(9) to assign the relevant
socket pointer (`'nlp`) to the per-socket taskqueue execution thread.
This change allows to recover the pointer in the aforementioned notification
callbacks and extract some additional data.
Using `osd(9)` (and adding additional metadata) to the notification receiver
comes with some additional cost attached, so this interface needs to be
enabled explicitly by using a newly-created `NETLINK_MSG_INFO` `SOL_NETLINK`
socket option.
The actual medatadata (which includes the originator PID) is provided via
control messages. To enable extensibility, the control message data is
encoded in the standard netlink(TLV-based) fashion. The list of the
currently-provided properties can be found in `nlmsginfo_attrs`.
snl(3) is extended to enable decoding of netlink messages with metadata
(`snl_read_message_dbg()` stores the parsed structure in the provided buffer).
Differential Revision: https://reviews.freebsd.org/D39391
The two main uses of dev_t are in struct stat and as a parameter of the
mknod system calls.
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit quantity
with 12 bits set asaid for the major number and 20 for the minor number.
The in-kernel dev_t encoded as MMMmmmmm, where M is a hex digit of the
major number and m is a hex digit of the minor number.
The user-space dev_t encoded as mmmM MMmm, where M and m is the major
and minor numbers accordingly. This is downward compatible with legacy
systems where dev_t is 16 bits wide, encoded as MMmm.
In glibc dev_t is a 64-bit quantity, with 32-bit major and minor numbers,
encoded as MMMM Mmmm mmmM MMmm. This is downward compatible with the Linux
kernel and with legacy systems where dev_t is 16 bits wide.
In the FreeBSD dev_t is a 64-bit quantity. The major and minor numbers
are encoded as MMMmmmMm, therefore conversion of the device numbers between
Linux user-space and FreeBSD kernel required.
In between kern_fstat() and translate_fd_major_minor(), another process
having the same filedesc could modify or close fd.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D39763
Get rid of calling Linux stat translation hook and specific to Linux
handling of non-vnode dirfd from kern_statat(),
Reviewed by: kib, mjg
Differential revision: https://reviews.freebsd.org/D35474
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Prior the 2.6 kernel dev_t type was an unsigned short.
However, since the firs commit of the Linuxulator, mknod syscall get int dev
argument.
Also, there is some confusion here, while the kernel declares a dev_t type
as a 32-bit sized, the user-space dev_t type can be size of 64 bits, e.g.,
in the Glibc library.
To avoid confusion and to help porting of the Linuxulator to other platforms
use explicit l_dev_t for dev argument of mknod syscalls.