281006 Commits

Author SHA1 Message Date
Mark Johnston
675e2618ae inpcb: Deduplicate some assertions
It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38359
2023-02-03 11:48:25 -05:00
Mark Johnston
5f03f96fbe shm: Document shm_create_largepage()
While here, move notes about FreeBSD-specific functionality to the
COMPATIBILITY section, and document the ECAPMODE error for shm_open().

Reviewed by:	pauamma, kib
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D38282
2023-02-03 11:48:25 -05:00
Mark Johnston
a2286a1f46 man4: Add a manual page for kvmclock
Reviewed by:	pauamma, imp, kib
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D38343
2023-02-03 11:48:25 -05:00
Mark Johnston
2bed14192c pvclock: Export a vDSO page even without rdtscp available
When the cycle counter is "stable", i.e., synchronized across vCPUs by
the hypervisor, userspace can use a serialized rdtsc instead of relying
on rdtscp, just like the kernel timecounter does.  This can be useful
for performance in guests where the hypervisor hides rdtscp for some
reason.

To avoid breaking compatibility with older userspace which expects
rdtscp to be usable when pvclock exports timekeeping info, hide this
feature behind a sysctl.

Reviewed by:	kib
Tested by:	Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D38342
2023-02-03 11:48:25 -05:00
Mark Johnston
26d105199e libc: Fall back to rdtsc when using pvclock and rdtscp is not available
In preparation for a follow-up revision wherein kvmclock may export
timekeeping info to userspace even in the absence of AMDID_RDTSCP, fall
back to using rdtsc when rdtscp isn't available.  This mimics
pvclock_read_time_info() in the kernel.

Reviewed by:	kib
Tested by:	Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D38341
2023-02-03 11:47:11 -05:00
Dmitry Chagin
eb08932156 linux(4): Microoptimize linux_ipc code to unindent else blocks.
No functional change.

MFC after:		1 week
2023-02-03 19:17:34 +03:00
Dmitry Chagin
3e0c56a717 linux(4): Use designated initializers.
MFC after:		1 week
2023-02-03 19:17:15 +03:00
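For readers unfamiliar with the C99 feature named in the commit above, a minimal sketch follows. The structure and its field names are hypothetical and are not taken from the linux(4) sources; the point is only the contrast between positional and designated initialization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure; not from the FreeBSD tree. */
struct ipc_limits {
	int	max_ids;
	int	max_size;
	int	flags;
};

/* Positional form: the reader must know the field order. */
static const struct ipc_limits positional = { 128, 4096, 0 };

/* Designated form: each field is named; omitted fields are zeroed. */
static const struct ipc_limits designated = {
	.max_size = 4096,
	.max_ids = 128,
};

/* Field-by-field comparison, used to show both forms are equivalent. */
static int
limits_equal(const struct ipc_limits *a, const struct ipc_limits *b)
{
	assert(a != NULL && b != NULL);
	return (a->max_ids == b->max_ids && a->max_size == b->max_size &&
	    a->flags == b->flags);
}
```

The designated form keeps working if fields are later reordered or added, which is the usual reason for this kind of cleanup.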
Warner Losh
a5c0d55177 test: Add fstab to all ufs images
Ensure that we populate /etc/fstab for all the ufs images.  Tweak sizes
while I'm at it.

Note: This file could use a good refactoring... or maybe a rewrite in
python or lua.

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38317
2023-02-03 08:41:41 -07:00
Warner Losh
335e3daaf0 kboot: Keep track of what's used in the segment
Keep track of how much is used in the segment as we allocate it to the
application. Set memsz to 0 first, and increment it as used. Adjust the
bufsz before we call kexec so the kernel copies the right amount (it's
an error for bufsz to be bigger than memsz, so we set them == when we
retrieve the segment). Make sure we round to the page size, otherwise
kexec_load gets cranky.

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38315
2023-02-03 08:41:41 -07:00
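The bookkeeping described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the loader's code: the structure, the function names, and the fixed 4096-byte page size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumed; real code would ask the host */

/* Hypothetical stand-in for one kexec segment. */
struct sketch_segment {
	size_t	bufsz;	/* bytes the kernel will copy */
	size_t	memsz;	/* bytes of the segment in use */
	size_t	limit;	/* total bytes reserved for the segment */
};

/* Round n up to the next multiple of the (power-of-two) page size. */
static size_t
round_page(size_t n)
{
	return ((n + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1));
}

/* Record that 'len' more bytes of the segment were handed out. */
static void
segment_use(struct sketch_segment *seg, size_t len)
{
	assert(seg->memsz + len <= seg->limit);
	seg->memsz += len;
}

/* Before the kexec call: page-round memsz and copy exactly that much. */
static void
segment_finalize(struct sketch_segment *seg)
{
	seg->memsz = round_page(seg->memsz);
	seg->bufsz = seg->memsz;
}
```

Starting with memsz at zero and growing it on each allocation means bufsz never exceeds memsz, which the commit message notes would be an error.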
Warner Losh
db8d0c0cd9 kboot: Allocate a really big first segment
Allocate a huge segment for the first kexec_load segments. We limit the
allocation to the lesser of:
	the size of the remaining memory segment
	45% of available memory
	95% of the memory we can allocate

This allows us to have really large RAM disks. We likely need to limit
this to the amount we actually used, though, since this can be a lot of
memory.

We have to do this complicated calculation for a few reasons: First, we
need 2 copies of the loaded kernel in the memory: The kernel can copy
everything to a temporary buffer. Next, malloc (via mmap) is limited to
a certain amount due to over commit, so we have to not allocate all we
can (only most of what we can).

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38314
2023-02-03 08:41:41 -07:00
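The size calculation in the commit above takes the smallest of three limits. A minimal sketch, assuming the three quantities are already known in bytes; the function and parameter names are hypothetical and do not come from the kboot sources.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/* Return pct percent of value, rounded down to a multiple of pct. */
static uint64_t
percent_of(uint64_t value, unsigned pct)
{
	assert(pct <= 100);
	/* Divide first so the multiplication cannot overflow. */
	return (value / 100 * pct);
}

/*
 * Size of the first segment: the least of the remaining space in the
 * memory segment, 45% of available memory, and 95% of what can be
 * allocated. All inputs are in bytes.
 */
static uint64_t
first_segment_size(uint64_t seg_remaining, uint64_t mem_available,
    uint64_t max_allocatable)
{
	uint64_t limit;

	limit = seg_remaining;
	limit = min_u64(limit, percent_of(mem_available, 45));
	limit = min_u64(limit, percent_of(max_allocatable, 95));
	return (limit);
}
```

The 45% figure follows from the commit's remark that two copies of the loaded image must fit in memory; the 95% figure leaves headroom under the overcommit limit.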
Warner Losh
1d3a7e849b kboot: Remove externs
kboot_get_phys_load_segment is declared in kboot.h, so remove the extern
declarations from the .c files.

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38310
2023-02-03 08:41:41 -07:00
Warner Losh
045fa2801a kboot: Try to read UEFI memory from physical memory on aarch64
Try to open /dev/mem to read in the UEFI memory map. If we can't, then
we'll read it in the trampoline.

Retain reading in /proc/iomem to find reserved areas in Linux. We need
to know them for good places to put the kernel. These are not reflected
in the UEFI memory map. However, we should not adjust the UEFI memory
map since these reserved areas of the Linux kernel are free to be used
once we enter the kexec trampoline...

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans, andrew
Differential Revision:	https://reviews.freebsd.org/D38264
2023-02-03 08:41:41 -07:00
Warner Losh
4601064126 kboot: Enable for aarch64
Enable building loader.kboot for aarch64/arm64.

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans, andrew
Differential Revision:	https://reviews.freebsd.org/D38262
2023-02-03 08:41:40 -07:00
Warner Losh
2b51791053 kboot: Don't need an arch pointer to get segments
There's no need for an arch pointer to get segments. We can call the
routine directly since we don't need this code to be called from
different context where a pointer is needed.

Sponsored by:		Netflix
Reviewed by:		kevans, andrew
Differential Revision:	https://reviews.freebsd.org/D38266
2023-02-03 08:41:40 -07:00
Warner Losh
84eb9b2306 kboot: MI fixups to enable aarch64 booting
A number of bug fixes to loading kernels and modules on aarch64 and amd64.
Fix offset calculations.
Add a number of debugs, commented out for now (will GC them in the future)

With this, and the MD aarch64 commands, we can linux boot in qemu and on
real hardware.

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D38261
2023-02-03 08:41:40 -07:00
Warner Losh
2069a2a08f kboot: Improve amd64 booting
Copy more of the necessary state for FreeBSD to boot:
o Copy EFI memory tables
o Create custom page tables needed for the kernel to find itself
o Simplify the passing of args to the trampoline by putting them
  on the stack rather than in dedicated memory.

This is only partially successful... we get only part way through the
amd64 startup code before dying. However, it's much further than before
the changes.

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans
Differential Revision:	https://reviews.freebsd.org/D38259
2023-02-03 08:41:40 -07:00
Warner Losh
dfcca21075 kboot: aarch64 trampoline implementation
Update exec.c (copied from efi/loader/arch/arm64/exec.c) to allow
execution of aarch64 kernels. This includes a new trampoline code that
handles copying the UEFI memory map, if available from the Linux FDT
provided PA. This is a complete implementation now, able to boot from
the LinuxBoot environment on an aarch64 server that only offers
LinuxBoot (though a workaround for the gicv3 inability to re-init is not
yet in FreeBSD). Many 'fit and finish' issues will be addressed in
subsequent commits.

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans, andrew
Differential Revision:	https://reviews.freebsd.org/D38258
2023-02-03 08:41:40 -07:00
Warner Losh
0928550c3e stand: share bootinfo.c between EFI and KBOOT
Connect efi's bootinfo.c to the kboot build, and adjust to use
the kboot specific routines.

The getrootmount() call is independent of EFI. Remove ifdefs so it's
called for kboot too.

The differences between the kboot and efi bootinfo.c files are now tiny.
This could use some more refactoring, but this is a working checkpoint.

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38350
2023-02-03 08:41:40 -07:00
Warner Losh
e49773296c kboot: aarch64 bi_loadsmap
Since aarch64 is different, it needs a different smap. We first see if
we have the PA of the table from the FDT info. If so, we copy that and
quit. Otherwise, we do the best we can in translating the /proc/iomem
into EFI Memory Table format.

We also send the system table to the kernel.

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D38255
2023-02-03 08:41:40 -07:00
Warner Losh
b6755eabcc kboot: bi_loadsmap for amd64
Copy the EFI memory tables we were able to get into the MODINFOMD_SMAP
metadata area for the kernel.

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans
Differential Revision:	https://reviews.freebsd.org/D38254
2023-02-03 08:41:40 -07:00
Warner Losh
6e99dc1375 kboot: Powerpc provide bi_loadsmap
It's just a stub, since the kernel learns of memory via FDT.

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans
Differential Revision:	https://reviews.freebsd.org/D38253
2023-02-03 08:41:39 -07:00
Warner Losh
d1a3cc0abe kboot: Define bi_loadsmap for loading memory maps
Each architecture will soon be required to provide this to load memory
maps as metadata for the platforms that require it (or a stub function
for those that don't).

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans
Differential Revision:	https://reviews.freebsd.org/D38252
2023-02-03 08:41:39 -07:00
Warner Losh
2e53353280 kboot: Call enumerate_memory_arch()
Now that all architectures provide this, enumerate the platform's memory
before we go to interact(). This needs to be done only once, but relies
on our ability to open host: files on some platforms, so it needs to be
done after devinit().

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans
Differential Revision:	https://reviews.freebsd.org/D38251
2023-02-03 08:41:39 -07:00
Warner Losh
a967cd4db2 kboot: Update amd64 to use enumerate_memory_arch()
Move memory enumeration to the enumerate_memory_arch(), tweak the code a
bit to make that fit into that framework.

Also fix a bug in the name of the end location. The old code never found
memory (though amd64 doesn't yet work, this led to using fallback
addresses that were good enough for QEMU...).

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D38250
2023-02-03 08:41:39 -07:00
Warner Losh
1c98cd1569 kboot: aarch64 memory enumeration enumerate_memory_arch()
We have an odd situation with aarch64 memory enumeration. The fdt that
we can get has a PA of the UEFI memory map, as modified by the current
running Linux kernel so it can retain those pages it needs for EFI and
other services. We have to pass in this EFI table, but don't have access
to it in the boot loader. We do in the trampoline code, so a forthcoming
commit will copy it there for the kernel to use. All for want of /dev/mem
in the target environment sometimes.

However, we also have to find a place to load the kernel, so we have to
fall back to /proc/iomem when we can't read the UEFI memory map directly
from /dev/mem. It will give us good enough results to do this task. This
table isn't quite suitable to be converted to the EFI table, so we use
both methods. We'll fall back to this method also if there's no EFI
table advertised in the fdt. There's no /sys file on aarch64 that has
this information, hence using the old-style /proc/iomem. We're unlikely
to work if there's no EFI, though.

Note: The underlying Linux mechanism is different than the amd64 method,
which seems like it should be MI but is unimplemented on aarch64.

Sponsored by:		Netflix
Discussed with:		kevans
Differential Revision:	https://reviews.freebsd.org/D38249
2023-02-03 08:41:39 -07:00
Warner Losh
1d5f967fa7 kboot: Add powerpc stub for enumerate_memory_arch()
Add stub for new MI interface for enumerating memory. Right now powerpc
looks in the FDT table at a later point in boot since we don't need to
pass a specific memory table to the kernel. Leave it like that for now,
but note plans for the future.

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D38248
2023-02-03 08:41:39 -07:00
Warner Losh
81fbd74a4b kboot: space_avail -- how much space exists from 'start' to end of segment
Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38313
2023-02-03 08:41:39 -07:00
Warner Losh
33e5b27254 kboot: Add parsing of /proc/iomem into seg.c
We'll be using this code for most / all of the platforms since iomem is
the only interface that can tell us of the areas reserved to the Linux
kernel, which we cannot place the new kernel into but are free to
use once we hit trampoline. aarch64 will use this shortly, and similar
code in amd64 will be refactored when I make that platform work.

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38309
2023-02-03 08:41:39 -07:00
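Each line of Linux's /proc/iomem has the form `start-end : name`, with hexadecimal addresses and nested entries indented. A minimal sketch of parsing one such line is shown below; the structure and function names are hypothetical and this is not the parser added to seg.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/iomem entry. */
struct iomem_range {
	uint64_t	start;
	uint64_t	end;		/* inclusive, as printed by Linux */
	char		name[64];
};

/* Parse one line; return 1 on success, 0 if the line does not match. */
static int
parse_iomem_line(const char *line, struct iomem_range *r)
{
	size_t len;
	int off = 0;

	assert(line != NULL && r != NULL);
	/* %n records where the name begins; it stays 0 if " : " is absent. */
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n",
	    &r->start, &r->end, &off) < 2 || off == 0)
		return (0);

	/* Copy the name without the trailing newline, truncating if needed. */
	len = strcspn(line + off, "\n");
	if (len >= sizeof(r->name))
		len = sizeof(r->name) - 1;
	memcpy(r->name, line + off, len);
	r->name[len] = '\0';
	return (1);
}
```

A caller would read the file line by line and keep the ranges whose names mark memory that must not be overwritten before the trampoline runs.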
Warner Losh
08779e839a kboot: Create segment handling code at main level
Create segment handling code up to the top level. Move it all into
seg.c, and make necessary adjustments for it being in a new file,
including inventing print_avail() and first_avail() to print the array
and find the first large enough memory hole.  aarch64 will use this,
and I'll refactor the other platforms to use it as I make them work.

Sponsored by:		Netflix
Discussed with:		kevans
Differential Revision:	https://reviews.freebsd.org/D38308
2023-02-03 08:41:39 -07:00
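Finding the first large enough memory hole, as the commit above describes for first_avail(), amounts to a first-fit scan. The sketch below is an assumption about the general technique, not the code in seg.c; the structure, the function name, and the alignment parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one free region of physical memory. */
struct mem_hole {
	uint64_t	start;
	uint64_t	end;	/* exclusive */
};

/*
 * Return the lowest address, aligned to 'align' (a power of two), at
 * which 'need' bytes fit inside one of the 'n' holes, or 0 if no hole
 * is large enough. Holes are assumed to be sorted by address.
 */
static uint64_t
first_fit(const struct mem_hole *holes, size_t n, uint64_t need,
    uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t base = (holes[i].start + align - 1) & ~(align - 1);

		if (base < holes[i].end && holes[i].end - base >= need)
			return (base);
	}
	return (0);
}
```

Using 0 as the failure value assumes address 0 is never a valid load address, which holds for the platforms discussed here but is still an assumption of the sketch.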
Warner Losh
9e50222131 kboot: MI part of the memory enumeration code
enumerate_memory_arch is called once early in kboot's startup to allow
us to discover the memory layout, reserved areas, etc of the system
memory. Add the MI interface part of this.

Sponsored by:		Netflix
Reviewed by:		tsoome, kevans
Differential Revision:	https://reviews.freebsd.org/D38247
2023-02-03 08:41:38 -07:00
Warner Losh
fb26a14fc4 kboot: Add aarch64 fdt fixup
Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D38256
2023-02-03 08:41:38 -07:00
Warner Losh
d76330efd9 kboot: Probe all disks and partitions for a kernel
Guess where to boot from when bootdev= isn't on the command line or
other config. Search all the disks and partitions for one that looks
like it could be a boot partition (same as we do when probing
zpools). Return the first one we find.

Sponsored by:		Netflix
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D38319
2023-02-03 08:41:38 -07:00
Dag-Erling Smørgrav
cb96a0ef00 cp: Minor code cleanup.
* Fix includes in utils.c, cf. style(9).
* Fix type mismatch: readlink(2) returns ssize_t, not int.
* It is not necessary to set errno to 0 as fts_read(3) already does it.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D38369
2023-02-03 16:37:37 +01:00
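The type fix in the commit above matters because readlink(2) returns ssize_t and does not NUL-terminate its output. A minimal sketch of a wrapper that handles both points; the function name is hypothetical and this is not the code in cp's utils.c.

```c
#include <sys/types.h>

#include <assert.h>
#include <unistd.h>

/*
 * Copy the target of the symlink 'path' into 'buf' and NUL-terminate it.
 * Returns the number of bytes stored, or -1 on error (errno is set by
 * readlink). Targets longer than bufsz - 1 bytes are silently truncated.
 */
static ssize_t
read_link_target(const char *path, char *buf, size_t bufsz)
{
	ssize_t len;	/* ssize_t, not int: matches readlink(2) */

	assert(bufsz > 0);
	len = readlink(path, buf, bufsz - 1);
	if (len < 0)
		return (-1);
	buf[len] = '\0';
	return (len);
}
```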
Justin Hibbits
87e728340b Mechanically convert wg(4) to IfAPI
Reviewed By:	jhb
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38307
2023-02-03 09:38:03 -05:00
Justin Hibbits
c50f70b5a9 linsysfs: Use IfAPI accessors
Replace the only two ifnet member accesses with IfAPI accessor calls.

Sponsored by:	Juniper Networks, Inc.
2023-02-03 09:38:03 -05:00
Justin Hibbits
5243598927 linprocfs: Migrate to IfAPI
Summary:
Migrate linprocfs to use the IfAPI interfaces instead of direct ifnet
accesses.

Reviewed by:	dchagin
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38358
2023-02-03 09:38:03 -05:00
Justin Hibbits
2eeb808361 IfAPI: Add iterator to loop over all interfaces
Summary:
Sometimes it's useful to iterate over all interfaces in the current
VNET, as the linuxulator does in several places.

Unlike other iterators in the IfAPI this propagates any error received
up to the caller, instead of returning a count.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38348
2023-02-03 09:38:02 -05:00
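The distinction drawn in the commit above, an iterator that stops and propagates the callback's error rather than returning a count, can be sketched generically. Every name below is hypothetical; this does not reproduce the IfAPI function or its signature, and it iterates over a plain array rather than the kernel's interface list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface object. */
struct sketch_if {
	const char	*name;
	int		 mtu;
};

typedef int (*sketch_if_cb)(struct sketch_if *, void *);

/* Call cb on each element; stop at the first nonzero return and pass it up. */
static int
sketch_if_foreach(struct sketch_if *ifs, size_t n, sketch_if_cb cb, void *arg)
{
	int error;

	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		error = cb(&ifs[i], arg);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Example callback: fail with EINVAL on a non-positive MTU, else count. */
static int
count_valid(struct sketch_if *ifp, void *arg)
{
	if (ifp->mtu <= 0)
		return (EINVAL);
	(*(int *)arg)++;
	return (0);
}
```

A caller that still needs a count can keep one in the argument, as count_valid does, while the return value stays free for error reporting.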
Toomas Soome
a1f8a0c793 efiserialio: use port settings (sio->Mode) for initial setup
Use serial port setup done by system firmware.
ARM64 Hyper-V hangs if we attempt to override the defaults,
therefore we should default to use settings from firmware.

Tested by: schakrabarti@microsoft.com
PR:		266248
MFC after:	1 week
2023-02-03 11:53:32 +02:00
Kristof Provost
afa77b6996 pf tests: improve pfsync:basic_defer test
Create state on output only, to ensure we trigger the defer code.

MFC after:	2 weeks
2023-02-03 09:39:21 +01:00
Kristof Provost
0ed5f66c5a pfsync: add missing bucket lock
pfsync_q_ins() expects us to hold the bucket lock, but when we enter it
from pfsync_state_import() we don't.

MFC after:	2 weeks
2023-02-03 09:39:09 +01:00
Xin LI
fdbfaefefa hastctl: use zlib's crc32 implementation.
X-MFC-with:	6998572a74
MFC after:      2 weeks
2023-02-03 00:30:08 -08:00
Xin LI
6998572a74 hastd: use zlib's crc32 implementation.
Reviewed by:	pjd
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D35767
2023-02-02 23:14:21 -08:00
Pawel Jakub Dawidek
c54d240eb1 kern_prot.c p_candebug(): Remove single-use variable.
Reviewed by:		allanjude, oshogbo
Approved by:		allanjude, oshogbo
Differential Revision:	https://reviews.freebsd.org/D38288
2023-02-02 17:00:24 -08:00
Pawel Jakub Dawidek
14ba79255b nv.9: Improve style in one of the examples.
Reviewed by:		allanjude, oshogbo
Approved by:		allanjude, oshogbo
Differential Revision:	https://reviews.freebsd.org/D38287
2023-02-02 17:00:23 -08:00
Brooks Davis
5c274b3622 whitespace: rewrap to match case directly above
It's easier to visually diff the two case blocks if there aren't
gratuitous whitespace differences.

Sponsored by:	DARPA
2023-02-03 00:37:31 +00:00
Rick Macklem
7926a01ed7 vfs_export: Add checks for correct prison when updating exports
mountd(8) basically does the following:
getmntinfo()
for each mount
      delete_exports
using nmount(2) to do the creation/deletion of individual exports.

For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo()
returns all mount points, including ones being used within other prisons.
This can cause confusion if the same file system is specified in the
exports(5) file for multiple prisons.

This patch adds a permanent identifier to each prison
and marks which prison did the exports in a field of
the mount structure called mnt_exjail.  This field can
then be compared to the permanent identifier for the
prison that the thread's credentials are in.
Also required was a new function called prison_isalive_permid()
which returns if the prison is alive, so that the check can be
ignored for prisons that have been removed.

This prepares the system to allow mountd(8) to run in multiple
prisons, including prison0.

Future commits will complete the modifications to allow mountd(8)
to run in vnet prisons.  Until then, these changes should not affect
semantics.

Reviewed by:	markj
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D38144
2023-02-02 16:20:58 -08:00
Dag-Erling Smørgrav
57aa630220 tarfs: Remove unused code.
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
2023-02-02 23:16:17 +00:00
Dag-Erling Smørgrav
cf93505e8d tarfs: Fix non-ZSTDIO build.
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
2023-02-02 23:25:34 +01:00
Michael Tuexen
7b2f1a7fe9 sctp: improve delivery of stream reset notifications
Two functions are not called via sctp_ulp_notify() and therefore
need additional checks when being called.

Reported by:	syzbot+eb888d3a5a6c54413de5@syzkaller.appspotmail.com
MFC after:	3 days
2023-02-02 14:46:10 +01:00
Warner Losh
ab926ba4c3 kboot: Remove kboot_loadaddr
Turns out that the loadaddr interface is not sufficiently expressive to
do the loading we need to do. Instead, we'll emulate some of its
features with inline math in copyin/copyout.

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D38260
2023-02-02 14:09:55 -07:00