freebsd-dev

Author	SHA1	Message	Date
Dmitry Chagin	4a6c2d075d	linux(4): Properly restore the thread signal mask after signal delivery on i386 Replace sigframe sf_extramask by native sigset_t and use it to store/restore the thread signal mask without conversion to/from Linux signal mask. Pointy hat to: dchagin MFC after: 2 weeks	2022-05-30 20:03:49 +03:00
Corvin Köhne	7468332f55	x86/mp: don't create empty cpu groups When some APICs are disabled by tunables, some cpu groups could end up empty. An empty cpu group causes the system to panic because not all functions handle them correctly. Additionally, it's wasted time to handle and inspect empty cpu groups. Therefore, just don't create them. Reviewed by: kib, avg, cem Sponsored by: Beckhoff Automation GmbH & Co. KG MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24927	2022-05-30 11:21:46 +02:00
Dmitry Chagin	9016ec056a	linux(4): Deduplicate bsd_to_linux_trapcode() As bsd_to_linux_trapcode() is common for x86 Linuxulators, move it under x86/linux. MFC after: 2 weeks	2022-05-23 13:16:58 +03:00
Dmitry Chagin	2434137f69	linux(4): Deduplicate translate_traps() As translate_traps() is common for x86 Linuxulators, move it under x86/linux. MFC after: 2 weeks	2022-05-23 13:16:26 +03:00
Dmitry Chagin	6e826d27c3	linux(4): Better naming for ucontext field of struct rt_sigframe To reduce sendsig code difference and to avoid confusing me, rename sf_sc to sf_uc to match the content. MFC after: 2 weeks	2022-05-15 21:06:47 +03:00
Dmitry Chagin	21f2461741	linux(4): Move sigframe definitions to separate headers The signal trampoline-related definitions are used only in the MD part of the code, so move them from the widely included linux.h to separate MD headers. MFC after: 2 weeks	2022-05-15 21:03:01 +03:00
Dmitry Chagin	5a6a4fb284	linux(4): Implement vdso getcpu for x86. This is modeled after `f2395455` (by kib@). MFC after: 2 weeks	2022-05-08 17:20:52 +03:00
Dmitry Chagin	332eca05b5	linux(4): Refactor vdso_gettc_x86 includes. Factor out includes from common vdso_gettc_x86 file to the corresponding MD files. MFC after: 2 weeks	2022-05-08 17:20:51 +03:00
John Baldwin	80d2b3de16	x86: Remove unused devclass arguments to DRIVER_MODULE.	2022-05-06 15:46:58 -07:00
John Baldwin	b3407dcc58	cpufreq: Remove unused devclass arguments to DRIVER_MODULE.	2022-05-06 15:39:29 -07:00
John Baldwin	09fd3b43ad	Remove isa_devclass from ISA bus drivers.	2022-05-06 15:39:28 -07:00
Dmitry Chagin	fe2c9f83a6	Remove dead code. is_physical_memory() dead since `235a54de`. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35056 MFC after: 2 weeks	2022-04-26 19:40:59 +03:00
John Baldwin	d4ab3a8d4f	busdma_bounce: Add free_bounce_pages helper function. Deduplicate code to iterate over the bpages list in a bus_dmamap_t freeing bounce pages during bus_dmamap_unload. Reviewed by: imp Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34967	2022-04-21 10:42:14 -07:00
John Baldwin	489e8f24a5	smbios/vpd: Use devclass_find to lookup devclass in module event handler. While here, use a modern function declaration for smbios_modevent and vpd_modevent. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D34996	2022-04-21 10:29:14 -07:00
Kornel Duleba	06f659c39d	dmar: Disable PMR in driver attach routine Previously it was disabled right before translation was enabled. This way the disable logic is still executed even when translation is not activated, e.g. with the hw.iommu.dma=0 tunable set. On some platforms we need to disable PMR in order for core dump to work. At the same time it was observed that enabling translation has a significant impact on network performance. With this patch PMR can be disabled without turning on IOMMU translation by appending the following to loader.conf: hw.dmar.enable=1 hw.dmar.pmr.disable=1 hw.dmar.dma=0 Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D34907	2022-04-20 09:40:28 +02:00
John Baldwin	3d6f4411e4	Remove checks for <sys/cdefs.h> being included. These files no longer depend on the macros required when these checks were added. PR: 263102 (exp-run) Reviewed by: brooks, imp, emaste Differential Revision: https://reviews.freebsd.org/D34804	2022-04-12 10:06:18 -07:00
John Baldwin	56f5947a71	Remove checks for __GNUCLIKE_ASM assuming it is always true. All supported compilers (modern versions of GCC and clang) support this. Many places didn't have an #else so would just silently do the wrong thing. Ancient versions of icc (the original motivation for this) are no longer a compiler FreeBSD supports. PR: 263102 (exp-run) Reviewed by: brooks, imp Differential Revision: https://reviews.freebsd.org/D34797	2022-04-12 10:05:45 -07:00
Mark Johnston	aa597d4049	i386: Fix the nodevice apic build PR: 263124 Fixes: `62d09b46ad` ("x86: Defer LAPIC calibration until after timecounters are available") Reviewed by: kib, jhb, emaste MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34830	2022-04-08 11:47:52 -04:00
John Baldwin	354ef278e9	powernow(4): Fix unused variable warnings by using the variables.	2022-04-06 16:45:28 -07:00
John Baldwin	89abc0fbbd	x86 bounce_bus_dma_tag_destroy: Silence set but unused warning.	2022-04-06 16:45:27 -07:00
Gordon Bergling	bba12ee453	xen(4): Fix a few typos in source code comments - s/querried/queried/ MFC after: 3 days	2022-03-28 19:37:20 +02:00
John Baldwin	931983ee08	x86: Add a NT_X86_SEGBASES register set. This register set contains the values of the fsbase and gsbase registers. Note that these registers can already be controlled individually via ptrace(2) via MD operations, so the main reason for adding this is to include these register values in core dumps. In particular, this will enable looking up the value of TLS variables from core dumps in gdb. The value of NT_X86_SEGBASES was chosen to match the value of NT_386_TLS on Linux. The notes serve similar purposes, but FreeBSD will never dump a note equivalent to NT_386_TLS (which dumps a single segment descriptor rather than a pair of addresses) and picking a currently-unused value in the NT_X86_* range could result in a future conflict. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D34650	2022-03-24 11:36:19 -07:00
Roger Pau Monné	1ca34862dc	x86/tsc: fetch frequency from CPUID when running on Xen Introduce a helper to fetch the TSC frequency from CPUID when running under Xen. Since the TSC can also be initialized early when running as a Xen guest pull out the call to tsc_init() from the early_clock_source_init() handlers and place it in clock_init(), as otherwise all handlers would call tsc_init() anyway. Reviewed by: markj Sponsored by: Citrix Systems R&D Differential revision: https://reviews.freebsd.org/D34581	2022-03-18 10:21:04 +01:00
Roger Pau Monné	396a8479b0	x86/xen: fix CPUID signature MFC: 3 days Reviewed by: cem Sponsored by: Citrix Systems R&D Differential revision: https://reviews.freebsd.org/D34580	2022-03-17 12:56:36 +01:00
Zhenlei Huang	ba46c6c4b7	x86: Correctly report unexpected cache level Reviewed by: rpokala, emaste MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34577	2022-03-16 16:30:38 -04:00
Mark Johnston	075e2779ac	x86: Defer early TSC timecounter calibration to SI_SUB_CPU If we can't determine the TSC frequency using CPU registers, we need to give a chance for Hyper-V drivers to register a timecounter (during SI_SUB_HYPERVISOR) since an emulated 8254 might not be available. Thus, split probe_tsc_freq() into early and late stages, and wait until the latter to attempt calibration using a reference clock. Fixes: `84369dd523` ("x86: Probe the TSC frequency earlier") Reported and tested by: khng, Shawn Webb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34444	2022-03-04 19:34:43 -05:00
Mark Johnston	84369dd523	x86: Probe the TSC frequency earlier This lets us use the TSC to implement early DELAY, limiting the use of the sometimes-unreliable 8254 PIT. PR: 262155 Reviewed by: emaste Tested by: emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch> MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34367	2022-03-01 09:39:35 -05:00
Elliott Mitchell	ad7dd51499	xen: switch to use headers in contrib These headers originate with the Xen project and shouldn't be mixed with the main portion of the FreeBSD kernel. Notably they shouldn't be the target of clean-up commits. Switch to use the headers in sys/contrib/xen. Reviewed by: royger	2022-02-07 10:11:56 +01:00
Elliott Mitchell	b6da4ec609	xen: remove leftover bits missed in commit `ac3ede5371` These fields are now unused, remove them. Fixes: `ac3ede5371` ('x86/xen: remove PVHv1 code') Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D31206	2022-02-07 10:06:27 +01:00
Takanori Watanabe	eb815a7419	atrtc: Install address space handler for \_SB and its descendants. The SystemCMOS address space is accessible system-wide, so install the address handler in the \_SB space. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33892	2022-01-21 15:32:30 +09:00
Roger Pau Monné	e0516c7553	x86/apic: remove apic_ops All Xen instances supported by FreeBSD provide a local APIC implementation, so there's no need to replace the native local APIC implementation anymore. Leave just the ipi_vectored hook in order to be able to override it with an implementation based on event channels if the underlying local APIC is not virtualized by hardware. Note the hook cannot use ifuncs, because at the point where ifuncs are resolved the kernel doesn't yet know whether it will benefit from using the optimization. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential revision: https://reviews.freebsd.org/D33917	2022-01-18 10:19:04 +01:00
Roger Pau Monné	2450da6776	x86/xen: use x{2}APIC if virtualized by hardware Instead of using event channels or hypercalls to deal with IPIs and NMIs. Using a hardware virtualized APIC should be faster than using any PV interface, since the VM exit can be avoided. Xen exposes whether the domain is using hardware assisted x{2}APIC emulation in a CPUID bit. Sponsored by: Citrix Systems R&D	2022-01-18 10:18:22 +01:00
Roger Pau Monné	ad15eeeaba	x86/xen: fallback when VCPUOP_send_nmi is not available It has been reported that on some AWS instances VCPUOP_send_nmi returns -38 (ENOSYS). The hypercall is only available for HVM guests in Xen 4.7 and newer. Add a fallback to use the native NMI sending procedure when VCPUOP_send_nmi is not available, so that the NMI is not lost. Reported and Tested by: avg MFC after: 1 week Fixes: `b2802351c1` ('xen: fix dispatching of NMIs') Sponsored by: Citrix Systems R&D	2022-01-17 11:06:40 +01:00
Colin Percival	de1292c6ff	Use CPUID leaf 0x40000010 for local APIC freq Some VM systems announce the frequency of the local APIC via the CPUID leaf 0x40000010. Using this allows us to boot slightly faster by avoiding the need for timer calibration. Reviewed by: markj Sponsored by: https://www.patreon.com/cperciva	2022-01-14 17:30:17 -08:00
Colin Percival	4a432614f6	TSC: Use 0x40000010 CPUID leaf for all VM types While this CPUID leaf was originally only used by VMWare, other hypervisors now also use it to announce the TSC frequency to guests. This speeds up the boot process by 100 ms in EC2 and other systems, by allowing the early calibration DELAY to be skipped. Reviewed by: markj Sponsored by: https://www.patreon.com/cperciva	2022-01-14 17:30:17 -08:00
Colin Percival	fd980feb57	Detect CPU type before asking VMWare for TSC freq This allows us to set tsc_is_invariant and select appropriately fenced versions of RDTSC based on the CPU type. Reviewed by: markj Sponsored by: https://www.patreon.com/cperciva	2022-01-14 17:30:17 -08:00
Austin Zhang	e1ef6c0ef2	atrtc: reads Century field from FADT table The ACPI spec describes the FADT->Century field as: The RTC CMOS RAM index to the century of data value (hundred and thousand year decimals). If this field contains a zero, then the RTC centenary feature is not supported. If this field has a non-zero value, then this field contains an index into RTC RAM space that OSPM can use to program the centenary field. Use this field to decide whether to program the CENTURY register of the CMOS RTC device. Reviewed by: akumar3@isilon.com, dab, vangyzen MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D33667	2022-01-13 11:24:00 -06:00
Roger Pau Monné	7d06c761c8	x86/madt: allow Xen guest to use x2APIC mode The old bogus Xen versions that would deliver a GPF when writing to the LAPIC MSR are likely retired, so it's safe to enable x2APIC unconditionally now if available. Tested by: avg Reviewed by: kib Sponsored by: Citrix Systems R&D Differential revision: https://reviews.freebsd.org/D33877	2022-01-13 17:15:24 +01:00
Roger Pau Monné	ca46f3289d	xen: use a hypercall for shutdown and reboot When running as a Xen guest it's easier to use a hypercall in order to do power management operations (power off, power cycle). Do this for all supported guest types (HVM and PVH). Note that for HVM the power operation could also be done using ACPI, but there's no reason to differentiate between PVH and HVM. While there, fix the shutdown handler to properly differentiate between power cycle and power off requests. Reported by: Freddy DISSAUX MFC: 1 week Sponsored by: Citrix Systems R&D	2022-01-13 16:54:30 +01:00
Colin Percival	c2705ceaeb	x86: Speed up clock calibration Prior to this commit, the TSC and local APIC frequencies were calibrated at boot time by measuring the clocks before and after a one-second sleep. This was simple and effective, but had the disadvantage of requiring a one-second sleep. Rather than making two clock measurements (before and after sleeping) we now perform many measurements; and rather than simply subtracting the starting count from the ending count, we calculate a best-fit regression between the target clock and the reference clock (for which the current best available timecounter is used). While we do this, we keep track of an estimate of the uncertainty in the regression slope (aka. the ratio of clock speeds), and stop measuring when we believe the uncertainty is less than 1 PPM. In order to avoid the risk of aliasing resulting from the data-gathering loop synchronizing with (a multiple of) the frequency of the reference clock, we add some additional spinning depending upon the iteration number. For numerical stability and simplicity of implementation, we make use of floating-point arithmetic for the statistical calculations. On the author's Dell laptop, this reduces the time spent in calibration from 2000 ms to 29 ms; on an EC2 c5.xlarge instance, it is reduced from 2000 ms to 2.5 ms. Reviewed by: bde (previous version), kib MFC after: 1 month Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D33802	2022-01-12 12:34:07 -08:00
John Baldwin	7def1e10b3	bus_dma: Deduplicate locking helper functions. - Move busdma_lock_mutex to subr_bus_dma.c. - Move _busdma_lock_dflt to subr_bus_dma.c. This function was named a couple of different things previously. It is not a public API but an internal helper used in place of a NULL pointer. The prototype is in <sys/bus_dma.h> as not all backends include <sys/bus_dma_internal.h>. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33694	2022-01-05 13:50:40 -08:00
John Baldwin	85b4607324	Deduplicate bus_dma bounce code. Move mostly duplicated code in various MD bus_dma backends to support bounce pages into sys/kern/subr_busdma_bounce.c. This file is currently #include'd into the backends rather than compiled standalone since it requires access to internal members of opaque bus_dma structures such as bus_dmamap_t and bus_dma_tag_t. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33684	2022-01-05 13:50:40 -08:00
Mark Johnston	0e494a9e3f	x86: Skip late calibration if our reference timer has low quality Some AMD Geode-based systems end up using the 8254 PIT to calibrate the TSC during late calibration, which doesn't work because that timecounter's mask (65535) is much smaller than its frequency (1193182). Moreover, early calibration is done against the 8254 timer anyway. Work around the problem by simply using early calibration results if no high-quality timecounters exist. PR: 260868 Fixes: `22875f8879` ("x86: Implement deferred TSC calibration") Reported and tested by: mike@sentex.net, Stefan Hegnauer <stefan.hegnauer@gmx.ch> Reviewed by: imp, kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33730	2022-01-03 13:00:50 -05:00
Colin Percival	698727d637	Fix variable name: freq_khz -> freq An earlier version of this code computed the TSC frequency in kHz. When the code was changed to compute the frequency more accurately, the variable name was not updated. Reviewed by: markj Fixes: `22875f8879` x86: Implement deferred TSC calibration MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33696	2022-01-02 13:07:53 -08:00
Colin Percival	9cb3288287	Skip TSC calibration if exact value known It's possible that the "early" TSC calibration gave us a value which is known to be exact; in that case, skip the later re-calibration. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33695	2022-01-02 13:07:53 -08:00
Doug Moore	f1e7a532d1	busdma: _bus_dmamap_addseg repaired A recent change introduced an off-by-one error into a test allowing coalescing chunks into segments, which broke a check in _bus_dmamap_addseg on many architectures. This fixes that error. This change makes it clear that it is not a particular range that is being boundary-checked, but the proposed union of the two adjacent ranges. Reported by: se Reviewed by: se Fixes: `c606ab59e7` vm_extern: use standard address checkers everywhere Differential Revision: https://reviews.freebsd.org/D33715	2022-01-02 12:37:05 -06:00
Doug Moore	b7810e05ff	x86-busdma - Add missing paren Reported by: jenkins Fixes: `c606ab59e7` vm_extern: use standard address checkers everywhere	2021-12-31 02:33:54 -06:00
Doug Moore	c606ab59e7	vm_extern: use standard address checkers everywhere Define simple functions for alignment and boundary checks and use them everywhere instead of having slightly different implementations scattered about. Define them in vm_extern.h and use them where possible where vm_extern.h is included. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D33685	2021-12-30 22:09:08 -06:00
Stefan Eßer	e2650af157	Make CPU_SET macros compliant with other implementations The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment. Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts. The difference between the FreeBSD versions of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-address scheme (2 source operands and a separately passed destination). The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments. This patch set makes it possible to unconditionally provide the functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR. One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges. Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no longer require that option. The FreeBSD version has been bumped to 1400046 to reflect this incompatible change. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33451	2021-12-30 12:20:32 +01:00
Mark Johnston	0ecda8d5ae	x86: Do not attempt to calibrate the LAPIC timer if no APIC is present Reported and tested by: Michael Butler <imb@protected-networks.net> Reviewed by: jhb, kib Fixes: `62d09b46ad` ("x86: Defer LAPIC calibration until after timecounters are available") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33669	2021-12-28 17:47:49 -05:00
Mark Johnston	deca0138dc	x86: Check for APIC presence only if DEV_ATPIC is defined We only attempt to gracefully handle absence of an APIC if "device atpic" is defined in the kernel configuration. Suggested by: kib Reviewed by: jhb, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-12-28 17:47:49 -05:00
John Baldwin	254e4e5b77	Simplify swi for bus_dma. When a DMA request using bounce pages completes, a swi is triggered to schedule pending DMA requests using the just-freed bounce pages. For a long time this bus_dma swi has been tied to a "virtual memory" swi (swi_vm). However, all of the swi_vm implementations are the same and consist of checking a flag (busdma_swi_pending) which is always true and if set calling busdma_swi. I suspect this dates back to the pre-SMPng days and that the intention was for swi_vm to serve as a mux. However, in the current scheme there's no need for the mux. Instead, remove swi_vm and vm_ih. Each bus_dma implementation that uses bounce pages is responsible for creating its own swi (busdma_ih) which it now schedules directly. This swi invokes busdma_swi directly removing the need for busdma_swi_pending. One consequence is that the swi now works on RISC-V which had previously failed to invoke busdma_swi from swi_vm. Reviewed by: imp, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33447	2021-12-28 13:51:25 -08:00
Alexander Motin	1d6fb900ed	x86: Remove CTLFLAG_NEEDGIANT from sysctls. MFC after: 2 weeks	2021-12-25 22:24:20 -05:00
Corvin Köhne	16f02a4cb4	pci: add missing PCI id of Coffee Lake GPU The PCI id of a UHD Graphics 630 for Coffee Lake GPUs is missing in the PCI id list of all Intel GPUs. You can take a look at https://dgpu-docs.intel.com/devices/hardware-table.html to check that this device id exists. Or check the Linux code: `d0e062ebb3` MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33460	2021-12-17 23:18:31 +02:00
Mateusz Guzik	e7236a7ddf	xen: plug some of set-but-not-used vars Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-12-15 13:46:17 +00:00
Mateusz Guzik	ceed3949bc	x86: plug a set-but-not-used var in native_lapic_ipi_free Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-12-10 11:55:03 +00:00
Alexander Motin	8493918868	busdma: Remove outdated comments about Giant. MFC after: 2 weeks	2021-12-09 22:18:53 -05:00
Alexander Motin	a69f810466	smist: Remove unneeded Giant from bus_dma_tag_create(). bus_dmamap_load() call uses BUS_DMA_NOWAIT. MFC after: 2 weeks	2021-12-09 20:54:22 -05:00
John Baldwin	1a62e9bc00	Add <machine/tls.h> header to hold MD constants and helpers for TLS. The header exports the following: - Definition of struct tcb. - Helpers to get/set the tcb for the current thread. - TLS_TCB_SIZE (size of TCB) - TLS_TCB_ALIGN (alignment of TCB) - TLS_VARIANT_I or TLS_VARIANT_II - TLS_DTV_OFFSET (bias of pointers in dtv[]) - TLS_TP_OFFSET (bias of "thread pointer" relative to TCB) Note that TLS_TP_OFFSET does not account for if the unbiased thread pointer points to the start of the TCB (arm and x86) or the end of the TCB (MIPS, PowerPC, and RISC-V). Note also that for amd64, the struct tcb does not include the unused tcb_spare field included in the current structure in libthr. libthr does not use this field, and the existing calls in libc and rtld that allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and thus do not allocate room for tcb_spare). A <sys/_tls_variant_i.h> header is used by architectures using Variant I TLS which uses a common struct tcb. Reviewed by: kib (older version of x86/tls.h), jrtc27 Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D33351	2021-12-09 13:17:13 -08:00
Bjoern A. Zeeb	df38ada293	modules: increase MAXMODNAME and provide backward compat With various firmware files used by graphics and wireless drivers we are exceeding the current 32 character module name (file path in kldxref) length. In order to overcome this issue bump it to the maximum path length for the next version. To be able to MFC provide backward compat support for another version of the struct as the offsets for the second half change due to the array size increase. MAXMODNAME being defined to MAXPATHLEN needs param.h to be included first. With only 7 modules (or LinuxKPI module.h) not doing that adjust them rather than including param.h in module.h [1]. Reported by: Greg V (greg unrelenting.technology) Sponsored by: The FreeBSD Foundation Suggested by: imp [1] MFC after: 10 days Reviewed by: imp (and others to different level) Differential Revision: https://reviews.freebsd.org/D32383	2021-12-09 18:09:53 +00:00
Alexander Motin	63346fef33	mca: Some error handling logic improvements. - Enable local MCEs on capable Intel CPUs. This delivers exceptions only to the affected CPU instead of a global broadcast, which requires a lot of synchronization between CPUs. AMD CPUs always deliver MCEs locally. - Make the MCE handler process only uncorrected errors, while CMCI and polling process only corrected ones. This reduces synchronization problems between them and is explicitly recommended by the documentation. - Add minimal support for uncorrected software recoverable errors on Intel CPUs. This avoids kernel panics in cases where uncorrected errors do not affect the current operation, like ones found during scrub or write. Such errors are only logged, postponing the panic until the corrupted data is actually needed (which may never happen). - Reduce polling period from 1 hour to 5 minutes. MFC after: 2 weeks	2021-12-08 21:39:24 -05:00
Alexander Motin	9a128e1678	mca: Switch to using taskqueue_enqueue_timeout_sbt(). Previously it was not allowed on fast taskqueues. It was fixed in `4730a8972b`. This should make no functional change, just a bit cleaner and efficient code. MFC after: 1 week	2021-12-08 12:29:15 -05:00
Alexander Motin	3bdba24c74	mca: Decode new Intel status bits. MFC after: 1 week	2021-12-08 12:03:28 -05:00
Alexander Motin	935dc0de88	mca: Remove excessively verbose debug messages. Especially in the case of AMD, there were more than a dozen lines per CPU. MFC after: 1 week	2021-12-07 22:27:09 -05:00
Alexander Motin	c2003f2684	mca: Make some sysctls also loader tunables. MFC after: 1 week	2021-12-07 22:22:01 -05:00
Mark Johnston	553af8f1ec	x86: Perform late TSC calibration before LAPIC timer calibration This ensures that LAPIC calibration is done using the correct tsc_freq value, i.e., the one associated with the TSC timecounter. It does mean though that TSC calibration cannot use sbinuptime() to read the reference timecounter, as timehands are not yet set up. Reviewed by: kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33209	2021-12-06 10:42:19 -05:00
Mark Johnston	62d09b46ad	x86: Defer LAPIC calibration until after timecounters are available This ensures that we have a good reference timecounter for performing calibration. Change lapic_setup to avoid configuring the timer when booting, and move calibration and initial configuration to a new lapic routine, lapic_calibrate_timer. This calibration will be initiated from cpu_initclocks(), before an eventtimer is selected. Reviewed by: kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33206	2021-12-06 10:42:10 -05:00
Mark Johnston	f06f1d1fdb	x86: Deduplicate clock.h The headers were mostly identical on amd64 and i386. No functional change intended. Reviewed by: cperciva, mav, imp, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33205	2021-12-06 10:39:08 -05:00
Scott Long	c0ea0b4989	Fix "set but not used" in the x86 pci driver. Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-12-05 15:10:16 -07:00
Mitchell Horne	03b3d7bbec	x86: remove unused T_USER flag It stopped being used in `3c256f5395`, when trap() was reorganized to have separate switch statements for user and kernel traps. Remove the two leftover references and the flag itself. Reviewed by: kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D33253	2021-12-05 11:12:40 -04:00
Alexander Motin	a07d426509	x86: Make AMD elvtX dump more compact. MFC after: 2 weeks	2021-12-04 21:47:19 -05:00
Scott Long	d85a58cb0c	Fix "set but not used" in busdma_bounce. Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-12-03 15:20:42 -07:00
Mitchell Horne	1adebe3cd6	minidump: Parameterize minidumpsys() The minidump code is written assuming that certain global state will not change, and rightly so, since it executes from a kernel debugger context. In order to support taking minidumps of a live system, we should allow copies of relevant global state that is likely to change to be passed as parameters to the minidumpsys() function. This patch does the work of parameterizing this function, by adding a struct minidumpstate argument. For now, this struct allows for copies of the kernel message buffer, and the bitset that tracks which pages should be dumped (vm_page_dump). Follow-up changes will actually make use of these arguments. Notably, dump_avail[] does not need a snapshot, since it is not expected to change after system initialization. The existing minidumpsys() definitions are renamed, and a thin MI wrapper is added to kern_dump.c, which handles the construction of the state struct. Thus, calling minidumpsys() remains as simple as before. Reviewed by: kib, markj, jhb Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31989	2021-11-19 15:05:52 -04:00
Mark Johnston	22875f8879	x86: Implement deferred TSC calibration There is no universal way to find the TSC frequency. Newer Intel CPUs may report it via CPUID leaves 0x15 and 0x16. Sometimes it can be obtained from the PLATFORM_INFO MSR as well, though we never use that. On older platforms we derive the frequency using a DELAY(1000000) call, which uses the 8254 PIT. On some newer platforms the 8254 is apparently non-functional, leading to bogus calibration results. On such platforms the TSC frequency must be available from CPUID. It is also possible to disable calibration with a tunable, in which case we try to parse the brand string if the TSC freq is not available from CPUID. CPUID 0x15 provides an authoritative TSC frequency value, but even that is not always available on new Intel platforms. CPUID 0x16 provides the specified processor base frequency, which is not the same as the TSC frequency. Empirically, it is close enough for early boot, but too far off for timekeeping: on a Comet Lake NUC, CPUID 0x16 yields 1600MHz but the TSC frequency is rougly 1608MHz, leading to frequent clock stepping when NTP is in use. Thus we have a situation where we cannot calibrate using the PIT and cannot obtain a precise frequency from CPUID (or MSRs). This change seeks to address that by using the CPUID 0x16 value during early boot and refining the calibration later once ACPI-based timecounters are available. TSC frequency detection is thus split into two phases: Early phase: - On Intel platforms, query CPUID 0x15 and 0x16 and use that value initially if available. - Otherwise, get an estimate using the PIT, reducing the delay loop to 100ms from 1s. - Continue to register the TSC as the CPU ticks provider early, even though the frequency may be off. Otherwise any code executed during boot that uses cpu_ticks() (e.g., context switching) gets tripped up when the ticks provider changes. 
Later phase: - In SI_SUB_CLOCKS, once the timehands are initialized, load the current TSC and timecounter (sbinuptime()) values at the beginning and end of a 1s interval and use the timecounter frequency (typically from kvmclock, HPET or the ACPI PM timer) to estimate the TSC frequency. - Update the TSC timecounter, global tsc_freq and CPU ticker with the new frequency and finally register the TSC as a timecounter. Reviewed by: kib, jhb (previous version) Discussed with: imp, cperciva MFC after: 6 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32512	2021-11-15 16:13:24 -05:00
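The later-phase arithmetic described above (sample the TSC and a reference timecounter at both ends of an interval, then scale) can be sketched roughly as follows. This is a minimal userland illustration, not the kernel's code: `estimate_freq_hz` and `NS_PER_SEC` are hypothetical names, and the reference time is assumed to be available in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC UINT64_C(1000000000)

/*
 * Hypothetical helper (not the kernel's code): estimate a counter
 * frequency in Hz from two counter samples and the reference time,
 * in nanoseconds, that elapsed between them.
 */
static uint64_t
estimate_freq_hz(uint64_t start, uint64_t end, uint64_t elapsed_ns)
{
	uint64_t ticks, whole, rem;

	assert(elapsed_ns > 0 && end >= start);
	ticks = end - start;
	/*
	 * Split the division so the intermediate product stays within
	 * 64 bits for intervals of a few seconds.
	 */
	whole = ticks / elapsed_ns;
	rem = ticks % elapsed_ns;
	return (whole * NS_PER_SEC + rem * NS_PER_SEC / elapsed_ns);
}
```

With the numbers from the commit message, 1608000000 ticks observed over a 1 s reference interval yield 1608 MHz, about 0.5% above the 1600 MHz base frequency reported by CPUID 0x16.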
Mark Johnston	ab12e8db29	amd64: Reduce the amount of cpuset copying done for TLB shootdowns We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This (32-byte) set is copied by value through multiple frames until we get to smp_targeted_tlb_shootdown(), where it is copied yet again. Avoid this copying by having smp_targeted_tlb_shootdown() make a local copy of the active CPUs for the pmap, and drop the cpuset parameter, simplifying callers. Also leverage the use of the non-destructive CPU_FOREACH_ISSET to avoid unneeded copying within smp_targeted_tlb_shootdown(). Reviewed by: alc, kib Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32792	2021-11-15 13:01:31 -05:00
Alexander Motin	6badb512a9	Prefer CPUID leaf 1Fh for Intel CPU topology detection. Leaf 1Fh is a preferred extended version of 0Bh. It is supported by new Alder Lake CPUs, though it does not report anything new so far. MFC after: 2 weeks	2021-11-06 00:53:52 -04:00
Kyle Evans	6a8ea6d174	sched: split sched_ap_entry() out of sched_throw() sched_throw() can no longer take a NULL thread, APs enter through sched_ap_entry() instead. This completely removes branching in the common case and cleans up both paths. No functional change intended. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D32829	2021-11-05 15:45:51 -05:00
Kyle Evans	589aed00e3	sched: separate out schedinit_ap() schedinit_ap() sets up an AP for a later call to sched_throw(NULL). Currently, ULE sets up some pcpu bits and fixes the idlethread lock with a call to sched_throw(NULL); this results in a window where curthread is setup in platforms' init_secondary(), but it has the wrong td_lock. Typical platform AP startup procedure looks something like: - Setup curthread - ... other stuff, including cpu_initclocks_ap() - Signal smp_started - sched_throw(NULL) to enter the scheduler cpu_initclocks_ap() may have callouts to process (e.g., nvme) and attempt to sched_add() for this AP, but this attempt fails because of the noted violated assumption leading to locking heartburn in sched_setpreempt(). Interrupts are still disabled until cpu_throw() so we're not really at risk of being preempted -- just let the scheduler in on it a little earlier as part of setting up curthread. Reviewed by: alfredo, kib, markj Triage help from: andrew, markj Smoke-tested by: alfredo (ppc), kevans (arm64, x86), mhorne (arm) Differential Revision: https://reviews.freebsd.org/D32797	2021-11-03 15:54:59 -05:00
Kornel Duleba	06e6ca6dd3	dmar: Disable protected memory regions after initialization Some BIOSes protect the memory region they reside in by using DMAR to prevent devices from doing any DMA transactions to that part of RAM. AMI refers to this as "DMA Control Guarantee". Disable the protection when address translation is enabled. I stumbled upon this while investigating a failing coredump on a device which has this feature enabled. Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential revision: https://reviews.freebsd.org/D32591	2021-10-29 10:08:25 +02:00
Kornel Duleba	3c02da8096	dmar: Don't try to reserve PCI regions for non-existing devices In some cases we might have to create DMAR context before the corresponding device has been enumerated by the PCI bus. In that case we get called with NULL dev, because of that trying to reserve PCI regions causes a NULL pointer dereference in pci_find_pcie_root_port. Sponsored by: Stormshield Obtained from: Semihalf MFC after: 2 weeks Reviewed by: kib, rlibby Differential revision: https://reviews.freebsd.org/D32589	2021-10-29 10:08:25 +02:00
Gleb Smirnoff	6aae3517ed	Retire synchronous PPP kernel driver sppp(4). The last two drivers that required sppp are cp(4) and ce(4). These devices are still produced and can be purchased at Cronyx <http://cronyx.ru/hardware/wan.html>. Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no longer support FreeBSD officially. Later they have dropped support for Linux drivers too. As of mid-2020 they don't even have a developer to maintain their Windows driver. However, their support verbally told me that they could provide aid to a FreeBSD developer with documentation in case a new customer for their devices appears. These drivers have a feature to not use sppp(4) and create an interface, but instead expose the device as netgraph(4) node. Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on top of the node and get your synchronous PPP. Alternatively you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC. Actually, last time I used cp(4) back in 2004, using netgraph(4) instead of sppp(4) was already the right way to go. Thus, remove the sppp(4) related part of the drivers and enable by default the netgraph(4) part. Further maintenance of these drivers in the tree shouldn't be a big deal. While doing that, remove some cruft and enable cp(4) compilation on amd64. The ce(4) for some unknown reason marks its internal DDK functions with __attribute__ fastcall, which most likely is safe to remove, but without hardware I'm not going to do that, so ce(4) remains i386-only. Reviewed by: emaste, imp, donner Differential Revision: https://reviews.freebsd.org/D32590 See also: https://reviews.freebsd.org/D23928	2021-10-22 11:41:36 -07:00
Konstantin Belousov	661bd70bd7	DMAR: clean up warnings about write-only variables For some of them, used only when KTR or KMSAN are configured, apply __unused attribute directly. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Mark Johnston	06ebadc5f5	x86: Remove some leftover APM support This is obsolete since commit `8c576a279e` ("Remove APM BIOS support"). Reviewed by: imp, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32510	2021-10-18 09:56:59 -04:00
Mark Johnston	de8554295b	cpuset(9): Add CPU_FOREACH_IS(SET\|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. This is a re-application of commit `9068f6ea69`. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029	2021-10-18 09:56:58 -04:00
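The idea of visiting set bits without modifying the caller's set can be sketched in portable userland C as below. This is not the cpuset(9) macro itself: `foreach_set_bit` and `sum_bits` are hypothetical names, a single 64-bit word stands in for a cpuset, and `__builtin_ctzll` is a GCC/Clang builtin. A function with a callback is used here instead of a loop macro, which sidesteps the control-flow problem that caused the first version of the change to be reverted.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: call fn(bit, arg) for every set bit in mask,
 * lowest bit first.  The mask is passed by value, so only this local
 * copy is consumed and the caller's set is left untouched.
 */
static void
foreach_set_bit(uint64_t mask, void (*fn)(int, void *), void *arg)
{
	while (mask != 0) {
		int bit = __builtin_ctzll(mask);	/* index of lowest set bit */

		fn(bit, arg);
		mask &= mask - 1;			/* clear that bit in the copy */
	}
}

/* Example callback: accumulate the visited bit indices. */
static void
sum_bits(int bit, void *arg)
{
	*(int *)arg += bit;
}
```

For a mask of 0x15 the callback sees bits 0, 2 and 4, in that order.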
Konstantin Belousov	f8d3368b43	apic: initialize lapic_paddr statically The default value for LAPIC registers page physical address is usually right. Having this value available early makes pmap_force_invalidate_cache_range(), used on non-self-snoop machines, avoid flushing LAPIC range for early calls. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32318	2021-10-06 05:52:56 +03:00
Mitchell Horne	ab4ed843a3	minidump: De-duplicate the progress bar The implementation of the progress bar is simple, but duplicated for most minidump implementations. Extract the common bits to kern_dump.c. Ensure that the bar is reset with each subsequent dump; this was only done on some platforms previously. Reviewed by: markj MFC after: 2 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D31885	2021-09-29 16:42:21 -03:00
Konstantin Belousov	24a3897c2c	x86 bounce_bus_dmamem_alloc(): use malloc_aligned() only when possible malloc_domainset_aligned() requires that alignment is less than page size. Fall back to other allocation methods, most likely kmem_alloc_contig(), when malloc_aligned() cannot fulfill the driver request. Reported by: Loic F <loic.f@hardenedbsd.org> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32127	2021-09-25 15:58:12 +03:00
Alexander Motin	d3a8f98acb	Make CPU children explicitly share parent unit numbers. Before this device unit number match was coincidental and broke if I disabled some CPU device(s). Aside of cosmetics, for some drivers (may be considered broken) it caused talking to wrong CPUs.	2021-09-24 23:31:51 -04:00
Alexander Motin	ef50d5fbc3	x86: Add NUMA nodes into CPU topology. Depending on hardware, NUMA nodes may match last level caches, or they may be above them (AMD Zen 2/3) or below (Intel Xeon w/ SNC). This information is provided by ACPI instead of CPUID, and it is provided for each CPU individually instead of mask widths, but this code should be able to properly handle all the above cases. This change should immediately allow idle stealing in sched_ule(4) to prefer load from NUMA-local CPUs to remote ones when the node does not match LLC. Later we may think of how to better handle it on sched_pickcpu() side. MFC after: 1 month	2021-09-23 14:31:38 -04:00
Konstantin Belousov	e36d0e86e3	Revert "linux32: add a hack to avoid redefining the type of the savefpu tag" This reverts commit `0f6829488e`. Also it changes the type of md_usr_fpu_save struct mdthread member to void *, which is what uncovered this trouble. Now the save area is untyped, but since it is hidden behind accessors, it is not too significant. Since apparently there are consumers affected outside the tree, this hack is better than one from the reverted revision. PR: 258678 Reported by: cy Reviewed by: cy, kevans, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32060	2021-09-22 23:17:47 +03:00
Mark Johnston	bcdc599dc2	Revert "cpuset(9): Add CPU_FOREACH_IS(SET\|CLR) and modify consumers to use it" This reverts commit `9068f6ea69`. The underlying macro needs to be reworked to avoid problems with control flow statements. Reported by: rlibby	2021-09-21 13:51:42 -04:00
Konstantin Belousov	0f6829488e	linux32: add a hack to avoid redefining the type of the savefpu tag when compiling in amd64 kernel environment with -m32. This is a temporary workaround for some future proper (but unclear) fix. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31954	2021-09-21 20:20:15 +03:00
Mark Johnston	9068f6ea69	cpuset(9): Add CPU_FOREACH_IS(SET\|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029	2021-09-21 12:07:47 -04:00
Konstantin Belousov	2b6eec531a	x86: duplicate acpi_wakeup.c per i386 and amd64 The file as is is a maze of #ifdef passages, all slightly different. Divorcing the i386 and amd64 versions actually makes changing the code easier; also, no changes for i386 are planned. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31931	2021-09-14 00:23:14 +03:00
Konstantin Belousov	db2ba218d9	amd64 acpi_wakeup: map 1:1 whole low 4G for the trampoline page table This is required since kernel text might be physically located anywhere below 4G. PR: 258432 Reported by: Taku YAMAMOTO <taku@tackymt.homeip.net> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31916	2021-09-13 19:52:13 +03:00
Konstantin Belousov	ceca8ac1ce	x86 acpi_install_wakeup_handler(): style Do not use tab between type and variable name in local declarations. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31916	2021-09-13 19:52:06 +03:00
Konstantin Belousov	e99255c8a6	amd64: do not touch low memory in acpi_wakeup_ap() if booted by UEFI Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31916	2021-09-13 19:51:52 +03:00
Colin Percival	cd165c8bf0	x86/tsc.c: Add TSLOG to test_tsc On my benchmark system this takes ~ 14 ms; enough to be worth recording in the boot time profile.	2021-09-09 17:02:15 -07:00
Andrew Turner	b792434150	Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preparation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830	2021-08-30 12:50:53 +01:00
Dimitry Andric	8e3c56d6b6	xen: Fix warning by adding KERNBASE to modlist_paddr before casting Clang 13 produces the following warning for hammer_time_xen(): sys/x86/xen/pv.c:183:19: error: the pointer incremented by -2147483648 refers past the last possible element for an array in 64-bit address space containing 256-bit (32-byte) elements (max possible 576460752303423488 elements) [-Werror,-Warray-bounds] (vm_paddr_t)start_info->modlist_paddr + KERNBASE; ^ ~~~~~~~~ sys/xen/interface/arch-x86/hvm/start_info.h:131:5: note: array 'modlist_paddr' declared here uint64_t modlist_paddr; /* Physical address of an array of */ ^ This is because the expression first casts start_info->modlist_paddr to struct hvm_modlist_entry * (via vm_paddr_t), and then adds KERNBASE, which is then interpreted as KERNBASE * sizeof(struct hvm_modlist_entry). Instead, parenthesize the addition to get the intended result, and cast it to struct hvm_modlist_entry * afterwards. Also remove the cast to vm_paddr_t since it is not necessary. Reviewed by: royger MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31711	2021-08-29 19:43:00 +02:00
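The pitfall behind this warning is that adding to a typed pointer scales the offset by the element size, so a byte offset has to be added before the cast. A minimal sketch, assuming a hypothetical 32-byte `struct entry` and helper `entry_at_byte_offset` (neither is taken from the Xen code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte element, standing in for the real structure. */
struct entry {
	uint64_t words[4];
};

/*
 * Add the byte offset in the integer domain, then cast the sum.
 * Writing (struct entry *)base + byte_off instead would advance by
 * byte_off * sizeof(struct entry) bytes, which is the mistake the
 * compiler diagnosed.
 */
static struct entry *
entry_at_byte_offset(void *base, size_t byte_off)
{
	return ((struct entry *)((uintptr_t)base + byte_off));
}
```

Given an array `tbl` of such elements, `entry_at_byte_offset(tbl, sizeof(struct entry))` refers to `&tbl[1]`, whereas the cast-first form would point 32 elements in.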
Adam Fenn	d4b2d3035a	pvclock: Add vDSO support Add vDSO support for timekeeping devices that support the KVM/XEN paravirtual clock API. Also, expose, in the userspace-accessible '<machine/pvclock.h>', definitions that will be needed by 'libc' to support 'VDSO_TH_ALGO_X86_PVCLK'. Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31418	2021-08-14 15:57:54 +03:00
Adam Fenn	6c69c6bb4c	kvm_clock: KVM paravirtual clock support Add support for the KVM paravirtual clock device. Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29733	2021-08-14 15:57:54 +03:00
Adam Fenn	0b3382b863	pvclock: Add 'struct pvclock' API Consolidate more hypervisor-agnostic functionality behind a new 'struct pvclock' API. This should also make it easier to subsequently add hypervisor-agnostic vDSO timekeeping support. Also, perform some clean-up: - Remove 'pvclock_get_last_cycles()'; do not allow external access to 'pvclock_last_systime' since this is not necessary. - Consolidate/simplify wall and system time reading codepaths. - Ensure correct ordering within wall and system time reading codepaths via 'atomic(9)' and 'rdtsc_ordered()' rather than via 'rmb()'. - Remove some extra newlines. Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31418	2021-08-14 15:57:54 +03:00
Adam Fenn	652ae7b114	x86: cpufunc: Add rdtsc_ordered() Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc' appropriate for the invoking x86 variant. Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed by 'rdtsc_ordered()' for general use. Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31416	2021-08-14 15:57:53 +03:00
NagaChaitanya Vellanki	2a9b4076dc	Merge common parts of i386 and amd64's ieeefp.h into x86/x86_ieeefp.h MFC after: 1 week Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D26292	2021-08-12 18:45:22 +08:00
Dmitry Chagin	de8374df28	fork: Allow ABI to specify fork return values for child. At least Linux x86 ABI's does not use carry bit and expects that the dx register is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork(). Add a short comment about touching dx in x86_set_fork_retval(), for more details see phab comments from kib@ and imp@. Reviewed by: kib Differential revision: https://reviews.freebsd.org/D31472 MFC after: 2 weeks	2021-08-12 11:45:25 +03:00
Mark Johnston	693c9516fa	busdma: Add KMSAN integration Sanitizer instrumentation of course cannot automatically update shadow state when devices write to host memory. KMSAN thus hooks into busdma, both to update shadow state after a device write, and to verify that the kernel does not publish uninitialized bytes to devices. To implement this, when KMSAN is configured, each dmamap embeds a memory descriptor describing the region currently loaded into the map. bus_dmamap_sync() uses the operation flags to determine whether to validate the loaded region or to mark it as initialized in the shadow map. Note that in cases where the amount of data written is less than the buffer size, the entire buffer is marked initialized even when it is not. For example, if a NIC writes a 128B packet into a 2KB buffer, the entire buffer will be marked initialized, but subsequent accesses past the first 128 bytes are likely caused by bugs. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31338	2021-08-10 21:27:54 -04:00
Mark Johnston	3a1802fef4	busdma: Add an internal BUS_DMA_FORCE_MAP flag to x86 bounce_busdma Use this flag to indicate that busdma should allocate a map structure even if no bouncing is required to satisfy the tag's constraints. This will be used for KMSAN. Also fix a memory leak that can occur if the kernel fails to allocate bounce pages in bounce_bus_dmamap_create(). Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31338	2021-08-10 21:27:54 -04:00
Mark Johnston	b0f71f1bc5	amd64: Add MD bits for KMSAN Interrupt and exception handlers must call kmsan_intr_enter() prior to calling any C code. This is because the KMSAN runtime maintains some TLS in order to track initialization state of function parameters and return values across function calls. Then, to ensure that this state is kept consistent in the face of asynchronous kernel-mode exceptions, the runtime uses a stack of TLS blocks, and kmsan_intr_enter() and kmsan_intr_leave() push and pop that stack, respectively. Use these functions in amd64 interrupt and exception handlers. Note that handlers for user->kernel transitions need not be annotated. Also ensure that trap frames pushed by the CPU and by handlers are marked as initialized before they are used. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31467	2021-08-10 21:27:53 -04:00
Ed Maste	9feff969a0	Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights These ones were unambiguous cases where the Foundation was the only listed copyright holder (in the associated license block). Sponsored by: The FreeBSD Foundation	2021-08-08 10:42:24 -04:00
Konstantin Belousov	d0bc4b4666	x86_msr_op: extend the KPI to allow MSR read and single-CPU operations Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31386	2021-08-05 18:46:37 +03:00
Mark Johnston	b2ed7e988a	bus: Convert to the new interceptor scheme This was missed in commit `a90d053b84`. Fixes: `a90d053b84` MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-07-30 15:15:27 -04:00
Mark Johnston	a90d053b84	Simplify kernel sanitizer interceptors KASAN and KCSAN implement interceptors for various primitive operations that are not instrumented by the compiler. KMSAN requires them as well. Rather than adding new cases for each sanitizer which requires interceptors, implement the following protocol: - When interceptor definitions are required, define SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX. - In headers that declare functions which need to be intercepted by a sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide declarations. - When SAN_RUNTIME is defined, do not redefine the names of intercepted functions. This is typically the case in files which implement sanitizer runtimes but is also needed in, for example, files which define ifunc selectors for intercepted operations. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-07-29 21:13:32 -04:00
Konstantin Belousov	b27fe1c3ba	amd64: stop doing special allocation for the AP startup trampoline There is no longer any reason to allocate the trampoline page very early in the boot process. The only requirement for the page is that it is below 1M to be usable by the real mode during init. This can be handled by vm_alloc_contig() when we do the startup. Also assert that the startup trampoline fits into a single page. In principle we can do multi-page allocation if needed, but it is not. Move the alloc_ap_trampoline() function and the boot_address variable to i386/mp_machdep.c. Keep existing mechanism of early alloc on i386. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D31343	2021-07-30 01:20:45 +03:00
Alexander Motin	5a49f19141	Do not expose to scheduler caches of single CPU. Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels of topology (root, package/L3 and core/L2). But with SMT disabled core/L2 matches thread, so the additional topology level only causes more traversal work. With this change the SMT case is reported the same as before, while non-SMT is reported with only 2 much simpler levels. MFC after: 2 weeks	2021-07-28 16:38:01 -04:00
Julien Grall	ac959cf544	xen: introduce xen_has_percpu_evtchn() xen_vector_callback_enabled is x86 specific and availability of per-cpu event channel delivery differs on other architectures. Introduce a new helper to check if there's support for per-cpu event channel injection. Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D29402	2021-07-28 17:27:05 +02:00
Julien Grall	0b4f30c236	xen/control: introduce xen_pv_shutdown_handler() While x86 only registers the PV shutdown handler for PV guests, ARM guests always use HVM and require the PV shutdown handler. Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D29406	2021-07-28 17:27:04 +02:00
Julien Grall	69c6eee756	xen: introduce xen_pv_disks_disabled() ARM guest is considered as HVM in FreeBSD but it only supports PV disks (no emulation available). Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D29403	2021-07-28 17:27:04 +02:00
Julien Grall	5f70008327	xen/netfront: introduce xen_pv_nics_disabled() ARM guest is considered as HVM but it only supports PV nics (no emulation available). Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D29405	2021-07-28 17:27:04 +02:00
Elliott Mitchell	c89f1f12b0	xen/xen-os: move inclusion of machine/xen-os.h later Several of x86 enable/disable functions depend upon the xendomain() functions. As such the xendomain() functions need to be declared before machine/xen-os.h. Officially declare direct inclusion of machine/xen/xen-os.h verboten as such will break these functions/macros. Remove one such soon to be broken inclusion. Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D29811	2021-07-28 17:27:04 +02:00
Elliott Mitchell	9976c5a540	xen/intr: use __func__ instead of function names Functions tend to get renamed and unless the developer is careful often debugging messages are missed. As such using __func__ is far superior. Replace several instances of hard-coded function names. Reviewed by: royger Differential revision: https://reviews.freebsd.org/D29499	2021-07-28 17:27:03 +02:00
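As a small illustration of why `__func__` survives renames, the sketch below formats a message prefixed with the name of whichever function expands the macro. `LOG_FAIL` and `bind_port` are hypothetical names chosen for the example, not identifiers from the Xen interrupt code.

```c
#include <stdio.h>

/*
 * Hypothetical logging macro: __func__ is evaluated inside the function
 * that expands the macro, so the prefix follows the function's name
 * automatically if it is ever renamed.
 */
#define LOG_FAIL(buf, len, msg) \
	snprintf((buf), (len), "%s: %s", __func__, (msg))

/* Formats "bind_port: failed" into buf and returns the message length. */
static int
bind_port(char *buf, size_t len)
{
	return (LOG_FAIL(buf, len, "failed"));
}
```

Renaming `bind_port()` changes the emitted prefix with no edit to the message string, which is the property the commit relies on.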
Elliott Mitchell	5ca00e0c98	xen/intr: use struct xenisrc * as xen_intr_handle_t Since xen_intr_handle_t is meant to be an opaque handle and the only use is retrieving the associated struct xenisrc *, directly use it as the opaque handler. Also add a wrapper function for converting the other direction. If some other value becomes appropriate in the future, these two functions will be the only spots needing modification. Reviewed by: mhorne, royger Differential Revision: https://reviews.freebsd.org/D29500	2021-07-28 17:27:03 +02:00
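The pattern described here, an opaque handle that is really a pointer to the private structure plus one conversion wrapper per direction, might look roughly like this in isolation. `struct isrc`, `intr_handle_t`, and both helpers are hypothetical stand-ins, not the actual Xen definitions.

```c
/* Hypothetical private structure, standing in for struct xenisrc. */
struct isrc {
	int port;
};

/* Consumers only ever see this opaque type. */
typedef void *intr_handle_t;

/* Handle -> structure: the single place that knows the representation. */
static struct isrc *
isrc_from_handle(intr_handle_t handle)
{
	return ((struct isrc *)handle);
}

/* Structure -> handle: the wrapper for the other direction. */
static intr_handle_t
handle_from_isrc(struct isrc *isrc)
{
	return ((intr_handle_t)isrc);
}
```

If the handle later needs to become something other than a raw pointer (an index, say), only these two functions would have to change.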
Elliott Mitchell	b6ff9345a4	xen: create VM_MEMATTR_XEN for Xen memory mappings The requirements for pages shared with Xen/other VMs may vary from architecture to architecture. As such create a macro which various architectures can use. Remove a use of PAT_WRITE_BACK in xenstore.c. This is a x86-ism which shouldn't have been present in a common area. Original idea: Julien Grall <julien@xen.org>, 2014-01-14 06:44:08 Approach suggested by: royger Reviewed by: royger, mhorne Differential Revision: https://reviews.freebsd.org/D29351	2021-07-28 17:27:02 +02:00
Julien Grall	a48f7ba444	xen: move x86/xen/xenpv.c to dev/xen/bus/xenpv.c Minor changes are necessary to make this processor-independent, but moving the file out of x86 and into common is the first step (so others don't add /more/ x86-isms). Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D29042	2021-07-28 17:27:02 +02:00
Konstantin Belousov	d6717f8778	amd64: rework AP startup Stop using temporary page table with 1:1 mapping of low 1G populated over the whole VA. Use 1:1 mapping of low 4G temporarily installed in the normal kernel page table. The features are: - now there is one less step for startup asm to perform - the startup code still needs to be at lower 1G because CPU starts in real mode. But everything else can be located anywhere in low 4G because it is accessed by non-paged 32bit protected mode. Note that kernel page table root page is at low 4G, as well as the kernel itself. - the page table pages can be allocated by normal allocator, there is no need to carve them from the phys_avail segments at very early time. The allocation of the page for startup code still requires some magic. Pages are freed after APs are ignited. - la57 startup for APs is less tricky, we directly load the final page table and do not need to tweak the paging mode. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31121	2021-07-27 20:11:15 +03:00
Mark Johnston	f95e683fa2	Annotate amd64 stack unwinders with __nomemorysanitize Sponsored by: The FreeBSD Foundation	2021-07-23 10:47:13 -04:00
Dmitry Chagin	1ca6b15bbd	Drop "All rights reserved" from my copyright statements. Add email and fixup years while here. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D30912 MFC after: 2 weeks	2021-07-20 10:05:50 +03:00
Dmitry Chagin	9931033bbf	linux(4): Almost complete the vDSO. The vDSO (virtual dynamic shared object) is a small shared library that the kernel maps R/O into the address space of all Linux processes on image activation. The vDSO is a fully formed ELF image, shared by all processes with the same ABI, has no process private data. The primary purpose of the vDSO: - non-executable stack, signal trampolines not copied to the stack; - signal trampolines unwind, mandatory for the NPTL; - to avoid context-switch overhead frequently used system calls can be implemented in the vDSO: for now gettimeofday, clock_gettime. The first two have been implemented, so add the implementation of system calls. System calls implementation based on a native timekeeping code with some limitations: - ifunc can't be used, as vDSO r/o mapped to the process VA and rtld can't relocate symbols; - reading HPET memory is not implemented for now (TODO). In case of any error vDSO system calls fall back to the kernel system calls. For unimplemented vDSO system calls added prototypes which call corresponding kernel system call. Tested by: trasz (arm64) Differential revision: https://reviews.freebsd.org/D30900 MFC after: 2 weeks	2021-07-20 10:01:18 +03:00
Mark Johnston	36226163fa	x86: Mark the trapframe as initialized in ipi_bitmap_handler() Otherwise KASAN may generate false positives if the trapframe was written into a poisoned region of the stack. Reported by: pho Reported by: syzbot+ee60455cd58e6eed20c9@syzkaller.appspotmail.com Reported by: syzbot+be5f9df26426ace3a00c@syzkaller.appspotmail.com Sponsored by: The FreeBSD Foundation	2021-07-09 20:38:50 -04:00
Alex Richardson	9bb8a4091c	Reduce code duplication in machine/_types.h Many of these typedefs are the same across all architectures or can be set based on an architecture-independent compiler-provided macro (e.g. __SIZEOF_SIZE_T__). These macros have been available since GCC 4.6 and Clang sometime before 3.0 (godbolt.org does not have any older clang versions installed). I originally considered using the compiler-provided `__FOO_TYPE__` directly. However, in order to do so we have to check that those match the previous typedef exactly (not just that they have the same size) since any change would be an ABI break. For example, changing `long` to `long long` results in different C++ name mangling. Additionally, Clang and GCC disagree on the underlying type for some of (u)int*_fast_t types, so this change only moves the definitions that are identical across all architectures and does not touch those types. This deduplication will allow us to have a smaller diff downstream in CheriBSD: we only have to change the (u)intptr_t definition in sys/_types.h in CheriBSD instead of having to change machine/_types.h for all CHERI-enabled architectures (currently RISC-V, AArch64 and MIPS). Reviewed By: imp, kib Differential Revision: https://reviews.freebsd.org/D29895	2021-06-14 16:30:16 +01:00
Konstantin Belousov	37f780d3e0	Disable x2APIC for SandyBridge laptops with Samsung BIOS From the PR: Almost always, my Samsung RF511 laptop could not boot with x2APIC enabled in the kernel. It froze during SMP initialization, shortly after "ACPI APIC Table: <SECCSD LH43STAR>" was printed to the console. When the kernel is instructed not to use x2APIC, the system boots correctly. PR: 256389 Submitted by: David Sebek <dasebek@gmail.com> Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30624	2021-06-03 22:47:31 +03:00
Konstantin Belousov	e9e00cc0c9	madt_setup_local: extract special case checks into a helper Reviewed by: markj Tested by: David Sebek <dasebek@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30624	2021-06-03 22:47:31 +03:00
Konstantin Belousov	92adf00d05	madt_setup_local: convert series of strcmp to iteration over the array to prepare for one more addition Reviewed by: markj Tested by: David Sebek <dasebek@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30624	2021-06-03 22:47:31 +03:00
Konstantin Belousov	a603d41aca	madt_setup_local: skip further checks if ACPI DMAR table already disabled x2APIC Reviewed by: markj Tested by: David Sebek <dasebek@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30624	2021-06-03 22:47:31 +03:00
Mark Johnston	cbe59a6475	i386: Make setidt_disp a size_t instead of uintptr_t setidt_disp is the offset of the ISR trampoline relative to the address of the routines in exception.s, so uintptr_t is not quite right. Also remove a bogus declaration I added in commit `18f55c67f7`, it is not required after all. Reported by: jrtc27 Reviewed by: jrtc27, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30590	2021-06-01 19:37:50 -04:00
Mark Johnston	18f55c67f7	x86: Fix lapic_ipi_alloc() on i386 The loop which checks to see if "dynamic" IDT entries are allocated needs to compare with the trampoline address of the reserved ISR. Otherwise it will never succeed. Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Tested by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30576	2021-05-31 18:51:14 -04:00
Edward Tomasz Napierala	83043a741d	linux: deduplicate DUMMY() entries No functional changes. Reviewed By: emaste Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30524	2021-05-29 17:51:36 +00:00
Roger Pau Monné	9e14ac116e	x86/xen: further PVHv1 removal cleanup The AP startup extern variable declarations are no longer needed, since PVHv2 uses the native AP startup path using the lapic. Remove the declaration and make the variables static to mp_machdep.c Sponsored by: Citrix Systems R&D	2021-05-18 10:43:31 +02:00
Mark Johnston	4224dbf4c7	xen: Remove leftover bits missed in commit `ac3ede5371` Fixes: `ac3ede5371` ("x86/xen: remove PVHv1 code") Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D30316	2021-05-17 13:06:44 -04:00
Roger Pau Monné	ac3ede5371	x86/xen: remove PVHv1 code PVHv1 was officially removed from Xen in 4.9, so just axe the related code from FreeBSD. Note FreeBSD supports PVHv2, which is the replacement for PVHv1. Sponsored by: Citrix Systems R&D Reviewed by: kib, Elliott Mitchell Differential Revision: https://reviews.freebsd.org/D30228	2021-05-17 11:41:21 +02:00
Mitchell Horne	2117a66af5	xen: remove hypervisor_info This was a source of indirection needed to support PVHv1. Now that that support has been removed, we can eliminate it. Reviewed by: royger	2021-05-17 10:56:52 +02:00
Mitchell Horne	c93e6ea344	xen: remove support for PVHv1 bootpath PVHv1 is a legacy interface supported only by Xen versions 4.4 through 4.9. Reviewed by: royger	2021-05-17 10:56:52 +02:00
Mark Johnston	831850d8b0	stack(9): Disable KASAN in stack_capture() When unwinding the stack, we may encounter a stack frame in a poisoned region of the stack, triggering a false positive. Reviewed by: andrew, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30126	2021-05-07 14:31:08 -04:00
Eric van Gyzen	2f32a971b7	Wait longer for a previous IPI to be sent When sending an IPI, if a previous IPI is still pending delivery, native_lapic_ipi_vectored() waits for the previous IPI to be sent. We've seen a few inexplicable panics with the current timeout of 50 ms. Increase the timeout to 1 second and make it tunable. No hardware specification mentions a timeout in this case; I checked the Intel SDM, Intel MP spec, and Intel x2APIC spec. Linux and illumos wait forever. In Linux, see __default_send_IPI_shortcut() in arch/x86/kernel/apic/ipi.c. In illumos, see apic_send_ipi() in usr/src/uts/i86pc/io/pcplusmp/apic_common.c. However, misbehaving hardware could hang the system if we wait forever. Reviewed by: mav kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D29942	2021-04-30 13:32:29 -05:00
Mark Johnston	f115c06121	amd64: Add MD bits for KASAN - Initialize KASAN before executing SYSINITs. - Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN. - Increase the kernel stack size if KASAN is enabled. Some of the ASAN instrumentation increases stack usage and it's enough to trigger stack overflows in ZFS. - Mark the trapframe as valid in interrupt handlers if it is assigned to td_intr_frame. Otherwise, an interrupt in a function which creates a poisoned alloca region can trigger false positives. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29455	2021-04-13 17:42:20 -04:00
Piotr Pawel Stefaniak	a212f56d10	Balance parentheses in sysctl descriptions	2021-04-11 10:30:55 +02:00
Konstantin Belousov	a8b75a57c9	x86: add x86_clear_dbregs() helper Move the code from exec_setregs() to reset debug registers state on exec, to the x86_clear_dbregs() helper Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29687	2021-04-10 04:25:01 +03:00
Mark Johnston	0f07c234ca	Remove more remnants of sio(4) Reviewed by: imp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29626	2021-04-07 14:33:02 -04:00
Mitchell Horne	4beb385813	gdb: allow setting/removing hardware watchpoints Handle the 'z' and 'Z' remote packets for manipulating hardware watchpoints. This could be expanded quite easily to support hardware or software breakpoints as well. https://sourceware.org/gdb/onlinedocs/gdb/Packets.html Reviewed by: cem, markj MFC after: 3 weeks Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. NetApp PR: 51 Differential Revision: https://reviews.freebsd.org/D29173	2021-03-30 11:36:41 -03:00
Mitchell Horne	15dc1d4452	x86: implement kdb watchpoint functions Add wrappers around the dbreg interface that can be consumed by MI kernel debugger code. The dbreg functions themselves are updated to return error codes, not just -1. dbreg_set_watchpoint() is extended to accept access bits as an argument. Reviewed by: jhb, kib, markj MFC after: 3 weeks Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D29155	2021-03-29 12:05:43 -03:00

1375 Commits