bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Reviewed by: wblock (manpage)
Differential Revision: https://reviews.freebsd.org/D5519
Summary:
The Initial Local APIC ID is returned by CPUID function 1 (in EBX).
On AMD Family 10h systems the way that ID is built is controlled by
an MSR bit (InitApicIdCpuIdLo). BKDG instructs BIOS to set it in a
certain way, but a BIOS can be buggy. In that case the ID can confuse
tools that use it, e.g. hwloc.
For example, on a system that I own real Local APIC IDs are configured
as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0.
See: https://github.com/open-mpi/hwloc/issues/183
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6060
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
(And 4Kn minidump support, but only for amd64.)
Make sure all I/O to the dump device is of the native sector size. To
that end, we keep a native sector sized buffer associated with dump
devices (di->blockbuf) and use it to pad smaller objects as needed (e.g.
kerneldumpheader).
Add dump_write_pad() as a convenience API to dump smaller objects with
zero padding. (Rather than pull in NPM leftpad, we wrote our own.)
Savecore(1) has been updated to deal with these dumps. The format for
512-byte sector dumps should remain backwards compatible.
Minidumps for other architectures are left as an exercise for the
reader.
PR: 194279
Submitted by: ambrisko@
Reviewed by: cem (earlier version), rpokala
Tested by: rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5848
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910
doreti provides the common code path for returning from interrupt
andlers on x86. Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: kib
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors. We re-enable the extension, so that we can properly discover
core and cache topology. Linux seems to do the same.
Reported by: Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by: jhb, kib
Tested by: Johannes Dieterich <dieterich.joh@gmail.com>
(earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D5883
DTrace-related exceptions in userland code are handled elsewhere.
One practical problem was a crash in dtrace_invop_start() when saved
%rsp pointed to a virtual address that was not backed.
i386 code already ignored userland exceptions.
Reviewed by: markj, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D5906
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:
- CloudABI executables want to write to the %fs, which typically
requires extra system calls by the emulator every time it needs to
switch between CloudABI's and its own TLS.
- If CloudABI executables overwrite the %fs base unconditionally, it
also becomes harder for the emulator to store a backup of the old
value of %fs. To solve this, let's no longer overwrite %fs, but just
%fs:0.
As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.
Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5836
kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported
The goal here is to help 3rd party applications (including ports) to determine
if the host do support linux emulation
Reviewed by: dchagin
MFC after: 1 week
Relnotes: yes
Differential Revision: D5830
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
knows at which address it got loaded and can apply relocations.
Simplify and unify placeholder type definitions.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5771
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code). This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5710
during argument validity verification, unbound zero'ing of the process LDT
and adjacent memory can be initiated from usermode.
Submitted by: CORE Security
Patch by: kib
Security: SA-16:15
non-multiple of 64 bytes. Thereafter, the user state save area is
misaligned, which triggers assertion in the debugging kernels, or
segmentation violation on accesses for non-debugging configs.
Force the desired alignment of the user save area as the fix
(workaround is to disable bit 9 in the hw.xsave_mask loader tunable).
This correction is required for booting on the upcoming Intel' Purley
platform.
Reported and tested by: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>,
jimharris
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
- Advertise the word size for CloudABI ABIs via the SV_LP64 flag. All of
the other ABIs include either SV_ILP32 or SV_LP64.
- Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero
but SV_LP64 isn't set. Instead, only assume a 32-bit ABI if SV_ILP32 is
set and fallback to the unknown value of "00" if neither SV_LP64 nor
SV_ILP32 is set.
Reviewed by: kib, ed
Differential Revision: https://reviews.freebsd.org/D5560
need to include it explicitly when <vm/vm_param.h> is already included.
Suggested by: alc
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D5379
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD. NetBSD and OpenBSD both changed their
fields to void * back in 1998. No new build failures were reported
via an exp-run.
PR: 206503 (exp-run)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5092
AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.
Submitted by: imp, dchagin
Security: FreeBSD-SA-16:10.linux
Security: CVE-2016-1883
registers. This brings amd64 on par with i386, providing consistent
initial FPU state.
Note that we do not clear any extended state, at least because kernel
does not understand extended state structure and consequences of zero
overwrite after fninit()/fpusave().
Submitted by: joss.upton@yahoo.com
PR: 206370
MFC after: 2 weeks
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.
In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.
So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.
Submitted by: mjg
Security: SA-16:03.linux
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.
Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly. Most startup code wasn't
doing so. Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.
Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.
The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values. Now if the size is zero, the provided buffer
full of existing env data is installed. A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.
Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp. A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data. Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D4670