Thread might be preempted after testing, which causes the flag to be
cleared. If ast was not delivered, we will do sysret with potentially
wrong fs/gs bases.
Reviewed by: jhb, jkim
MFC after: 1 week (together with r220430, r220452)
return. The ast() function may cause a context switch in which case
PCB_FULL_IRET would be set in the pcb. However, the code was not
rechecking the flag after ast() returned and would not properly restore
the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after
ast() returns.
While here, trim an instruction (and memory access) from the doreti path
and fix a typo in a comment.
MFC after: 1 week
safer for i386 because it can be easily over 4 GHz now. More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
path via the sysretq instruction to return from the system call. This was
removed in 190620 and not quite fully restored in 195486. This resolves
most of the performance regression in system call microbenchmarks between
7 and 8 on amd64.
Reviewed by: kib
MFC after: 1 week
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
on amd64.
The changes are hidden under COMPAT_43.
MFC after: 1 month
I have not properly thought through the commit. After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred. And that is not implemented yet.
Pointyhat to: avg
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.
Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
and per-loginclass resource accounting information, to be used by the new
resource limits code. It's connected to the build, but the code that
actually calls the new functions will come later.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)
instead of 0x100000. As a side effect, an amd64 kernel now loads at
physical address 0x200000 instead of 0x100000. This is probably for the
best because it avoids the use of a 2MB page mapping for the first 1MB of
the kernel that also spans the fixed MTRRs. However, getmemsize() still
thinks that the kernel loads at 0x100000, and so the physical memory between
0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired
constant in getmemsize() by a symbol "kernphys" that is defined by the
linker script.
In collaboration with: kib
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.
Submitted by: dchagin, nox
MFC after: 2 weeks
And drop dummy definitions for those system calls.
This may transiently break the build.
PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.
MFC after: 1 Week
CPUs. These CPUs need explicit MSR configuration to expose ceratin CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net. Clean up
and simplify VIA PadLock detection while I am in the neighborhood.
configurations and make it opt-in for those who want it. LINT will
still build it.
While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases. In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.
Discussed with: qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR: kern/148018, kern/155604, kern/144917, kern/146792
MFC after: 2 weeks
This is a minor cosmetic change - the users are more likely to want to
increase (rather than decrease) default kernel stack size,
which is already 4 pages on amd64.
MFC after: 4 days
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.
Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.
While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.
Discussed with: kib
MFC after: 2 Week
on processors that support 1 GB pages. Specifically, if the end of physical
memory is not aligned to a 1 GB page boundary, then map the residual
physical memory with multiple 2 MB page mappings rather than a single 1 GB
page mapping. When a 1 GB page mapping is used for this residual memory,
access to the memory is slower than when multiple 2 MB page mappings are
used. (I suspect that the reason for this slowdown is that the TLB is
actually being loaded with 4 KB page mappings for the residual memory.)
X-MFC after: r214425
White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here. Sysarch is a morass similar to ioctl, in many senses.
Submitted by: anderson
Discussed with: benl, kris, pjd
Sponsored by: Google, Inc.
Obtained from: Capsicum Project
MFC after: 3 months
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.
Noted and discussed with: bde
MFC after: 2 weeks
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().
Change several checks for a magic number of iterations to use
should_yield() instead.
MFC after: 1 week
be used by linuxolator itself.
Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.
MFC after: 1 Month.
members, thus making a signed extension of 32 bit register
context. If the register is not touched in usermode between
return from signal and next syscall entry, the sign-extension
part of 64bit register is not cleared, causing
linux32_fetch_syscall_args() to read wrong values.
Use unsigned type for the registers in the linux sigcontext.
Reported by: Jacob Frelinger <jacob.frelinger duke edu>, arundel
In collaboration with: dchagin
MFC after: 1 week
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
when descriptors are provided. Specifically, allow the 'start == 0'
and 'num == 0' special case used to free all LDT entries that previously
failed with EINVAL.
Submitted by: clang via rdivacky (some of 1)
Reviewed by: kib
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.
MFC after: 1 month
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared. This causes the local APIC interrupt routine
to fail to find an interrupt to service. Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC. If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.
Tested by: steve
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.
Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.
Reviewed by: alc
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]
Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.
Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.
Suggested by: bde [1]
Approved by: kib (mentor)
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.
Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.
While here, correct some comments.
Reviewed by: bde
Approved by: kib (mentor)
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.
Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months
the original amd64 and i386 headers with stubs.
Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.
Reviewed by: imp (previous version), jhb
Approved by: kib (mentor)
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case. Now we support cpu_get_nominal_mhz()
for the case, instead. Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.
its similar disabling of adaptive mutexes and rwlocks. The existing
comment on why this is the case also applies to sx locks.
MFC after: 3 days
Discussed with: attilio
mark user FPU context initialized, if current context is user context.
It was reversed in r215865, by inadequate change of this code fragment
to a call to fpuuserinited()/npxuserinited().
The issue is only relevant for in-kernel users of FPU.
Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net>
Tested by: Mike Tancsa
MFC after: 3 days
to support PV drivers (such as xenpci), and non-adptive locking (along
with a comment about why).
This change eliminates the synchronisation problem between GENERIC and
XENHVM, which had become severely rotted in HEAD, and in 8-STABLE
included non-production kernel debugging features such as WITNESS.
However, it comes at the cost of enabling devices and options that may
not be present under Xen (such as random ethernet cards). For now, opt
for a simpler kernel configuration file rather than using nooptions/
nodevice to enumerate and eliminate them. This leads to a somewhat
larger XENHVM kernel.
This is an MFC candidate for 8-STABLE before 8.2, in order to provide
a production-worthy XENHVM kernel configuration for amd64.
Discussed with: gibbs, cperciva
Reported by: Piete Brooks <Piete.Brooks at cl.cam.ac.uk>
Sponsored by: DARPA, AFRL
MFC after: 3 days
while on i386 we have MAX_BPAGES=512. Implement this difference via
'#ifdef __i386__'.
With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.
no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen
support, but if/when that support is ever added we'll want this, and
until then it's harmless.
In exec_linux_setregs(), use locally cached pointer to pcb to set
pcb_full_iret.
In set_regs(), note that full return is needed when code that sets
segment registers is enabled.
MFC after: 1 week
execve(2). Note that ia32 binaries already handle this properly,
since ia32_setregs() resets td_retval[1], but not exec_setregs().
We still do not conform to the amd64 ABI specification, since %rsp
on the image startup is not aligned to 16 bytes.
PR: amd64/124134
Discussed with: Petr Salinger <Petr.Salinger seznam cz>
(who convinced me that there is indeed several bugs)
MFC after: 1 week