Commit Graph

5998 Commits

Author SHA1 Message Date
kib
dae91cc140 Disable local interrupts before testing the PCB_FULL_IRET flag.
Thread might be preempted after testing, which causes the flag to be
cleared. If ast was not delivered, we will do sysret with potentially
wrong fs/gs bases.

Reviewed by:	jhb, jkim
MFC after:	1 week (together with r220430, r220452)
2011-04-08 21:26:50 +00:00
rstone
cd5ddda20d Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by:	emaste (mentor)
MFC after:	1 week
2011-04-08 14:39:41 +00:00
jhb
9a34a392bb Fix a bug in the previous change to restore the fast path for syscall
return.  The ast() function may cause a context switch in which case
PCB_FULL_IRET would be set in the pcb.  However, the code was not
rechecking the flag after ast() returned and would not properly restore
the FSBASE and GSBASE MSRs.  To fix, recheck the PCB_FULL_IRET flag after
ast() returns.

While here, trim an instruction (and memory access) from the doreti path
and fix a typo in a comment.

MFC after:	1 week
2011-04-08 13:33:57 +00:00
jhb
6ff99aec8c Catch up to PCB_FULL_IRET becoming a pcb flag rather than a full field.
MFC after:	3 days
2011-04-08 13:30:48 +00:00
jkim
95c723445e Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
jhb
c30cef5071 pcb_flags is an int, so use testl rather than testq.
Pointy hat to:	jhb
Submitted by:	jkim
MFC after:	1 week
2011-04-07 23:13:22 +00:00
jhb
ab2321a07e If a system call does not request a full interrupt return, use a fast
path via the sysretq instruction to return from the system call.  This was
removed in 190620 and not quite fully restored in 195486.  This resolves
most of the performance regression in system call microbenchmarks between
7 and 8 on amd64.

Reviewed by:	kib
MFC after:	1 week
2011-04-07 21:32:25 +00:00
jkim
4808c79fbe Remove stale checks for RDTSC support. amd64 must have TSC support anyway. 2011-04-07 21:29:34 +00:00
kib
7c2eaa21fe Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
  corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
  on amd64.

The changes are hidden under COMPAT_43.

MFC after:   1 month
2011-04-01 11:16:29 +00:00
avg
94ec7d2988 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
adrian
6f4c1d61a6 Break out the ath PCI logic into a separate device/module.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
2011-03-31 08:07:13 +00:00
trasz
02c98d7ccf Revert part of r220137, committed by mistake - RACCT is _not_ supposed
to be enabled in GENERIC.
2011-03-29 18:16:49 +00:00
trasz
b8d3e8755d Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code.  It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-29 17:47:25 +00:00
alc
b3ffae779b The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000
instead of 0x100000.  As a side effect, an amd64 kernel now loads at
physical address 0x200000 instead of 0x100000.  This is probably for the
best because it avoids the use of a 2MB page mapping for the first 1MB of
the kernel that also spans the fixed MTRRs.  However, getmemsize() still
thinks that the kernel loads at 0x100000, and so the physical memory between
0x100000 and 0x200000 is lost.  Fix this problem by replacing the hard-wired
constant in getmemsize() by a symbol "kernphys" that is defined by the
linker script.

In collaboration with:	kib
2011-03-28 06:35:17 +00:00
alc
79f5bd9d16 Amd64 doesn't have a lazypmap ipi. 2011-03-27 16:18:51 +00:00
avg
df7a39b1d0 linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
avg
ae4ae2c803 linux compat: add non-dummy capget and capset system calls, regenerate
And drop dummy definitions for those system calls.
This may transiently break the build.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:59:24 +00:00
avg
b49c51915d linux compat: add non-dummy capget and capset system calls
PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:51:56 +00:00
dchagin
7a5ef72838 Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.

MFC after:	1 Week
2011-03-26 09:25:35 +00:00
alc
ec0c37c355 Move an external declaration to the appropriate header file. 2011-03-26 06:21:05 +00:00
jkim
c5c94c9d77 Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs.  These CPUs need explicit MSR configuration to expose ceratin CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software.  Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net.  Clean up
and simplify VIA PadLock detection while I am in the neighborhood.
2011-03-26 02:02:07 +00:00
jeff
2d7d8c05e7 - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
bz
c41eae2d13 For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it.  LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases.  In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with:	qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR:		kern/148018, kern/155604, kern/144917, kern/146792
MFC after:	2 weeks
2011-03-19 15:50:34 +00:00
jkim
ad8ef5e4c7 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
davidch
4cf0ebe1b2 - Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE.
(BCM57710, BCM57711, BCM57711E)

MFC after:	One month
2011-03-14 22:42:41 +00:00
dchagin
15d1cdd161 Enable shared page use for amd64/linux32 and i386/linux binaries.
Move signal trampoline code from the top of the stack to the shared page.

MFC after:	2 Weeks
2011-03-13 14:58:02 +00:00
avg
5a2a285ac9 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
Regenerate system call and systrace support files.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (earlier version)
MFC after:	3 weeks
2011-03-12 08:58:19 +00:00
avg
666906fcd7 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
This commits makes necessary changes in syscall/sysent generation
infrastructure.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (ealier version)
MFC after:	3 weeks
2011-03-12 08:51:43 +00:00
avg
46a12ed040 amd64/NOTES: use a greater number in KSTACK_PAGES example
This is a minor cosmetic change - the users are more likely to want to
increase (rather than decrease) default kernel stack size,
which is already 4 pages on amd64.

MFC after:	4 days
2011-03-11 19:21:42 +00:00
mdf
f225a1fc1c Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.
2011-03-11 18:56:55 +00:00
jkim
7df55dcdeb Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker.  Note tsc_present does not change by this tunable.
2011-03-11 00:44:32 +00:00
mdf
675238071f Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name.  Also consistenly use strlcpy().

Suggested by:	Warner Losh
2011-03-10 22:56:00 +00:00
jkim
98d68ca741 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
julian
144fd87db2 Add a small change to the comment in the GENRIC config files that include udbp
Submitted by:	Chris Forgron, cforgeron at acsi dot ca
MFC after:	1 week
2011-03-09 17:15:11 +00:00
dchagin
69b8756d3d Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
dchagin
52fbc3b17d Remove dead code.
MFC after:	1 Week
2011-03-07 08:12:07 +00:00
alc
819143a80a Make a change to the implementation of the direct map to improve performance
on processors that support 1 GB pages.  Specifically, if the end of physical
memory is not aligned to a 1 GB page boundary, then map the residual
physical memory with multiple 2 MB page mappings rather than a single 1 GB
page mapping.  When a 1 GB page mapping is used for this residual memory,
access to the memory is slower than when multiple 2 MB page mappings are
used.  (I suspect that the reason for this slowdown is that the TLB is
actually being loaded with 4 KB page mappings for the residual memory.)

X-MFC after:	r214425
2011-03-02 00:24:07 +00:00
rwatson
4c48fabb06 Continue to introduce Capsicum capability mode:
White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here.  Sysarch is a morass similar to ioctl, in many senses.

Submitted by:	anderson
Discussed with:	benl, kris, pjd
Sponsored by:	Google, Inc.
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:35:48 +00:00
brucec
6d9b42b486 Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
alc
2f4da8e71e Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
dchagin
f09038c073 To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.
2011-02-16 17:50:21 +00:00
dchagin
be13e396c9 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
dchagin
c29d5657b5 Sort include files in the alphabetical order. 2011-02-13 19:07:48 +00:00
dchagin
9f708ad0aa Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
dchagin
a999d3553b In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().
2011-02-12 16:33:00 +00:00
dchagin
8b4a007006 In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.
2011-02-12 15:50:21 +00:00
dchagin
8abe7e237a Regen for r218610. 2011-02-12 15:36:25 +00:00
dchagin
6803575cba The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one. 2011-02-12 15:33:25 +00:00
kib
7bb770f505 Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with:	bde
MFC after:	2 weeks
2011-02-05 15:10:27 +00:00
mdf
b291e9a365 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00