Commit Graph

5992 Commits

Author SHA1 Message Date
Jung-uk Kim
2b052e43be Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features.
Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel
AVX instructions.  The SSE5 bit was recycled as XOP extended instruction
bit, CVT16 was deprecated in favor of F16C (half-precision float conversion
instructions for AVX), and the remaining FMA4 (4-operand FMA instructions)
gained a separate CPUID bit.  Replace non-existent references with today's
CPUID specifications.
2011-05-17 22:36:16 +00:00
Attilio Rao
b2aa562e7b MFC 2011-05-13 20:58:48 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Attilio Rao
ef607a6aa3 MFC 2011-05-12 14:01:40 +00:00
Dmitry Chagin
98cde5eede Remove wrong comment.
MFC after:	1 week.
2011-05-11 17:57:15 +00:00
Jung-uk Kim
00c885e181 Add SC_PIXEL_MODE to GENERIC for amd64 and i386.
Requested by:	many
2011-05-10 16:44:16 +00:00
Attilio Rao
bd55ede060 MFC 2011-05-09 18:53:13 +00:00
Jung-uk Kim
65e7d70b09 Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs.  For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting.  For the latter case, we
may set it if it passes the test to notify the user that it may be usable.
2011-05-09 17:34:00 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
Andriy Gapon
fdf30d59a6 prepare code that does topology detection for amd cpus for bulldozer
This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by:	Gary Jennejohn <gljennjohn@googlemail.com>
		(with pre-10h hardware)
MFC after:	2 weeks
2011-05-06 13:51:54 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Attilio Rao
8c0ef2464e Revert md_assert_preempt() introduction.
Discussed with:	jeff, jhb
2011-05-04 20:29:40 +00:00
Attilio Rao
94ebcddde3 MFC 2011-05-03 18:57:46 +00:00
John Baldwin
6162795be0 Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can
be disabled via 'nooptions NEW_PCIB'.
2011-05-03 18:23:11 +00:00
John Baldwin
83c41143ca Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests.  Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices.  The
suballocations are managed by creating an rman for each I/O window.  The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus.  Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus.  If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free.  From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed.  The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows).  If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window.  This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.  This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now.  Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR:		kern/143874 kern/149306
Tested by:	mav
2011-05-03 17:37:24 +00:00
Attilio Rao
7be8a2de4f MFC @ r221324 2011-05-02 14:23:36 +00:00
John Baldwin
d2c9344ff9 Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
2011-05-02 14:13:12 +00:00
Bernhard Schmidt
d1f25d5dcb Add the remaining wireless drivers.
Discussed with:	joel
2011-05-01 13:26:34 +00:00
Attilio Rao
f1edea81ac Add the function md_assert_nopreempt(), which is a very consistent
function on the possibility of a thread to not preempt.

As this function is very tied to x86 (interrupts disabled checkings)
it is not intended to be used in MI code.
2011-04-30 23:12:37 +00:00
Kevin Lo
5aaea65247 Add urtw(4) 2011-04-29 06:36:39 +00:00
Jung-uk Kim
c34e9dbee1 Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation.  Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after:	3 days
2011-04-28 22:23:39 +00:00
Attilio Rao
2be767e069 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
Rick Macklem
4309e17add This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
2011-04-27 17:51:51 +00:00
Alexander Motin
0d307e0905 - Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
 - To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
 - Add some more details to UPDATING.
2011-04-26 17:01:49 +00:00
Maxim Sobolev
f30bc1f3b5 With the typical memory size of the system in tenth of gigabytes
counting memory being dumped in 16MB increments is somewhat silly.
Especially if the dump fails and everything you've got for debugging
is screen filled with numbers in 16 decrements... Replace that with
percentage-based progress with max 10 updates all fitting into one
line.

Collapse other very "useful" piece of crash information (total ram) into
the same line to save some more space.

MFC after:	1 week
2011-04-26 16:14:55 +00:00
Rick Macklem
7c208ed659 Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by:	jhb
MFC after:	2 weeks
2011-04-25 22:22:51 +00:00
Alexander Motin
97b53e3634 Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
2011-04-24 08:58:58 +00:00
Konstantin Belousov
3136faa59d Make pmap_invalidate_cache_range() available for consumption on amd64.
Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2011-04-18 21:24:42 +00:00
Jung-uk Kim
0e72764232 Add a function rdtsc32() to read lower 32 bits from TSC and discard upper
32 bits.  Some times compiler inserts unnecessary instructions to preserve
unused upper 32 bits even when it is casted to a 32-bit value.  It reduces
such compiler mistakes where every cycle counts.
2011-04-14 16:53:32 +00:00
Jung-uk Kim
4854ae249c Consistently use __volatile as the rest of this file. 2011-04-14 16:19:41 +00:00
Jung-uk Kim
f5ac47f44c Prefer C99 standard integers to reduce diff from i386 version. 2011-04-14 16:14:35 +00:00
Jung-uk Kim
a7817c7ae5 Reduce errors in effective frequency calculation. 2011-04-12 23:49:07 +00:00
Jung-uk Kim
b9e4376214 Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and
MPERF MSRs are available.  It was disabled in r216443.  Remove the earlier
hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little
bit more reliable now.
2011-04-12 23:04:01 +00:00
Jung-uk Kim
dd3e254ebd Add forgotten declarations for tsc_perf_stat from the previous commit. 2011-04-12 22:22:01 +00:00
Jung-uk Kim
155094d77a Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
2011-04-12 22:15:46 +00:00
Jung-uk Kim
3731174954 Add definitions for CPUID instruction 6, ECX information. 2011-04-12 22:12:23 +00:00
Konstantin Belousov
2140d9c83b Remove setting of PCB_FULL_IRET at the places where we are going to call
update_gdt_{f,g}sbase. The functions set the flag when td == curthread,
and sysarch is always called with curthread.

Reviewed by:	jhb, jkim
MFC after:	1 week
2011-04-08 21:27:31 +00:00
Konstantin Belousov
5ab73cbcba Disable local interrupts before testing the PCB_FULL_IRET flag.
Thread might be preempted after testing, which causes the flag to be
cleared. If ast was not delivered, we will do sysret with potentially
wrong fs/gs bases.

Reviewed by:	jhb, jkim
MFC after:	1 week (together with r220430, r220452)
2011-04-08 21:26:50 +00:00
Ryan Stone
7d6a0bf373 Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by:	emaste (mentor)
MFC after:	1 week
2011-04-08 14:39:41 +00:00
John Baldwin
13fb631aff Fix a bug in the previous change to restore the fast path for syscall
return.  The ast() function may cause a context switch in which case
PCB_FULL_IRET would be set in the pcb.  However, the code was not
rechecking the flag after ast() returned and would not properly restore
the FSBASE and GSBASE MSRs.  To fix, recheck the PCB_FULL_IRET flag after
ast() returns.

While here, trim an instruction (and memory access) from the doreti path
and fix a typo in a comment.

MFC after:	1 week
2011-04-08 13:33:57 +00:00
John Baldwin
ff265077cf Catch up to PCB_FULL_IRET becoming a pcb flag rather than a full field.
MFC after:	3 days
2011-04-08 13:30:48 +00:00
Jung-uk Kim
3453537fa5 Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
John Baldwin
615d2dffa3 pcb_flags is an int, so use testl rather than testq.
Pointy hat to:	jhb
Submitted by:	jkim
MFC after:	1 week
2011-04-07 23:13:22 +00:00
John Baldwin
1438b4ced1 If a system call does not request a full interrupt return, use a fast
path via the sysretq instruction to return from the system call.  This was
removed in 190620 and not quite fully restored in 195486.  This resolves
most of the performance regression in system call microbenchmarks between
7 and 8 on amd64.

Reviewed by:	kib
MFC after:	1 week
2011-04-07 21:32:25 +00:00
Jung-uk Kim
efd393d539 Remove stale checks for RDTSC support. amd64 must have TSC support anyway. 2011-04-07 21:29:34 +00:00
Konstantin Belousov
7332c129e0 Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
  corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
  on amd64.

The changes are hidden under COMPAT_43.

MFC after:   1 month
2011-04-01 11:16:29 +00:00
Andriy Gapon
a930718af1 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
Adrian Chadd
dba9c85977 Break out the ath PCI logic into a separate device/module.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
2011-03-31 08:07:13 +00:00
Edward Tomasz Napierala
72bcfc9693 Revert part of r220137, committed by mistake - RACCT is _not_ supposed
to be enabled in GENERIC.
2011-03-29 18:16:49 +00:00
Edward Tomasz Napierala
097055e26d Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code.  It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-29 17:47:25 +00:00
Alan Cox
1c675a3bc3 The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000
instead of 0x100000.  As a side effect, an amd64 kernel now loads at
physical address 0x200000 instead of 0x100000.  This is probably for the
best because it avoids the use of a 2MB page mapping for the first 1MB of
the kernel that also spans the fixed MTRRs.  However, getmemsize() still
thinks that the kernel loads at 0x100000, and so the physical memory between
0x100000 and 0x200000 is lost.  Fix this problem by replacing the hard-wired
constant in getmemsize() by a symbol "kernphys" that is defined by the
linker script.

In collaboration with:	kib
2011-03-28 06:35:17 +00:00
Alan Cox
63740078e0 Amd64 doesn't have a lazypmap ipi. 2011-03-27 16:18:51 +00:00
Andriy Gapon
01a9e1a11b linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
Andriy Gapon
931f0826ea linux compat: add non-dummy capget and capset system calls, regenerate
And drop dummy definitions for those system calls.
This may transiently break the build.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:59:24 +00:00
Andriy Gapon
1f4ec5a3ba linux compat: add non-dummy capget and capset system calls
PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:51:56 +00:00
Dmitry Chagin
acface683e Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.

MFC after:	1 Week
2011-03-26 09:25:35 +00:00
Alan Cox
1587dfd730 Move an external declaration to the appropriate header file. 2011-03-26 06:21:05 +00:00
Jung-uk Kim
cd45fec044 Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs.  These CPUs need explicit MSR configuration to expose ceratin CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software.  Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net.  Clean up
and simplify VIA PadLock detection while I am in the neighborhood.
2011-03-26 02:02:07 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb
d2b74735b8 For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it.  LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases.  In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with:	qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR:		kern/148018, kern/155604, kern/144917, kern/146792
MFC after:	2 weeks
2011-03-19 15:50:34 +00:00
Jung-uk Kim
38b8542ca9 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
David Christensen
dd46ab31de - Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE.
(BCM57710, BCM57711, BCM57711E)

MFC after:	One month
2011-03-14 22:42:41 +00:00
Dmitry Chagin
8f1e49a638 Enable shared page use for amd64/linux32 and i386/linux binaries.
Move signal trampoline code from the top of the stack to the shared page.

MFC after:	2 Weeks
2011-03-13 14:58:02 +00:00
Andriy Gapon
d549ef5638 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
Regenerate system call and systrace support files.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (earlier version)
MFC after:	3 weeks
2011-03-12 08:58:19 +00:00
Andriy Gapon
56ede1074e add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
This commits makes necessary changes in syscall/sysent generation
infrastructure.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (ealier version)
MFC after:	3 weeks
2011-03-12 08:51:43 +00:00
Andriy Gapon
136882cf92 amd64/NOTES: use a greater number in KSTACK_PAGES example
This is a minor cosmetic change - the users are more likely to want to
increase (rather than decrease) default kernel stack size,
which is already 4 pages on amd64.

MFC after:	4 days
2011-03-11 19:21:42 +00:00
Matthew D Fleming
c77715ef6c Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.
2011-03-11 18:56:55 +00:00
Jung-uk Kim
79422085d4 Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker.  Note tsc_present does not change by this tunable.
2011-03-11 00:44:32 +00:00
Matthew D Fleming
cd67ac41ae Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name.  Also consistenly use strlcpy().

Suggested by:	Warner Losh
2011-03-10 22:56:00 +00:00
Jung-uk Kim
bc34c87e81 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
Julian Elischer
a8066a9d3b Add a small change to the comment in the GENRIC config files that include udbp
Submitted by:	Chris Forgron, cforgeron at acsi dot ca
MFC after:	1 week
2011-03-09 17:15:11 +00:00
Dmitry Chagin
e5d81ef1b5 Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
Dmitry Chagin
372f5e052f Remove dead code.
MFC after:	1 Week
2011-03-07 08:12:07 +00:00
Alan Cox
cb25117d54 Make a change to the implementation of the direct map to improve performance
on processors that support 1 GB pages.  Specifically, if the end of physical
memory is not aligned to a 1 GB page boundary, then map the residual
physical memory with multiple 2 MB page mappings rather than a single 1 GB
page mapping.  When a 1 GB page mapping is used for this residual memory,
access to the memory is slower than when multiple 2 MB page mappings are
used.  (I suspect that the reason for this slowdown is that the TLB is
actually being loaded with 4 KB page mappings for the residual memory.)

X-MFC after:	r214425
2011-03-02 00:24:07 +00:00
Robert Watson
74b5505e5d Continue to introduce Capsicum capability mode:
White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here.  Sysarch is a morass similar to ioctl, in many senses.

Submitted by:	anderson
Discussed with:	benl, kris, pjd
Sponsored by:	Google, Inc.
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:35:48 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Dmitry Chagin
dc4f0a9e11 To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.
2011-02-16 17:50:21 +00:00
Dmitry Chagin
09d6cb0a23 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
Dmitry Chagin
fde6316272 Sort include files in the alphabetical order. 2011-02-13 19:07:48 +00:00
Dmitry Chagin
222198ab0b Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
Dmitry Chagin
c8d6845e9e In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().
2011-02-12 16:33:00 +00:00
Dmitry Chagin
2c7660ba3e In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.
2011-02-12 15:50:21 +00:00
Dmitry Chagin
9bd9b52478 Regen for r218610. 2011-02-12 15:36:25 +00:00
Dmitry Chagin
f91ea2518b The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one. 2011-02-12 15:33:25 +00:00
Konstantin Belousov
6f9ec5aab0 Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with:	bde
MFC after:	2 weeks
2011-02-05 15:10:27 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Dmitry Chagin
77192fddeb Regen for r218101.
MFC after:	1 Month.
2011-01-30 20:38:26 +00:00
Dmitry Chagin
8d73c2bfd1 Change linux futex syscall definition to match actual linux one.
MFC after:	1 Month.
2011-01-30 20:31:43 +00:00
Dmitry Chagin
9adaae9403 The kern_wait() code already removes the SIGCHLD signal for the waited
process. Removing other SIGCHLD signals is not needed and may cause
problems.

Pointed out by:	jilles

MFC after:	1 Month.
2011-01-30 18:17:38 +00:00
Dmitry Chagin
572fb2e33e My style(9) bug.
Pointed out by:	kib

MFC after:	1 Month.
2011-01-29 07:22:33 +00:00
Dmitry Chagin
adc7ece00a Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after:	1 Month.
2011-01-28 18:47:07 +00:00
Dmitry Chagin
53c74fc607 To avoid excessive code duplication move struct rusage translation
to a separate function.

MFC after:	1 Month.
2011-01-28 18:28:06 +00:00
Konstantin Belousov
77185f473b linux_sigreturn() loads the struct trapframe from l_sigcontext
members, thus making a signed extension of 32 bit register
context. If the register is not touched in usermode between
return from signal and next syscall entry, the sign-extension
part of 64bit register is not cleared, causing
linux32_fetch_syscall_args() to read wrong values.

Use unsigned type for the registers in the linux sigcontext.

Reported by:	Jacob Frelinger <jacob.frelinger duke edu>, arundel
In collaboration with:	dchagin
MFC after:	1 week
2011-01-27 21:45:38 +00:00
Dmitry Chagin
a5c1afadeb Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
Matthew D Fleming
f89f7ada8d Set td_kstack_pages for thread0. This was already being done for most
architectures, but i386 and amd64 were missing it.

Submitted by:	Mohd Fahadullah <mfahadullah AT isilon DOT com>
2011-01-26 17:06:13 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Konstantin Belousov
de64ee1a30 Use CTLFLAG_RDTUN for read-only sysctl that exports tunable.
Reminded by:	pjd
MFC after:	6 days
2011-01-19 21:35:48 +00:00
Konstantin Belousov
9e52b8b629 Make the length of the LDT a loader tunable, machdep.max_ldt_segment,
and export it with read-only sysctl. Remove unused defines.

Reviewed by:	jhb (previous version)
MFC after:	1 week
2011-01-18 23:00:22 +00:00
Konstantin Belousov
a05c98a099 Use malloc(9) instead of kmem_alloc(9) for temporal copy of the
user-supplied descriptor array.

Noted and reviewed by:	jhb (previous version)
MFC after:	1 week
2011-01-18 22:56:10 +00:00
John Baldwin
6bd823f334 - Remove some always-true checks (checking for unsigned < 0).
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
  when descriptors are provided.  Specifically, allow the 'start == 0'
  and 'num == 0' special case used to free all LDT entries that previously
  failed with EINVAL.

Submitted by:	clang via rdivacky (some of 1)
Reviewed by:	kib
2011-01-18 16:43:01 +00:00
Jung-uk Kim
2fea643112 Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init().  Consistently define mem_range_softc from
mem.c for all platforms.  Add missing #include guards for machine/memdev.h
and sys/memrange.h.  Clean up some nearby style(9) nits.

MFC after:	1 month
2011-01-17 22:58:28 +00:00
Jung-uk Kim
df74996c3d Avoid preemption while manipulating CRs and MTRRs.
Tested by:	ariff
2011-01-17 17:30:35 +00:00
Jung-uk Kim
bdbf2db5b2 Remove redundant, bogus, and even harmful uses of setting TS bit in CR0.
It is done from fpstate_drop() when it is really necessary.

Reviewed by:	kib
MFC after:	1 week
2011-01-14 21:09:01 +00:00
Matthew D Fleming
240577c2a7 Fix up a few more sysctl(9) mis-typing found in various LINT builds. 2011-01-13 18:20:27 +00:00
John Baldwin
072e9838e2 If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared.  This causes the local APIC interrupt routine
to fail to find an interrupt to service.  Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC.  If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by:	steve
2011-01-13 17:00:22 +00:00
Matthew D Fleming
fbbb13f962 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
Tijl Coosemans
d22e78d6b9 Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by:	kib (mentor)
2011-01-08 18:09:48 +00:00
Konstantin Belousov
6297a3d843 Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries.  Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by:	 alc
2011-01-08 16:13:44 +00:00
Tijl Coosemans
a56e818f29 On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by:	bde [1]
Approved by:	kib (mentor)
2011-01-08 12:43:05 +00:00
Tijl Coosemans
9858863cd4 Fix types of some values in machine/_limits.h.
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by:	bde
Approved by:	kib (mentor)
2011-01-08 11:13:34 +00:00
Konstantin Belousov
39198f15ee Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.
2011-01-07 14:22:34 +00:00
Jung-uk Kim
50e3cec377 Increase size of pcb_flags to four bytes.
Requested by:	bde, jhb
2010-12-22 19:57:03 +00:00
Jung-uk Kim
e6c006d96a Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags.  These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP.  Add comments about the rationale[1].  Use these functions
wherever possible.  Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency.  Turn pcb_full_iret into a PCB flag as
it is safe now.  Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte.  Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by:	kib
Submitted by:	kib[1]
MFC after:	2 months
2010-12-22 00:18:42 +00:00
Tijl Coosemans
81bd5041a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
Konstantin Belousov
7222d2fbee Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-18 16:41:11 +00:00
Jung-uk Kim
e1c9d39ebe Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case.  Now we support cpu_get_nominal_mhz()
for the case, instead.  Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.
2010-12-14 20:07:51 +00:00
Robert Watson
9c9f06e60d Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching
its similar disabling of adaptive mutexes and rwlocks.  The existing
comment on why this is the case also applies to sx locks.

MFC after:	3 days
Discussed with:	attilio
2010-12-13 12:15:46 +00:00
Konstantin Belousov
60c7c84e85 In fpudna()/npxdna(), mark FPU context initialized and optionally
mark user FPU context initialized, if current context is user context.
It was reversed in r215865, by inadequate change of this code fragment
to a call to fpuuserinited()/npxuserinited().

The issue is only relevant for in-kernel users of FPU.

Reported by:	Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net>
Tested by:	Mike Tancsa
MFC after:	3 days
2010-12-12 16:16:39 +00:00
Robert Watson
996177338f Derive the XENHVM kernel from GENERIC, adding only the options required
to support PV drivers (such as xenpci), and non-adptive locking (along
with a comment about why).

This change eliminates the synchronisation problem between GENERIC and
XENHVM, which had become severely rotted in HEAD, and in 8-STABLE
included non-production kernel debugging features such as WITNESS.

However, it comes at the cost of enabling devices and options that may
not be present under Xen (such as random ethernet cards).  For now, opt
for a simpler kernel configuration file rather than using nooptions/
nodevice to enumerate and eliminate them.  This leads to a somewhat
larger XENHVM kernel.

This is an MFC candidate for 8-STABLE before 8.2, in order to provide
a production-worthy XENHVM kernel configuration for amd64.

Discussed with:	gibbs, cperciva
Reported by:	Piete Brooks <Piete.Brooks at cl.cam.ac.uk>
Sponsored by:	DARPA, AFRL
MFC after:	3 days
2010-12-10 22:22:01 +00:00
Colin Percival
91ff9dc058 Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.
2010-12-09 06:41:50 +00:00
Jung-uk Kim
71e0b05797 Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC.
Remove a confusing comment about converting to MHz as we never did.
2010-12-08 23:40:41 +00:00
Colin Percival
af60888734 On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192,
while on i386 we have MAX_BPAGES=512.  Implement this difference via
'#ifdef __i386__'.

With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.
2010-12-08 20:20:10 +00:00
Colin Percival
ec195da48a MFi386 r1.94: If XEN, make pmap_kextract = pmap_kextract_ma. This is a
no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen
support, but if/when that support is ever added we'll want this, and
until then it's harmless.
2010-12-08 19:52:04 +00:00
Colin Percival
81261a5a6d MFi386 r1.81, r1.82, r1.84: Reorganize code to reduce cache pressure and
branch mispredictions.

No objections from:	scottl
2010-12-08 19:42:21 +00:00
Jung-uk Kim
dd7d207dcb Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with:	avg
2010-12-08 00:09:24 +00:00
Jung-uk Kim
7214d5d75b Remove stale comments about P-state invariant TSC and fix style(9) nits. 2010-12-07 22:43:25 +00:00
Jung-uk Kim
1bcc28295b Do not register a event handler for CPU freqency changes when it is found
P-state invariant.  This is continuation of r216274.
2010-12-07 22:34:51 +00:00
Jung-uk Kim
4a9c4056dc Now the P-state invariant TSC is probed early enough, do not register event
handlers for CPU freqency changes when it is found P-state invariant.
Adjust a comment about non-existent tsc_freq_max() while I am here.
2010-12-07 22:23:26 +00:00
Jung-uk Kim
78a661bbaa Probe P-state invariant TSC from rightful place. 2010-12-07 22:12:02 +00:00
Konstantin Belousov
1b3c32568a Update some comments related to use of amd64 full context switch.
In exec_linux_setregs(), use locally cached pointer to pcb to set
pcb_full_iret.
In set_regs(), note that full return is needed when code that sets
segment registers is enabled.

MFC after:	1 week
2010-12-07 12:44:33 +00:00
Konstantin Belousov
0f0170e66a Retire write-only PCB_FULLCTX pcb flag on amd64.
Reminded by:	Petr Salinger <Petr.Salinger seznam cz>
Tested by:	pho
MFC after:	1 week
2010-12-07 12:17:43 +00:00
Konstantin Belousov
3e0ddb6781 Do not leak %rdx value in the previous image to the new image after
execve(2). Note that ia32 binaries already handle this properly,
since ia32_setregs() resets td_retval[1], but not exec_setregs().

We still do not conform to the amd64 ABI specification, since %rsp
on the image startup is not aligned to 16 bytes.

PR:	amd64/124134
Discussed with:	Petr Salinger <Petr.Salinger seznam cz>
	(who convinced me that there is indeed several bugs)
MFC after:	1 week
2010-12-06 15:15:27 +00:00
Jung-uk Kim
2f7ab7e85d Revert r216161. It is not necessary because we zero-fill BSS anyway.
Requested by:	jhb
2010-12-03 22:27:51 +00:00
Jung-uk Kim
b14fe63392 Explicitly initialize TSC frequency. To calibrate TSC frequency, we use
DELAY(9) and it may use TSC in turn if TSC frequency is non-zero.

MFC after:	3 days
2010-12-03 21:54:10 +00:00
Jung-uk Kim
e391a266ed Do not change CPU ticker frequency if TSC is P-state invariant. Note this
change was meant to be committed with r184102 (and its subsequent MFCs) but
it fell off somehow.

Pointyhat to:	jkim
MFC after:	3 days
2010-12-03 21:06:30 +00:00
Rebecca Cran
c90f7d9b44 Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.
2010-12-03 07:09:23 +00:00
Rebecca Cran
15b4888a24 Disallow passing in a count of zero bytes to the bus_space(9) functions.
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR:	kern/80980
Discussed with:	jhb
2010-12-02 22:19:30 +00:00
Konstantin Belousov
c6fb218c3c Calling fill_fpregs() for curthread is legitimate, and ELF coredump
does this.

Reported and tested by:	pho
MFC after:	5 days
2010-11-28 17:56:34 +00:00
Alan Cox
686b00d691 Make the size of the direct map easily configurable. Changing NDMPML4E
now suffices.

Increase the size of the direct map to 1TB.

An earler version of this patch was tested by sbruno@.
2010-11-26 19:36:26 +00:00
Konstantin Belousov
5c6eb03790 Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by:	bde
Tested by:    pho (i386 and amd64)
MFC after:    1 week
2010-11-26 14:50:42 +00:00
Tijl Coosemans
ce4ec51dbe Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by:	imp, jhb
Approved by:	kib (mentor)
2010-11-26 10:59:20 +00:00
Ulrich Spörlein
02604cd4f4 Remove kernel support for BB profiling, now that kernbb(8) is gone, too.
PR:		bin/83558
Reviewed by:	jkim
2010-11-26 08:11:43 +00:00
Dimitry Andric
1496505287 Apply the same fix as in r215823 to sys/amd64/amd64/fpu.c: use
unambiguous inline assembly to load a float variable.
2010-11-25 22:19:40 +00:00
Dimitry Andric
cfe92f33bc Change ambiguous (or invalid, depending on how strict you want to be :)
assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since
the intent was to move 16 bits of data, in this case.

Found by:	clang
Reviewed by:	kib
2010-11-24 18:35:11 +00:00
Jung-uk Kim
d2d0fda841 Remove a stale tunable introduced in r215703. 2010-11-23 17:28:23 +00:00
Jung-uk Kim
42ca4a29de Reinitialize PAT MSR via pmap_init_pat() while resuming. This function does
better job since r215703 and it is safer now.
2010-11-23 16:12:35 +00:00
Andriy Gapon
9b984feb3d specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX
CPUID.6 is defined as Thermal and Power Management Leaf by both Intel
and AMD.

Reviewed by:	jhb
MFC after:	7 days
2010-11-23 13:55:30 +00:00
Jung-uk Kim
7dd052c1d9 - Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual.  Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-).  Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC.  Program 5 as
WP (from default WT) and 6 as WC (from default UC-).  Leave 4 and 7 at the
default of WB and UC.  This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency.  Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks.  With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more.  Now this function is
identical on both amd64 and i386.

Reviewed by:	jhb
Tested by:	RM (reuf_m at hotmail dot com)
		Ryszard Czekaj (rychoo at freeshell dot net)
		army.of.root (army dot of dot root at googlemail dot com)
MFC after:	3 days
2010-11-22 19:52:44 +00:00
Andriy Gapon
b43d292565 specialreg.h: add definitions for MPERF/APERF pair of MSRs
These MSRs can be used to determine actual (average) performance as
compared to a maximum defined performance.
Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both
Intel and AMD processors.

MFC after:	5 days
2010-11-19 15:07:36 +00:00
Andriy Gapon
7af7c7624a specialreg.h: add AMD-specific "Hardware Configuration Register" MSR
It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after:	5 days
2010-11-19 15:00:20 +00:00
Andriy Gapon
8fd6d51347 specialreg.h: add definition for AMD Core Performance Boost bit
This bit indicates availability of the feature.

MFC after:	4 days
2010-11-19 14:46:17 +00:00
Jung-uk Kim
816b3bd1b0 Restore CR0 after MTRR initialization for correctness sakes. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases.  Remove another pointless optimization for CR4.PGE bit
while I am here.
2010-11-16 23:26:02 +00:00
Jung-uk Kim
50083a5624 Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invlidation of entire TLBs.  Since these are very rare events, this
micro-optimization seems useless.

Reviewed by:	jhb
2010-11-16 22:44:58 +00:00
Konstantin Belousov
7022f954c3 Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by:	    swell.k gmail com
MFC after:   1 week
2010-11-14 21:59:11 +00:00
Konstantin Belousov
94bce4535d Use symbolic names instead of hardcoding values for magic p_osrel constants.
MFC after:   1 week
2010-11-14 18:24:12 +00:00
Jung-uk Kim
19da400c64 Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
Andriy Gapon
290e14f881 amd64: introduce minidump version 2
After KVA space was increased to 512GB on amd64 it became impractical
to use PTEs as entries in the minidump map of dumped pages, because size
of that map alone would already be 1GB.
Instead, we now use PDEs as page map entries and employ two stage lookup
in libkvm: virtual address -> PDE -> PTE -> physical address.  PTEs are
now dumped as regular pages.  Fixed page map size now is 2MB.

libkvm keeps support for accessing amd64 minidumps of version 1.
Support for 1GB pages is added.

Many thanks to Alan Cox for his guidance, numerous reviews, suggestions,
enhancments and corrections.

Reviewed by:	alc [kernel part]
MFC after:	15 days
2010-11-11 18:35:28 +00:00
Jung-uk Kim
93a8847473 Make APM emulation look more closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.
2010-11-10 18:50:12 +00:00
Jung-uk Kim
7c2bf852d7 Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.
2010-11-10 01:29:56 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00
Attilio Rao
fcb250f392 Move the mptable.h under x86/include/.
Sponsored by:	Sandvine Incorporated
MFC after:	14 days
2010-11-09 20:28:09 +00:00
Jung-uk Kim
cedd86cafa Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home. 2010-11-09 00:27:18 +00:00
Jung-uk Kim
2473325fa8 Reduce diff between platforms and fix style(9) bugs. 2010-11-09 00:14:39 +00:00
John Baldwin
13e25cb7a5 Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is
identical on both platforms.
2010-11-08 20:57:02 +00:00
John Baldwin
f67b4bd367 A few small style and whitespace fixes. 2010-11-08 20:05:22 +00:00
Alan Cox
d9a799683c Don't call pmap_demote_DMAP() on MTRR entries from the BIOS that are marked
as "bogus".

Reported by:	Jia-Shiun Li
2010-11-07 21:48:49 +00:00
John Baldwin
0108cce0a4 Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger.  Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock.  However, trap interrupts for single-stepping
can still occur even when interrupts are disabled.  Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased.  Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero.  To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with:	bde
MFC after:     1 month
2010-11-05 13:42:58 +00:00
Andriy Gapon
3b50d59fef x86 topo_probe: do not probe smp topology if only one cpu is visible
This could lead to a division by zero if hardware is multi-core and/or
multi-threaded, but for some (quite unusual) reason FreeBSD sees only
one logical processor.  This could happen, for example, if neither MADT
nor MP Table are presented by BIOS.

Also:
- assert in topo_probe_0x4 that BSP is accounted for
- neither cpu_cores nor cpu_logical should be zero after successful
  probing, so either being zero is an indication of failed probing

Reported by:	vwe, Dan Allen <danallen46@airwired.net>
Tested by:	Dan Allen <danallen46@airwired.net>
MFC after:	3 days
2010-11-04 08:51:45 +00:00
John Baldwin
32c3d3b6e6 Move <machine/apicreg.h> to <x86/apicreg.h>. 2010-11-01 18:18:46 +00:00
John Baldwin
5ecdb3c46b Move the <machine/mca.h> header to <x86/mca.h>. 2010-11-01 17:40:35 +00:00
Alan Cox
2eeee67ce8 Add another safety belt to pmap_demote_DMAP(). 2010-10-30 23:49:37 +00:00
Alan Cox
59fb2d9b04 Don't demote in pmap_demote_DMAP() if the specified length is zero. 2010-10-30 17:21:32 +00:00
Attilio Rao
ba2a27351b Merge nexus.c from amd64 and i386 to x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-28 16:31:39 +00:00
John Baldwin
89d84a4055 Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine
when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[]
is only populated for multiple-CPU machines.  This also matches what the
code does when SMP is not enabled.

PR:		bin/151616
Tested by:	"Damian S. Kolodziejczyk"  damkol | gmail
Submitted by:	avg
MFC after:	1 week
2010-10-28 13:44:19 +00:00
Attilio Rao
a3da97926d Merge the mptable support from MD bits to x86 subtree.
Sponsored by:	Sandvine Incorporated
Discussed with:	jhb
2010-10-28 07:58:06 +00:00
Alan Cox
92ababa777 [1] According to the x86 architectural specifications, no virtual-to-
physical page mapping should span two or more MTRRs of different types.
Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can
ensure that the direct map region doesn't have such a mapping.

[2] Fix a couple of nearby style errors in amd64_mrset().

[3] Re-enable the use of 1GB page mappings for implementing the direct
map.  (See also r197580 and r213897.)

Tested by:	kib@ on a Westmere-family processor [3]
MFC after:	3 weeks
2010-10-27 16:46:37 +00:00
Attilio Rao
256439c972 Merge dump_machdep.c i386/amd64 under the x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-26 12:46:26 +00:00
John Baldwin
0689bdcc19 Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by:	bde
2010-10-25 15:31:13 +00:00
John Baldwin
c6390f7ac5 Use intr_disable() and intr_restore() instead of frobbing the flags register
directly to disable interrupts.

Reviewed by:	bde (earlier version)
MFC after:	2 weeks
2010-10-25 15:28:03 +00:00
Alan Cox
353b642ced Update pmap_extract() to handle 1GB page mappings. Some device drivers
use pmap_extract() rather than pmap_kextract() on direct map addresses.
Thus, pmap_extract() needs to be able to deal with 1GB page mappings if
we are to use 1GB page mappings for the direct map.  (See r197580.)
2010-10-15 15:23:34 +00:00
Jung-uk Kim
56b11f84a7 Remove trailing ", " from `sysctl machdep.idle_available' output. 2010-10-12 20:53:12 +00:00
Konstantin Belousov
78ae4338a2 Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with:	bz, rwatson
Approved by:	re (bz, kensmith)
MFC after:	2 weeks
2010-10-12 09:18:17 +00:00
Konstantin Belousov
b3b4bec7e6 Regen. 2010-10-08 07:19:05 +00:00
Konstantin Belousov
5d2a6a61b4 Fix typo.
Submitted by:	arundel
MFC after:	3 days
2010-10-08 07:18:44 +00:00
Konstantin Belousov
3f506a78ce Display PCID capability of CPU and add CPUID define for it.
MFC after:	1 week
2010-10-05 15:31:56 +00:00
Konstantin Belousov
2d5db3709b The makectx() function, used by kdb_trap() to reconstruct pcb from
trap frame when trap initiated kdb entry, incorrectly calculated the
value of %rsp for trapped thread.

According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode
Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than
only on a CPL change."
Even assuming the conditional push of the %ss:%rsp, the calculation
was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64.

Always use the tf_rsp from trap frame. The change supposedly fixes
stepping when using kgdb backend for kdb.

Submitted by:	Zhouyi Zhou <zhouzhouyi gmail com>
PR:	amd64/151167
Reviewed by:	avg
MFC after:	1 week
2010-10-03 13:52:17 +00:00
Andriy Gapon
d443a96ffb i386 and amd64 mp_machdep: improve topology detection for Intel CPUs
This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
  as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
  and core masks and match APIC IDs against those masks [started by
  jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
  by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
  prescribed by Intel [jkim]

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
  topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
  for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
  the same CPU topology, but Athlon cores do not share L2 cache while
  Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
  implemented for platforms in question
- think how to better described old HTT vs new HTT distinction, HTT vs
  SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
  use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
  scheduler know that it shouldn't schedule any threads on those CPUs

PR:			kern/145385 (related)
In collaboration with:	jkim
Tested by:		Sergey Kandaurov <pluknet@gmail.com>,
			Jeremy Chadwick <freebsd@jdc.parodius.com>,
			Chip Camden <sterling@camdensoftware.com>,
			Steve Wills <steve@mouf.net>,
			Olivier Smedts <olivier@gid0.org>,
			Florian Smeets <flo@smeets.im>
MFC after:		1 month
2010-10-01 10:32:54 +00:00
Neel Natu
5c1a8dc028 Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.
The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by:	NetApp
Submitted by:	Will McGovern (will at netapp dot com)
Reviewed by:	mjacob, jhb
2010-09-29 21:53:11 +00:00
David Xu
295fbd498e Now userland POSIX semaphore is based on umtx. The kernel module
is only used to support binary compatible, if want to run old
binary, you need to kldload the module.
2010-09-24 09:04:16 +00:00
Norikatsu Shigemura
cbf4dac64f Add support 'device tpm' for amd64.
Add tpm(4)'s default setting to /boot/defaults/loader.conf.
Add 'device tpm' to NOTES for amd64 and i386.

Discussed with:	takawata
Approved by:	imp (mentor)
2010-09-19 14:40:37 +00:00
Andriy Gapon
0b750af1b1 amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory
KVA space is abundant on amd64, so there is no reason to limit kernel
map size to a fraction of available physical memory.  In fact, it could
be larger than physical memory.

This should help with memory auto-tuning for ZFS and shouldn't affect
other workloads.
This should reduce number of circumstances for "kmem_map too small"
panics, but probably won't eliminate them entirely due to potential kmem
fragmentation.

In fact, you might want/need to limit maximum ARC size after this commit
if you need to resrve more memory for applications.

This change was discussed on arch@ and nobody said "don't do it".

MFC after:	6 weeks
2010-09-17 07:36:32 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Kenneth D. Merry
d3c7b9a08a MFp4 (//depot/projects/mps/...)
Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers.

This driver supports basic I/O, and works with SAS and SATA drives and
expanders.

Basic error recovery works (i.e. timeouts and aborts) as well.

Integrated RAID isn't supported yet, and there are some known bugs.

So this isn't ready for production use, but is certainly ready for
testing and additional development.  For the moment, new commits to this
driver should go into the FreeBSD Perforce repository first
(//depot/projects/mps/...) and then get merged into -current once
they've been vetted.

This has only been added to the amd64 GENERIC, since that is the only
architecture I have tested this driver with.

Submitted by:	scottl
Discussed with:	imp, gibbs, will
Sponsored by:	Yahoo, Spectra Logic Corporation
2010-09-10 15:03:56 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Roman Divacky
27d4fea6c5 Change the parameter passed to the inline assembly to u_short
as we are dealing with 16bit segment registers. Change mov
to movw.

Approved by:    rpaulo (mentor)
Reviewed by:    kib, rink
2010-09-03 14:25:17 +00:00
Jung-uk Kim
305c5c0acb Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use
these values in the function.
2010-08-30 21:19:42 +00:00
Rui Paulo
cba3269417 Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done on a sequent commit.

Sponsored by:	The FreeBSD Foundation
2010-08-28 08:03:29 +00:00
Rui Paulo
0bc1991a4a Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by:	The FreeBSD Foundation
2010-08-25 09:10:32 +00:00