Commit Graph

12117 Commits

Author SHA1 Message Date
Robert Watson
74b5505e5d Continue to introduce Capsicum capability mode:
White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here.  Sysarch is a morass similar to ioctl, in many senses.

Submitted by:	anderson
Discussed with:	benl, kris, pjd
Sponsored by:	Google, Inc.
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:35:48 +00:00
John Baldwin
4cef260f42 Fix whitespace nit. 2011-02-22 14:58:14 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Dmitry Chagin
dc4f0a9e11 To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.
2011-02-16 17:50:21 +00:00
Dmitry Chagin
09d6cb0a23 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
Dmitry Chagin
fde6316272 Sort include files in the alphabetical order. 2011-02-13 19:07:48 +00:00
Dmitry Chagin
222198ab0b Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
Dmitry Chagin
c8d6845e9e In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().
2011-02-12 16:33:00 +00:00
Dmitry Chagin
2c7660ba3e In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.
2011-02-12 15:50:21 +00:00
Dmitry Chagin
9bd9b52478 Regen for r218610. 2011-02-12 15:36:25 +00:00
Dmitry Chagin
f91ea2518b The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one. 2011-02-12 15:33:25 +00:00
Alan Cox
02e5228ca0 Setting VV_TEXT here is redundant. It is already set by do_execve().
Reviewed by:	kib
2011-02-09 18:45:33 +00:00
Konstantin Belousov
b17ef03604 Fix linking of the kernel without device npx.
MFC after:	2 weeks
2011-02-05 15:37:10 +00:00
Konstantin Belousov
6f9ec5aab0 Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with:	bde
MFC after:	2 weeks
2011-02-05 15:10:27 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Dmitry Chagin
77192fddeb Regen for r218101.
MFC after:	1 Month.
2011-01-30 20:38:26 +00:00
Dmitry Chagin
8d73c2bfd1 Change linux futex syscall definition to match actual linux one.
MFC after:	1 Month.
2011-01-30 20:31:43 +00:00
Dmitry Chagin
9adaae9403 The kern_wait() code already removes the SIGCHLD signal for the waited
process. Removing other SIGCHLD signals is not needed and may cause
problems.

Pointed out by:	jilles

MFC after:	1 Month.
2011-01-30 18:17:38 +00:00
Dmitry Chagin
adc7ece00a Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after:	1 Month.
2011-01-28 18:47:07 +00:00
Dmitry Chagin
a5c1afadeb Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
Matthew D Fleming
f89f7ada8d Set td_kstack_pages for thread0. This was already being done for most
architectures, but i386 and amd64 were missing it.

Submitted by:	Mohd Fahadullah <mfahadullah AT isilon DOT com>
2011-01-26 17:06:13 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Jung-uk Kim
fd240d6d9f Fix yet another fallout from r208833. VM86 BIOS call may cause page fault
when FPU is in use.

Reported by:	Marc UBM Bocklet (ubm dot freebsd at googlemail dot com)
Tested by:	b. f. (bf1783 at googlemail dot com)
MFC after:	3 days
2011-01-19 17:09:07 +00:00
Konstantin Belousov
55aabb7fd1 For architectures not using direct map , and requiring real KVA page for
sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about free buffer.

sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART despite the thread was selected for wakeup_one(). As
result, we loose a wakeup, and some other waiter will not be woken up.

Reported and tested by:	az
Reviewed by:	alc, jhb
MFC after:	1 week
2011-01-18 21:57:02 +00:00
John Baldwin
6bd823f334 - Remove some always-true checks (checking for unsigned < 0).
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
  when descriptors are provided.  Specifically, allow the 'start == 0'
  and 'num == 0' special case used to free all LDT entries that previously
  failed with EINVAL.

Submitted by:	clang via rdivacky (some of 1)
Reviewed by:	kib
2011-01-18 16:43:01 +00:00
Jung-uk Kim
2fea643112 Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init().  Consistently define mem_range_softc from
mem.c for all platforms.  Add missing #include guards for machine/memdev.h
and sys/memrange.h.  Clean up some nearby style(9) nits.

MFC after:	1 month
2011-01-17 22:58:28 +00:00
Jung-uk Kim
df74996c3d Avoid preemption while manipulating CRs and MTRRs.
Tested by:	ariff
2011-01-17 17:30:35 +00:00
John Baldwin
072e9838e2 If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared.  This causes the local APIC interrupt routine
to fail to find an interrupt to service.  Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC.  If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by:	steve
2011-01-13 17:00:22 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
Tijl Coosemans
d22e78d6b9 Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by:	kib (mentor)
2011-01-08 18:09:48 +00:00
Tijl Coosemans
a56e818f29 On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by:	bde [1]
Approved by:	kib (mentor)
2011-01-08 12:43:05 +00:00
Tijl Coosemans
d942996baf On 32 bit architectures define (u)int64_t as (unsigned) long long instead
of (unsigned) int __attribute__((__mode__(__DI__))). This aligns better
with macros such as (U)INT64_C, (U)INT64_MAX, etc. which assume (u)int64_t
has type (unsigned) long long.

The mode attribute was used because long long wasn't standardised until
C99. Nowadays compilers should support long long and use of the mode
attribute is discouraged according to GCC Internals documentation.

The type definition has to be marked with __extension__ to support
compilation with "-std=c89 -pedantic".

Discussed with:	bde
Approved by:	kib (mentor)
2011-01-08 11:47:55 +00:00
Tijl Coosemans
9858863cd4 Fix types of some values in machine/_limits.h.
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by:	bde
Approved by:	kib (mentor)
2011-01-08 11:13:34 +00:00
Tijl Coosemans
911127a0d6 Remove unused support for 64 bit long on 32 bit architectures.
It was used mainly to discover and fix some 64-bit portability problems
before 64-bit arches were widely available.

Discussed with:	bde
Approved by:	kib (mentor)
2011-01-07 22:57:31 +00:00
Konstantin Belousov
39198f15ee Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.
2011-01-07 14:22:34 +00:00
John Baldwin
c305730dc0 Remove bogus usage of INTR_FAST. "Fast" interrupts are now indicated by
registering a filter handler rather than a threaded handler.  Also remove
a bogus use of INTR_MPSAFE for a filter.
2011-01-06 21:08:06 +00:00
Colin Percival
b5e61aab00 Spell CRITICAL_ASSERT correctly.
Submitted by:	jhb
MFC with:	r216944
2011-01-04 16:29:07 +00:00
Colin Percival
aa829cef6a Add hamfisted locking to the Xen/PV pmap code: Only allow one thread to
be in {pmap_pinit, pmap_copy, pmap_release} at a time.

This reduces the rate of panics when running 'make index' from ~0.6/hour
to ~0.02/hour (p < 10^-30).

At a later date this locking will be removed, and for this reason, it is
wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in
place with the intention of shipping somewhat-stable Xen bits in FreeBSD
8.2-RELEASE.

PR:		kern/153672
MFC after:	3 days
2011-01-04 15:55:15 +00:00
Robert Watson
2913e88c91 Make "options XENHVM" compile for i386, not just amd64 -- a largely
mechanical change.  This opens the door for using PV device drivers
under Xen HVM on i386, as well as more general harmonisation of i386
and amd64 Xen support in FreeBSD.

Reviewed by:    cperciva
MFC after:      3 weeks
2011-01-04 14:49:54 +00:00
Colin Percival
d01b2dad73 Adjust the critical section protecting _xen_flush_queue to cover the
entire range where the page mapping request queue needs to be atomically
examined and modified.

Oddly, while this doesn't seem to affect the overall rate of panics
(running 'make index' on EC2 t1.micro instances, there are 0.6 +/- 0.1
panics per hour, both before and after this change), it eliminates
vm_fault from panic backtraces, leaving only backtraces going through
vmspace_fork.
2011-01-04 00:16:38 +00:00
Colin Percival
aaaf607148 Make i386_set_ldt work on i386/XEN, step 5/5.
When cleaning up a thread, reset its LDT to the default LDT.

Note: Casting the LDT pointer to an int and storing it in pc_currentldt is
wildly bogus, but is harmless since pc_currentldt is a write-only variable.

MFC after:	3 days
2010-12-31 17:42:25 +00:00
Colin Percival
698cc19d6b Make i386_set_ldt work on i386/XEN, step 4/5.
Use xen_update_descriptor to update the LDT rather than bcopy.  Under Xen,
pages used for holding LDTs must be read-only, so we can't make the change
ourselves.

Ths obvious alternative of "remap the page read-write, make the change, then
map it read-only again" doesn't work since Xen won't allow an LDT page to be
remapped as R/W.  An arguably better solution is used by NetBSD: They don't
modify LDTs in-place at all, but instead copy the entire LDT, modify the new
version, then atomically swap.

MFC after:	3 days
2010-12-31 17:41:14 +00:00
Colin Percival
de187b8df2 Make i386_set_ldt work on i386/XEN, step 3/5.
Synchronize reality with comment: The user_ldt_alloc function is supposed to
return with dt_lock held.  Due to broken locking in i386/xen/pmap.c, we drop
dt_lock during the call to pmap_map_readonly and then pick it up again; this
can be removed once the Xen pmap locking is fixed.

MFC after:	3 days
2010-12-31 17:40:30 +00:00
Colin Percival
90b7d33458 Make i386_set_ldt work on i386/XEN, step 2/5.
Don't map physical to machine page numbers in pte_load_store, since it uses
PT_SET_VA (which takes a physical page number and converts it to a machine
page number).

MFC after:	3 days
2010-12-31 17:39:58 +00:00
Colin Percival
d262f2dcfc Make i386_set_ldt work on i386/XEN, step 1/5.
Lock the vm page queue mutex around calls to pte_store.  As with many other
uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs
to be replaced at some future date by a spin lock dedicated to protecting
the queue of pending xen page mapping hypervisor calls.  (But for now, bogus
locking is better than a panic.)

MFC after:	3 days
2010-12-31 17:39:31 +00:00
Pyun YongHyeon
2608aefc0b Add driver for DM&P Vortex86 RDC R6040 Fast Ethernet.
The controller is commonly found on DM&P Vortex86 x86 SoC.  The
driver supports all hardware features except flow control.  The
flow control was intentionally disabled due to silicon bug.

DM&P Electronics, Inc. provided all necessary information including
sample board to write driver and answered many questions I had.
Many thanks for their support of FreeBSD.

H/W donated by:	DM&P Electronics, Inc.
2010-12-31 00:21:41 +00:00
Warner Losh
714cf6c0df Revert r216777, per jhb@ 2010-12-28 22:45:29 +00:00
Warner Losh
1977f3f168 Comment out npx and isa from NOTES file. We don't need them here
since DEFAULTS already pulls them in.
2010-12-28 21:22:08 +00:00
Warner Losh
78b92d19e0 Remove mem, io, isa and npx since they are duplicative of the entries
in DEFAULTS.  Saves 8 lines of warnings when we build XBOX.
2010-12-28 21:20:58 +00:00
Colin Percival
4a416f8375 Remove a "not strictly correct" (and panic-inducing) workaround for a bug
which doesn't seem to exist.

PR:		kern/141328
MFC after:	3 days
2010-12-28 14:36:32 +00:00
Colin Percival
76c9650713 Build the modules which can be built. The excluded modules fall into two
categories: Those which can't build with PAE because they attempt to cast
a pointer to a bus_addr_t (mostly scsi drivers); and those which can't be
built with XEN because they conflict with something in xen-os.h (e.g., in
cxgb there is a conflicting definition of test_and_clear_bit).

MFC after:	1 week
2010-12-27 23:59:27 +00:00
Colin Percival
8ea0b3bb2f Lock the vm page queue mutex in pmap_pte_release around the call
to PMAP_SET_VA; this fixes a mutex-not-held panic when a process
which called mlock(2) exits, and parallels a change made in
pmap_pte 10 months ago (svn r204160).

Note: The locking in this code is utterly broken.  We should not
be using the VM page queue mutex to protect the queue of pending
Xen page mapping hypervisor calls.  Even if it made sense to do
so, this commit and r204160 introduce LORs between the vm page
queue mutex and PMAP2mutex.

(However, a possible deadlock is better than a guaranteed panic,
and this change will hopefully make life easier for whoever fixes
the Xen pmap locking in the future.)

PR:		kern/140313
MFC after:	3 days
2010-12-26 13:05:43 +00:00
Tijl Coosemans
81bd5041a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
Alan Cox
9d555e459c Redo some parts of r216333, specifically, the locking changes to
pmap_extract_and_hold(), and undo the rest.  In particular, I forgot
that PG_PS and PG_PTE_PAT are the same bit.
2010-12-19 07:31:56 +00:00
Konstantin Belousov
7222d2fbee Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-18 16:41:11 +00:00
Konstantin Belousov
a9b31c256e In pmap_extract(), unlock pmap lock earlier. The calculation does not need
the lock when operating on local variables.

Reviewed by:	alc
2010-12-18 11:31:32 +00:00
Jung-uk Kim
e1c9d39ebe Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case.  Now we support cpu_get_nominal_mhz()
for the case, instead.  Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.
2010-12-14 20:07:51 +00:00
Konstantin Belousov
60c7c84e85 In fpudna()/npxdna(), mark FPU context initialized and optionally
mark user FPU context initialized, if current context is user context.
It was reversed in r215865, by inadequate change of this code fragment
to a call to fpuuserinited()/npxuserinited().

The issue is only relevant for in-kernel users of FPU.

Reported by:	Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net>
Tested by:	Mike Tancsa
MFC after:	3 days
2010-12-12 16:16:39 +00:00
Colin Percival
20d1a304b3 Reduce the Xen timecounter from 1GHz to 2^-9 GHz, thereby increasing the
timecounter period from 2^32 ns (~4.3s) to 2^41 ns (~36m39s).  Some time
sharing systems can skip clock interrupts for a few seconds when under
load (e.g., if we've recently used more than our fair share of CPU and
someone else wants a burst of CPU) and we were losing time in quanta of
2^32 ns due to timecounter wrapping.

Increasing the timecounter period up to 2^41 ns is definitely overkill,
but we still have microsecond timecounter precision, and anyone using
paravirtualized hardware when they need submicrosecond timing is crazy.
2010-12-11 22:33:33 +00:00
Colin Percival
0f30ed5bc6 Make the machdep.independent_wallclock sysctl do what it says on the box. 2010-12-11 20:12:42 +00:00
Alan Cox
d1cf854b5d When r207410 eliminated the acquisition and release of the page queues
lock from pmap_extract_and_hold(), it didn't take into account that
pmap_pte_quick() sometimes requires the page queues lock to be held.
This change reimplements pmap_extract_and_hold() such that it no
longer uses pmap_pte_quick(), and thus never requires the page queues
lock.

For consistency, adopt the same idiom as used by the new
implementation of pmap_extract_and_hold() in pmap_extract() and
pmap_mincore().  It also happens to make these functions shorter.

Fix a style error in pmap_pte().

Reviewed by:	kib@
2010-12-09 20:16:00 +00:00
Colin Percival
91ff9dc058 Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.
2010-12-09 06:41:50 +00:00
Jung-uk Kim
71e0b05797 Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC.
Remove a confusing comment about converting to MHz as we never did.
2010-12-08 23:40:41 +00:00
Colin Percival
af60888734 On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192,
while on i386 we have MAX_BPAGES=512.  Implement this difference via
'#ifdef __i386__'.

With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.
2010-12-08 20:20:10 +00:00
Jung-uk Kim
dd7d207dcb Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with:	avg
2010-12-08 00:09:24 +00:00
Jung-uk Kim
61d14101dd Use int for 'tsc_present' instead of u_int. It is just a boolean. 2010-12-07 23:19:49 +00:00
Jung-uk Kim
7214d5d75b Remove stale comments about P-state invariant TSC and fix style(9) nits. 2010-12-07 22:43:25 +00:00
Jung-uk Kim
1bcc28295b Do not register a event handler for CPU freqency changes when it is found
P-state invariant.  This is continuation of r216274.
2010-12-07 22:34:51 +00:00
Jung-uk Kim
4a9c4056dc Now the P-state invariant TSC is probed early enough, do not register event
handlers for CPU freqency changes when it is found P-state invariant.
Adjust a comment about non-existent tsc_freq_max() while I am here.
2010-12-07 22:23:26 +00:00
Jung-uk Kim
78a661bbaa Probe P-state invariant TSC from rightful place. 2010-12-07 22:12:02 +00:00
Colin Percival
716d203d6b MFamd64 r204214: Enforce stronger alignment semantics (require that the
end of segments be aligned, not just the start of segments) in order to
allow Xen's blkfront driver to operate correctly.

PR:		kern/152818
MFC after:	3 days
2010-12-05 03:20:55 +00:00
Colin Percival
a39dc31fca Remove gratuitous i386/amd64 inconsistency in favour of the less verbose
version of declaring a variable initialized to zero.
2010-12-04 23:36:40 +00:00
Colin Percival
5c5590862f Remove unnecessary #includes which seem to have been accidentally added
as part of CVS r1.76 (in January 2006).
2010-12-04 23:24:35 +00:00
Jung-uk Kim
2f7ab7e85d Revert r216161. It is not necessary because we zero-fill BSS anyway.
Requested by:	jhb
2010-12-03 22:27:51 +00:00
Jung-uk Kim
b14fe63392 Explicitly initialize TSC frequency. To calibrate TSC frequency, we use
DELAY(9) and it may use TSC in turn if TSC frequency is non-zero.

MFC after:	3 days
2010-12-03 21:54:10 +00:00
Jung-uk Kim
e391a266ed Do not change CPU ticker frequency if TSC is P-state invariant. Note this
change was meant to be committed with r184102 (and its subsequent MFCs) but
it fell off somehow.

Pointyhat to:	jkim
MFC after:	3 days
2010-12-03 21:06:30 +00:00
Rebecca Cran
c90f7d9b44 Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.
2010-12-03 07:09:23 +00:00
Rebecca Cran
15b4888a24 Disallow passing in a count of zero bytes to the bus_space(9) functions.
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR:	kern/80980
Discussed with:	jhb
2010-12-02 22:19:30 +00:00
Colin Percival
d42446149f Fix bug introduced by r194784: Under XEN, the page(s) allocated to dpcpu
for CPU #0 weren't being properly reserved.  Under VM pressure this would
cause problems when the dpcpu structures were overwritten by arbitrary
data; the most common symptom was a panic when netisr attempted to lock a
mutex.

For some reason the XEN code keeps track of the start of available memory
in the variables 'first', 'physfree', and 'init_first'; as far as I can
tell, we always have first == physfree == init_first * PAGE_SIZE.  The
earlier commit adjusted 'first' (which, on !XEN, is the only variable
which tracks this value) but not the other two variables.

Exercise for reader: Eliminate two of these three variables.
2010-11-29 06:50:30 +00:00
Konstantin Belousov
c6fb218c3c Calling fill_fpregs() for curthread is legitimate, and ELF coredump
does this.

Reported and tested by:	pho
MFC after:	5 days
2010-11-28 17:56:34 +00:00
Konstantin Belousov
5c6eb03790 Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by:	bde
Tested by:    pho (i386 and amd64)
MFC after:    1 week
2010-11-26 14:50:42 +00:00
Tijl Coosemans
ce4ec51dbe Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by:	imp, jhb
Approved by:	kib (mentor)
2010-11-26 10:59:20 +00:00
Ulrich Spörlein
02604cd4f4 Remove kernel support for BB profiling, now that kernbb(8) is gone, too.
PR:		bin/83558
Reviewed by:	jkim
2010-11-26 08:11:43 +00:00
Colin Percival
5c0ab2fa8b Revert r215819 and fix the bug properly. In pmap_qremove, paging table
updates were being queued by pmap_kremove, but the queue wasn't being
flushed; as a result, the updates didn't happen until *after* the call
to pmap_invalidate_range, and old entries could stick around in the TLB.
Adding a PT_UPDATES_FLUSH() call immediately before pmap_invalidate_range
ensures that after the invalidation the TLB will be repopulated with the
correct new entries.

Thanks to:	kib, avg, alc
2010-11-25 22:06:07 +00:00
Dimitry Andric
079d7e43ca Use unambiguous inline assembly to load a float variable. GNU as
silently converts 'fld' to 'flds', without taking the actual variable
type into account (!), but clang's integrated assembler rightfully
complains about it.

Discussed with:	cperciva
2010-11-25 18:14:18 +00:00
John Baldwin
9d76324839 Add device IDs for two more ServerWorks Host-PCI bridges so that we can
read their starting PCI bus number for older systems that do not support
ACPI (or have a broken _BBN method).

PR:		kern/148108
MFC after:	1 week
2010-11-25 15:42:33 +00:00
Colin Percival
98702b3990 Work around paging bug. Somehow we seem to be ending up with entries in
the TLB which don't correspond to ptes with PG_V set; prior to this commit
I'm sometimes getting the wrong data when pages are loaded into the buffer
cache (they're being loaded, but the missing TLB invalidation is causing
the wrong data to be visible).
2010-11-25 15:41:34 +00:00
Colin Percival
1a3b2b87de Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
_HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
succeeded.
2010-11-25 15:05:21 +00:00
Colin Percival
0bd7a92067 Remove vestigal debugging code which, in fork-heavy workloads, can cause
a 30x slowdown.
2010-11-25 04:45:31 +00:00
Jung-uk Kim
d2d0fda841 Remove a stale tunable introduced in r215703. 2010-11-23 17:28:23 +00:00
Andriy Gapon
9b984feb3d specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX
CPUID.6 is defined as Thermal and Power Management Leaf by both Intel
and AMD.

Reviewed by:	jhb
MFC after:	7 days
2010-11-23 13:55:30 +00:00
Jung-uk Kim
7dd052c1d9 - Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual.  Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-).  Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC.  Program 5 as
WP (from default WT) and 6 as WC (from default UC-).  Leave 4 and 7 at the
default of WB and UC.  This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency.  Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks.  With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more.  Now this function is
identical on both amd64 and i386.

Reviewed by:	jhb
Tested by:	RM (reuf_m at hotmail dot com)
		Ryszard Czekaj (rychoo at freeshell dot net)
		army.of.root (army dot of dot root at googlemail dot com)
MFC after:	3 days
2010-11-22 19:52:44 +00:00
Colin Percival
c3f128981e In xen_get_timecount, return the full ns-precision time rather than
rounding to 1/HZ precision.

I have no idea why the rounding was introduced in the first place, but
it makes FreeBSD unhappy.
2010-11-22 09:04:29 +00:00
Colin Percival
61381fcf2d Unifdef XEN. This file is only compiled with the XEN kernel option set,
and the !XEN bits get in the way of understanding the code.
2010-11-20 21:36:12 +00:00
Colin Percival
8cdabbaf32 Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses.
Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and
converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...).  In
a few places we take advantage of the fact that xpmap_ptom can commute with
setting PG_* flags.

This commit should have no net effect save to improve the readability of
this code.
2010-11-20 20:04:29 +00:00
Colin Percival
ad520892d7 Make pmap_release consistent with pmap_pinit with respect to unpinning
pages.  The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm
not quite sure why...) and this commit adds a corresponding #if 0 in
pmap_release to avoid unpinning those pages.

Some versions of Xen seem to silently ignore requests to unpin pages
which were never pinned in the first place, but some return an error
(causing FreeBSD to panic) prior to this commit.
2010-11-19 15:12:19 +00:00
Andriy Gapon
b43d292565 specialreg.h: add definitions for MPERF/APERF pair of MSRs
These MSRs can be used to determine actual (average) performance as
compared to a maximum defined performance.
Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both
Intel and AMD processors.

MFC after:	5 days
2010-11-19 15:07:36 +00:00
Andriy Gapon
7af7c7624a specialreg.h: add AMD-specific "Hardware Configuration Register" MSR
It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after:	5 days
2010-11-19 15:00:20 +00:00
Andriy Gapon
8fd6d51347 specialreg.h: add definition for AMD Core Performance Boost bit
This bit indicates availability of the feature.

MFC after:	4 days
2010-11-19 14:46:17 +00:00
Colin Percival
f4c884f95a Make pmap_release match pmap_pinit by invoking pmap_qremove(pmap->pm_pdpt)
to match pmap_pinit's pmap_qenter(pmap->pm_pdpt) call in the case of PAE.
2010-11-18 21:29:43 +00:00
Colin Percival
f86f965ef8 Don't KASSERT in pmap_release that
xpmap_ptom(VM_PAGE_TO_PHYS(m)) == (pmap->pm_pdpt[i] & PG_FRAME)
for i = NPGPTD, since pmap->pm_pdpt[i] is only initialized for
0 <= i < NPGPTD.

This fixes an inevitable panic with XEN && PAE && INVARIANTS when
pmap_release is called (e.g., when /sbin/init is launched).
2010-11-18 21:02:40 +00:00
Jung-uk Kim
816b3bd1b0 Restore CR0 after MTRR initialization for correctness sakes. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases.  Remove another pointless optimization for CR4.PGE bit
while I am here.
2010-11-16 23:26:02 +00:00
Jung-uk Kim
50083a5624 Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invlidation of entire TLBs.  Since these are very rare events, this
micro-optimization seems useless.

Reviewed by:	jhb
2010-11-16 22:44:58 +00:00
Konstantin Belousov
7022f954c3 Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by:	    swell.k gmail com
MFC after:   1 week
2010-11-14 21:59:11 +00:00
Konstantin Belousov
94bce4535d Use symbolic names instead of hardcoding values for magic p_osrel constants.
MFC after:   1 week
2010-11-14 18:24:12 +00:00
Jung-uk Kim
a3c464fb3c MFamd64: (based on) r209957
Move logic of building ACPI headers for acpi_wakeup.c into better places,
remove intermediate makefile and shell script, and reduce diff between i386
and amd64.
2010-11-12 20:55:14 +00:00
Jung-uk Kim
19da400c64 Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
Jung-uk Kim
926ad40ff9 Add compat shim for apm(4) to translate APM BIOS function numbers from i386
to PC98-specific ones.  Any binaries using apm ioctl(4) commands but built
for i386 should also work on PC98 now.

Reviewed by:	imp, nyan
2010-11-11 19:20:33 +00:00
Jung-uk Kim
93a8847473 Make APM emulation look more closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.
2010-11-10 18:50:12 +00:00
Jung-uk Kim
7c2bf852d7 Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.
2010-11-10 01:29:56 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00
Attilio Rao
fcb250f392 Move the mptable.h under x86/include/.
Sponsored by:	Sandvine Incorporated
MFC after:	14 days
2010-11-09 20:28:09 +00:00
Jung-uk Kim
cedd86cafa Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home. 2010-11-09 00:27:18 +00:00
Jung-uk Kim
2473325fa8 Reduce diff between platforms and fix style(9) bugs. 2010-11-09 00:14:39 +00:00
John Baldwin
13e25cb7a5 Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is
identical on both platforms.
2010-11-08 20:57:02 +00:00
John Baldwin
c5b0b5fc6b Sync the APIC startup sequence with amd64:
- Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1.
- Probe CPUs at SI_SUB_TUNABLES - 1.  This allows i386 to set a truly
  accurate mp_maxid value rather than always setting it to MAXCPU - 1.
2010-11-08 20:35:09 +00:00
John Baldwin
6c3c2b8b87 Remove stub symbols for APIC-related functions when 'device apic' is not
included in a kernel config.  These stubs had existed previously so that
acpi.ko could always include the MADT parsing code and still link with a
kernel that did not include 'device apic'.
2010-11-08 20:32:35 +00:00
John Baldwin
f67b4bd367 A few small style and whitespace fixes. 2010-11-08 20:05:22 +00:00
Alan Cox
228a253795 Eliminate a possible race between pmap_pinit() and pmap_kenter_pde() on
superpage promotion or demotion.

Micro-optimize pmap_kenter_pde().

Reviewed by:	kib, jhb (an earlier version)
MFC after:	1 week
2010-11-07 18:42:37 +00:00
John Baldwin
0108cce0a4 Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger.  Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock.  However, trap interrupts for single-stepping
can still occur even when interrupts are disabled.  Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased.  Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero.  To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with:	bde
MFC after:     1 month
2010-11-05 13:42:58 +00:00
Andriy Gapon
3b50d59fef x86 topo_probe: do not probe smp topology if only one cpu is visible
This could lead to a division by zero if hardware is multi-core and/or
multi-threaded, but for some (quite unusual) reason FreeBSD sees only
one logical processor.  This could happen, for example, if neither MADT
nor MP Table are presented by BIOS.

Also:
- assert in topo_probe_0x4 that BSP is accounted for
- neither cpu_cores nor cpu_logical should be zero after successful
  probing, so either being zero is an indication of failed probing

Reported by:	vwe, Dan Allen <danallen46@airwired.net>
Tested by:	Dan Allen <danallen46@airwired.net>
MFC after:	3 days
2010-11-04 08:51:45 +00:00
John Baldwin
239da85bbc Further tweaks to the ram_attach() routine:
- Use > 2^32 - 1 instead of >= when checking for memory regions above 4G.
- Skip SMAP entries > 4G on i386 rather than breaking out of the loop
  since SMAP entries are not guaranteed to be in order.
- Remove 'i' and loop over 'rid' directly in the dump_avail[] case.
- Only check for 4G regions in the dump_avail[] case on i386 if PAE is
  enabled since vm_paddr_t is 32-bit in the !PAE case.

Submitted by:	alc
2010-11-02 17:56:16 +00:00
John Baldwin
32c3d3b6e6 Move <machine/apicreg.h> to <x86/apicreg.h>. 2010-11-01 18:18:46 +00:00
John Baldwin
5ecdb3c46b Move the <machine/mca.h> header to <x86/mca.h>. 2010-11-01 17:40:35 +00:00
Attilio Rao
ba2a27351b Merge nexus.c from amd64 and i386 to x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-28 16:31:39 +00:00
John Baldwin
89d84a4055 Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine
when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[]
is only populated for multiple-CPU machines.  This also matches what the
code does when SMP is not enabled.

PR:		bin/151616
Tested by:	"Damian S. Kolodziejczyk"  damkol | gmail
Submitted by:	avg
MFC after:	1 week
2010-10-28 13:44:19 +00:00
Attilio Rao
a3da97926d Merge the mptable support from MD bits to x86 subtree.
Sponsored by:	Sandvine Incorporated
Discussed with:	jhb
2010-10-28 07:58:06 +00:00
Attilio Rao
256439c972 Merge dump_machdep.c i386/amd64 under the x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-26 12:46:26 +00:00
John Baldwin
0689bdcc19 Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by:	bde
2010-10-25 15:31:13 +00:00
John Baldwin
c6390f7ac5 Use intr_disable() and intr_restore() instead of frobbing the flags register
directly to disable interrupts.

Reviewed by:	bde (earlier version)
MFC after:	2 weeks
2010-10-25 15:28:03 +00:00
Justin T. Gibbs
ff662b5c98 Improve the Xen para-virtualized device infrastructure of FreeBSD:
o Add support for backend devices (e.g. blkback)
 o Implement extensions to the Xen para-virtualized block API to allow
   for larger and more outstanding I/Os.
 o Import a completely rewritten block back driver with support for fronting
   I/O to both raw devices and files.
 o General cleanup and documentation of the XenBus and XenStore support code.
 o Robustness and performance updates for the block front driver.
 o Fixes to the netfront driver.

Sponsored by: Spectra Logic Corporation

sys/xen/xenbus/init.txt:
	Deleted: This file explains the Linux method for XenBus device
	enumeration and thus does not apply to FreeBSD's NewBus approach.

sys/xen/xenbus/xenbus_probe_backend.c:
	Deleted: Linux version of backend XenBus service routines.  It
	was never ported to FreeBSD.  See xenbusb.c, xenbusb_if.m,
	xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus
	support.

sys/xen/xenbus/xenbusvar.h:
sys/xen/xenbus/xenbus_xs.c:
sys/xen/xenbus/xenbus_comms.c:
sys/xen/xenbus/xenbus_comms.h:
sys/xen/xenstore/xenstorevar.h:
sys/xen/xenstore/xenstore.c:
	Split XenStore into its own tree.  XenBus is a software layer built
	on top of XenStore.  The old arrangement and the naming of some
	structures and functions blurred these lines making it difficult to
	discern what services are provided by which layer and at what times
	these services are available (e.g. during system startup and shutdown).

sys/xen/xenbus/xenbus_client.c:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_probe.c:
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
	Split up XenBus code into methods available for use by client
	drivers (xenbus.c) and code used by the XenBus "bus code" to
	enumerate, attach, detach, and service bus drivers.

sys/xen/reboot.c:
sys/dev/xen/control/control.c:
	Add a XenBus front driver for handling shutdown, reboot, suspend, and
	resume events published in the XenStore.  Move all PV suspend/reboot
	support from reboot.c into this driver.

sys/xen/blkif.h:
	New file from Xen vendor with macros and structures used by
	a block back driver to service requests from a VM running a
	different ABI (e.g. amd64 back with i386 front).

sys/conf/files:
	Adjust kernel build spec for new XenBus/XenStore layout and added
	Xen functionality.

sys/dev/xen/balloon/balloon.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/blkfront/blkfront.c:
sys/xen/xenbus/...
sys/xen/xenstore/...
	o Rename XenStore APIs and structures from xenbus_* to xs_*.
	o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation
	  of objects returned by these APIs.
	o Adjust for changes in the bus interface for Xen drivers.

sys/xen/xenbus/...
sys/xen/xenstore/...
	Add Doxygen comments for these interfaces and the code that
	implements them.

sys/dev/xen/blkback/blkback.c:
	o Rewrite the Block Back driver to attach properly via newbus,
	  operate correctly in both PV and HVM mode regardless of domain
	  (e.g. can be in a DOM other than 0), and to deal with the latest
	  metadata available in XenStore for block devices.

	o Allow users to specify a file as a backend to blkback, in addition
	  to character devices.  Use the namei lookup of the backend path
	  to automatically configure, based on file type, the appropriate
	  backend method.

	The current implementation is limited to a single outstanding I/O
	at a time to file backed storage.

sys/dev/xen/blkback/blkback.c:
sys/xen/interface/io/blkif.h:
sys/xen/blkif.h:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
	Extend the Xen blkif API: Negotiable request size and number of
	requests.

	This change extends the information recorded in the XenStore
	allowing block front/back devices to negotiate for optimal I/O
	parameters.  This has been achieved without sacrificing backward
	compatibility with drivers that are unaware of these protocol
	enhancements.  The extensions center around the connection protocol
	which now includes these additions:

	o The back-end device publishes its maximum supported values for,
	  request I/O size, the number of page segments that can be
	  associated with a request, the maximum number of requests that
	  can be concurrently active, and the maximum number of pages that
	  can be in the shared request ring.  These values are published
	  before the back-end enters the XenbusStateInitWait state.

	o The front-end waits for the back-end to enter either the InitWait
	  or Initialize state.  At this point, the front end limits it's
	  own capabilities to the lesser of the values it finds published
	  by the backend, it's own maximums, or, should any back-end data
	  be missing in the store, the values supported by the original
	  protocol.  It then initializes it's internal data structures
	  including allocation of the shared ring, publishes its maximum
	  capabilities to the XenStore and transitions to the Initialized
	  state.

	o The back-end waits for the front-end to enter the Initalized
	  state.  At this point, the back end limits it's own capabilities
	  to the lesser of the values it finds published by the frontend,
	  it's own maximums, or, should any front-end data be missing in
	  the store, the values supported by the original protocol.  It
	  then initializes it's internal data structures, attaches to the
	  shared ring and transitions to the Connected state.

	o The front-end waits for the back-end to enter the Connnected
	  state, transitions itself to the connected state, and can
	  commence I/O.

	Although an updated front-end driver must be aware of the back-end's
	InitWait state, the back-end has been coded such that it can
	tolerate a front-end that skips this step and transitions directly
	to the Initialized state without waiting for the back-end.

sys/xen/interface/io/blkif.h:
	o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255.  This is
	  the maximum number possible without changing the blkif
	  request header structure (nr_segs is a uint8_t).

	o Add two new constants:
	  BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and
	  BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK.  These respectively
	  indicate the number of segments that can fit in the first
	  ring-buffer entry of a request, and for each subsequent
	  (sg element only) ring-buffer entry associated with the
          "header" ring-buffer entry of the request.

	o Add the blkif_request_segment_t typedef for segment
	  elements.

	o Add the BLKRING_GET_SG_REQUEST() macro which wraps the
	  RING_GET_REQUEST() macro and returns a properly cast
	  pointer to an array of blkif_request_segment_ts.

	o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the
	  number of ring entries that will be consumed by a blkif
	  request with the given number of segments.

sys/xen/blkif.h:
	o Update for changes in interface/io/blkif.h macros.

	o Update the BLKIF_MAX_RING_REQUESTS() macro to take the
	  ring size as an argument to allow this calculation on
	  multi-page rings.

	o Add a companion macro to BLKIF_MAX_RING_REQUESTS(),
	  BLKIF_RING_PAGES().  This macro determines the number of
	  ring pages required in order to support a ring with the
	  supplied number of request blocks.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
	o Negotiate with the other-end with the following limits:
	      Reqeust Size:   MAXPHYS
	      Max Segments:   (MAXPHYS/PAGE_SIZE) + 1
	      Max Requests:   256
	      Max Ring Pages: Sufficient to support Max Requests with
	                      Max Segments.

	o Dynamically allocate request pools and segemnts-per-request.

	o Update ring allocation/attachment code to support a
	  multi-page shared ring.

	o Update routines that access the shared ring to handle
	  multi-block requests.

sys/dev/xen/blkfront/blkfront.c:
	o Track blkfront allocations in a blkfront driver specific
	  malloc pool.

	o Strip out XenStore transaction retry logic in the
	  connection code.  Transactions only need to be used when
	  the update to multiple XenStore nodes must be atomic.
	  That is not the case here.

	o Fully disable blkif_resume() until it can be fixed
	  properly (it didn't work before this change).

	o Destroy bus-dma objects during device instance tear-down.

	o Properly handle backend devices with powef-of-2 sector
	  sizes larger than 512b.

sys/dev/xen/blkback/blkback.c:
	Advertise support for and implement the BLKIF_OP_WRITE_BARRIER
	and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and
	the BIO_ORDERED attribute of bios.

sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
	Fix various bugs in blkfront.

       o gnttab_alloc_grant_references() returns 0 for success and
	 non-zero for failure.  The check for < 0 is a leftover
	 Linuxism.

       o When we negotiate with blkback and have to reduce some of our
	 capabilities, print out the original and reduced capability before
	 changing the local capability.  So the user now gets the correct
	 information.

	o Fix blkif_restart_queue_callback() formatting.  Make sure we hold
	  the mutex in that function before calling xb_startio().

	o Fix a couple of KASSERT()s.

        o Fix a check in the xb_remove_* macro to be a little more specific.

sys/xen/gnttab.h:
sys/xen/gnttab.c:
	Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID.

sys/dev/xen/netfront/netfront.c:
	Use GRANT_REF_INVALID instead of driver private definitions of the
	same constant.

sys/xen/gnttab.h:
sys/xen/gnttab.c:
	Add the gnttab_end_foreign_access_references() API.

	This API allows a client to batch the release of an array of grant
	references, instead of coding a private for loop.  The implementation
	takes advantage of this batching to reduce lock overhead to one
	acquisition and release per-batch instead of per-freed grant reference.

	While here, reduce the duration the gnttab_list_lock is held during
	gnttab_free_grant_references() operations.  The search to find the
	tail of the incoming free list does not rely on global state and so
	can be performed without holding the lock.

sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/evtchn/evtchn.c:
sys/xen/xen_intr.h:
	o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode.
	  This allows an HVM domain to serve back end devices to other domains.
	  This API is already implemented for PV mode.

	o Synchronize the API between HVM and PV.

sys/dev/xen/xenpci/xenpci.c:
	o Scan the full region of CPUID space in which the Xen VMM interface
	  may be implemented.  On systems using SuSE as a Dom0 where the
	  Viridian API is also exported, the VMM interface is above the region
	  we used to search.

	o Pass through bus_alloc_resource() calls so that XenBus drivers
	  attaching on an HVM system can allocate unused physical address
	  space from the nexus.  The block back driver makes use of this
	  facility.

sys/i386/xen/xen_machdep.c:
	Use the correct type for accessing the statically mapped xenstore
	metadata.

sys/xen/interface/hvm/params.h:
sys/xen/xenstore/xenstore.c:
	Move hvm_get_parameter() to the correct global header file instead
	of as a private method to the XenStore.

sys/xen/interface/io/protocols.h:
	Sync with vendor.

sys/xeninterface/io/ring.h:
	Add macro for calculating the number of ring pages needed for an N
	deep ring.

	To avoid duplication within the macros, create and use the new
	__RING_HEADER_SIZE() macro.  This macro calculates the size of the
	ring book keeping struct (producer/consumer indexes, etc.) that
	resides at the head of the ring.

	Add the __RING_PAGES() macro which calculates the number of shared
	ring pages required to support a ring with the given number of
	requests.

	These APIs are used to support the multi-page ring version of the
	Xen block API.

sys/xeninterface/io/xenbus.h:
	Add Comments.

sys/xen/xenbus/...
	o Refactor the FreeBSD XenBus support code to allow for both front and
	  backend device attachments.

	o Make use of new config_intr_hook capabilities to allow front and back
	  devices to be probed/attached in parallel.

	o Fix bugs in probe/attach state machine that could cause the system to
	  hang when confronted with a failure either in the local domain or in
	  a remote domain to which one of our driver instances is attaching.

	o Publish all required state to the XenStore on device detach and
	  failure.  The majority of the missing functionality was for serving
	  as a back end since the typical "hot-plug" scripts in Dom0 don't
	  handle the case of cleaning up for a "service domain" that is not
	  itself.

	o Add dynamic sysctl nodes exposing the generic ivars of
	  XenBus devices.

	o Add doxygen style comments to the majority of the code.

	o Cleanup types, formatting, etc.

sys/xen/xenbus/xenbusb.c:
	Common code used by both front and back XenBus busses.

sys/xen/xenbus/xenbusb_if.m:
	Method definitions for a XenBus bus.

sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusb_back.c:
	XenBus bus specialization for front and back devices.

MFC after:	1 month
2010-10-19 20:53:30 +00:00
Jung-uk Kim
56b11f84a7 Remove trailing ", " from `sysctl machdep.idle_available' output. 2010-10-12 20:53:12 +00:00
Konstantin Belousov
78ae4338a2 Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with:	bz, rwatson
Approved by:	re (bz, kensmith)
MFC after:	2 weeks
2010-10-12 09:18:17 +00:00
Alan Cox
fb4c8540b2 Initialize KPTmap in locore so that vm86.c can call vtophys() (or really
pmap_kextract()) before pmap_bootstrap() is called.

Document the set of pmap functions that may be called before
pmap_bootstrap() is called.

Tested by:	bde@
Reviewed by:	kib@
Discussed with:	jhb@
MFC after:	6 weeks
2010-10-05 17:06:51 +00:00
Konstantin Belousov
3f506a78ce Display PCID capability of CPU and add CPUID define for it.
MFC after:	1 week
2010-10-05 15:31:56 +00:00
Andriy Gapon
d443a96ffb i386 and amd64 mp_machdep: improve topology detection for Intel CPUs
This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
  as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
  and core masks and match APIC IDs against those masks [started by
  jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
  by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
  prescribed by Intel [jkim]

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
  topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
  for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
  the same CPU topology, but Athlon cores do not share L2 cache while
  Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
  implemented for platforms in question
- think how to better described old HTT vs new HTT distinction, HTT vs
  SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
  use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
  scheduler know that it shouldn't schedule any threads on those CPUs

PR:			kern/145385 (related)
In collaboration with:	jkim
Tested by:		Sergey Kandaurov <pluknet@gmail.com>,
			Jeremy Chadwick <freebsd@jdc.parodius.com>,
			Chip Camden <sterling@camdensoftware.com>,
			Steve Wills <steve@mouf.net>,
			Olivier Smedts <olivier@gid0.org>,
			Florian Smeets <flo@smeets.im>
MFC after:		1 month
2010-10-01 10:32:54 +00:00
Neel Natu
5c1a8dc028 Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.
The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by:	NetApp
Submitted by:	Will McGovern (will at netapp dot com)
Reviewed by:	mjacob, jhb
2010-09-29 21:53:11 +00:00
David Xu
8d2a935e45 Remove a redundant instruction for casuword. 2010-09-29 02:36:58 +00:00
John Baldwin
1691b62202 Rewrite the i386 memory probe:
- Check for SMAP data from the loader first.  If it exists, don't bother
  doing any VM86 calls at all.  This will be more friendly for non-BIOS
  boot environments such as EFI, etc.
- Move the base memory setup into a new basemem_setup() routine instead
  of duplicating it.
- Simplify the XEN case by removing all of the VM86/SMAP parsing code rather
  than just jumping over it.
- Adjust some comments to better explain the code flow.

MFC after:	2 weeks
2010-09-27 19:36:15 +00:00
David Xu
295fbd498e Now userland POSIX semaphore is based on umtx. The kernel module
is only used to support binary compatible, if want to run old
binary, you need to kldload the module.
2010-09-24 09:04:16 +00:00
Norikatsu Shigemura
cbf4dac64f Add support 'device tpm' for amd64.
Add tpm(4)'s default setting to /boot/defaults/loader.conf.
Add 'device tpm' to NOTES for amd64 and i386.

Discussed with:	takawata
Approved by:	imp (mentor)
2010-09-19 14:40:37 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Roman Divacky
27d4fea6c5 Change the parameter passed to the inline assembly to u_short
as we are dealing with 16bit segment registers. Change mov
to movw.

Approved by:    rpaulo (mentor)
Reviewed by:    kib, rink
2010-09-03 14:25:17 +00:00
Rui Paulo
cba3269417 Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done on a sequent commit.

Sponsored by:	The FreeBSD Foundation
2010-08-28 08:03:29 +00:00
Rui Paulo
6bf9fb35e5 Sync DTrace bits with amd64 and fix the build.
Sponsored by:	The FreeBSD Foundation
2010-08-26 11:22:12 +00:00
Jung-uk Kim
db1cea00ad Increase maximum number of page table entries per VM86 context from 8 to 24
pages, yet again.  Now we can allocate a whole segment, which is required
for shadowing option ROM images, for example.
2010-08-25 21:13:23 +00:00
Rui Paulo
0bc1991a4a Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by:	The FreeBSD Foundation
2010-08-25 09:10:32 +00:00
Rui Paulo
8a8d8fa3d1 Add two DTrace trap type values. Used by fasttrap.
Sponsored by:	The FreeBSD Foundation
2010-08-24 13:13:24 +00:00