Commit Graph

12655 Commits

Author SHA1 Message Date
Mitsuru IWASAKI
77c80e2e5b Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().
2012-06-12 00:14:54 +00:00
Mitsuru IWASAKI
8a6c6fadc7 Some fixes for r236772.
- Remove cpuset stopped_cpus which is no longer used.
- Add a short comment for cpuset suspended_cpus clearing.
- Fix the un-ordered x86/acpica/acpi_wakeup.c in conf/files.amd64 and i386.

Pointed-out by:	attilio@
2012-06-10 02:38:51 +00:00
Mitsuru IWASAKI
fb864578af Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of
suspend/resume procedures are minimized among them.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
  among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
  amd64 code.

Reviewed by:	attilio@, acpi@
2012-06-09 00:37:26 +00:00
Alan Cox
23c0d041ba Various small changes to PV entry management:
Constify pc_freemask[].

pmap_pv_reclaim()
  Eliminate "freemask" because it was a pessimization.  Add a comment about
  the resident count adjustment.

free_pv_entry() [i386 only]
  Merge an optimization from amd64 (r233954).

get_pv_entry()
  Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
  (The right strategy needs more thought.  Moreover, there were unintended
  differences between the amd64 and i386 implementation.)

pmap_remove_pages()
  Eliminate unnecessary ()'s.
2012-06-04 03:51:08 +00:00
Andriy Gapon
7adc598a15 free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG
Those calls are useful with hardware watchdog drivers too.

MFC after:	3 weeks
2012-06-03 08:01:12 +00:00
Alan Cox
0d6f49d84a Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.
2012-06-02 22:14:10 +00:00
Konstantin Belousov
fa9f322df9 Use plain store for atomic_store_rel on x86, instead of implicitly
locked xchg instruction.  IA32 memory model guarantees that store has
release semantic, since stores cannot pass loads or stores.

Reviewed by:	  bde, jhb
Tested by:	  pho
MFC after:	  2 weeks
2012-06-02 18:10:16 +00:00
Jung-uk Kim
9ad569771a Consistently use ACPI_SUCCESS() and ACPI_FAILURE() macros wherever possible. 2012-06-01 21:33:33 +00:00
Jung-uk Kim
db08ae007d Tidy up code clutter in SMP case a bit. No functional change. 2012-06-01 19:19:04 +00:00
Jung-uk Kim
108705d043 Call AcpiSetFirmwareWakingVector() with interrupt disabled for consistency. 2012-06-01 18:18:48 +00:00
Jung-uk Kim
d3638dc4de Improve style(9) in the previous commit. 2012-06-01 17:07:52 +00:00
Mitsuru IWASAKI
f0a101b7e2 Call AcpiLeaveSleepStatePrep() in interrupt disabled context
(described in ACPICA source code).

- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
  and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
  it twice in interrupt disabled/enabled context (ia64 version is
  just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
  order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
  (i386 version) for preparation of x86/acpica/acpi_wakeup.c
  (MFC candidate).

Reviewed by:	jkim@
MFC after:	2 days
2012-06-01 15:26:32 +00:00
Alan Cox
d85fbe8a91 Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().
2012-06-01 04:26:50 +00:00
Alan Cox
a2efa4249e Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.
2012-05-30 04:16:54 +00:00
Alan Cox
0490d34982 MFi386 pmap r233433
Disable detailed PV entry accounting by default.  (A config option for
  enabling it was already introduced in r233433.)
2012-05-29 16:11:15 +00:00
Alan Cox
6516bffdef Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

Tested by:	sbruno
MFC after:	5 weeks
2012-05-29 15:41:20 +00:00
Kevin Lo
544c5e5b53 Make sure that each va_start has one and only one matching va_end,
especially in error cases.
2012-05-29 01:48:06 +00:00
Alan Cox
4edfd622b5 Update a comment in get_pv_entry() to reflect the changes to the
synchronization of pv_vafree in r236158.
2012-05-28 17:35:23 +00:00
Alan Cox
8b0f4e0a0d Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c.  This new r/w lock is used primarily to synchronize access
to the PV lists.  However, it will be used in a somewhat unconventional
way.  As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

X-MFC after:	r236045
2012-05-27 16:24:00 +00:00
Alan Cox
33853281b4 Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

MFC after:	5 weeks
2012-05-26 06:10:25 +00:00
Bjoern A. Zeeb
920b965865 MFp4 bz_ipv6_fast:
in_cksum.h required ip.h to be included for struct ip.  To be
  able to use some general checksum functions like in_addword()
  in a non-IPv4 context, limit the (also exported to user space)
  IPv4 specific functions to the times, when the ip.h header is
  present and IPVERSION is defined (to 4).

  We should consider more general checksum (updating) functions
  to also allow easier incremental checksum updates in the L3/4
  stack and firewalls, as well as ponder further requirements by
  certain NIC drivers needing slightly different pseudo values
  in offloading cases.  Thinking in terms of a better "library".

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 22:00:48 +00:00
Alan Cox
4e65634580 MF amd64 r233097, r233122
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.

Style fix to pmap_protect().
2012-05-24 15:25:35 +00:00
Konstantin Belousov
ccc00630ae Enable drm2 modules build.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2012-05-23 21:07:01 +00:00
Mitsuru IWASAKI
fe756f2a59 Remove cpususpend IDT vector for XEN.
This broke XEN kernel building.
2012-05-20 08:17:20 +00:00
Mitsuru IWASAKI
29d8e665ba Revert the previous commit on wakecode address verbose printing.
This broke PAE kernel building.
2012-05-19 02:31:38 +00:00
Mitsuru IWASAKI
e3fd0bc1b2 Add SMP/i386 suspend/resume support.
Most part is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after:	3 days
2012-05-18 18:55:58 +00:00
John Baldwin
424e69759c Centralize declaration of the debug.acpi sysctl node. 2012-05-17 17:58:53 +00:00
Andriy Gapon
99a312d048 i386 bootinfo: re-arrange EFI fields for natural alignment and packing
Suggested by:	bde
MFC after:	2 weeks
2012-05-13 09:25:39 +00:00
Alexander Motin
c078c18853 Add options GEOM_RAID into i386 and amd64 GENERIC kernels.
ataraid(4) previously was present there and having GEOM RAID is convinient.
Unlike other classes GEOM RAID can be set up from BIOS before install and
users are expecting it to be detected automatically.
2012-05-10 12:37:32 +00:00
Brooks Davis
b3a397a8de The DDB_CTF has little or nothing to do with the debugger so move it
next KDTRACE_HOOKS.
2012-05-09 01:37:48 +00:00
Alexander Leidinger
19e252baeb - >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
  they serve mostly as examples of what you can do with the static probe;s
  with moderate load the scripts may be overwhelmed, excessive lock-tracing
  may influence program behavior (see the last design decission)

Design decissions:
 - use "linuxulator" as the provider for the native bitsize; add the
   bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
 - Add probes only for locks which are acquired in one function and released
   in another function. Locks which are aquired and released in the same
   function should be easy to pair in the code, inter-function
   locking is more easy to verify in DTrace.
 - Probes for locks should be fired after locking and before releasing to
   prevent races (to provide data/function stability in DTrace, see the
   man-page of "dtrace -v ..." and the corresponding DTrace docs).
2012-05-05 19:42:38 +00:00
Attilio Rao
b8be27bf29 Revert part of r234723 by re-enabling the SMP protection for
intr_bind() on x86.
This has been requested by jhb and I strongly disagree with this,
but as long as he is the x86 and interrupt subsystem maintainer I will
follow his directives.

The disagreement cames from what we should really consider as a
public KPI. IMHO, if we really need a selection between the kernel
functions, we may need an explicit protection like _KERNEL_KPI, which
defines which subset of the kernel function might really be considered
as part of the KPI (for thirdy part modules) and which not.
As long as we don't have this mechanism I just consider any possible
function as usable by thirdy part code, thus intr_bind() included.

MFC after:	1 week
2012-05-03 21:44:01 +00:00
Dimitry Andric
460378bf13 Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after:	2 weeks
2012-04-29 11:04:31 +00:00
Attilio Rao
70dbd1604c Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by:	gibbs
Reviewed by:	jhb, marius
MFC after:	1 week
2012-04-26 20:24:25 +00:00
Brooks Davis
e9acaa9ae4 Enable DTrace hooks in GENERIC.
Reviewed by:	gnn
Approved by:	core (jhb, imp)
Requested by:	a cast of thousands
MFC after:	3 days
2012-04-20 21:37:42 +00:00
Jung-uk Kim
17b27db088 Regen for r234359. 2012-04-16 23:17:29 +00:00
Jung-uk Kim
f69f4d8630 Correct an argument type of iopl syscall for Linuxulator. This also fixes
a warning from Clang, i. e., "args->level < 0 is always false".
2012-04-16 23:16:18 +00:00
Jung-uk Kim
13fa650c75 Regen for r234357. 2012-04-16 22:59:51 +00:00
Jung-uk Kim
db8eb180d9 Correct arguments of stat64, fstat64 and lstat64 syscalls for Linuxulator. 2012-04-16 22:58:28 +00:00
Jung-uk Kim
28cc85fd09 Regen for r234352. 2012-04-16 21:24:23 +00:00
Jung-uk Kim
d69a426fce - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
Jung-uk Kim
67490d785a - When interrupt is not requested for VM86 call, make a fake exit point and
push the address onto stack as we do for INTn emulation.  This avoids stack
underflow when we encounter RETF instruction in VM86 mode.  Lack of this
exit point actually caused page fault in VM86 mode with VESA module when we
resume from suspend state[1].
- Remove unnecessary CLI and STI instructions from BIOS interrupt emulation.
INTn and IRET must be able to emulate the flag correctly.

Reported by:	gavin [1]
Tested by:	gavin (early revision)
MFC after:	3 days
2012-04-16 19:31:44 +00:00
Andriy Gapon
56c2dc796b add actual interrupt counters to back ipi_invlcache_counts
Otherwise one could run into a panic with COUNT_IPIS when cache
invalidation actually happened.

Reviewed by:	jhb
MFC after:	1 week
2012-04-13 07:18:19 +00:00
Andriy Gapon
f84633cdcc bump INTRCNT_COUNT values to reflect actual numbers of IPI counters
Maybe the numbers should be conditionalized on COUNT_IPIS

Reviewed by:	jhb
MFC after:	1 week
2012-04-13 07:15:40 +00:00
John Baldwin
ed5a2b61fd Add OFED and the associated options and drivers to x86 LINT builds:
- Mark 'sdp' as requiring 'inet'.
- Always include "opt_inet.h" and "opt_inet6.h" and modify the IB
  driver Makefiles to honor WITH/WITHOUT_INET/INET6/_SUPPORT options
  to determine what should be enabled during a module build.
- Fix the mlxen(4) driver and the core IB code to compile without
  if INET is disabled (including when both INET and INET6 are disabled).

Reviewed by:	bz
MFC after:	2 weeks
2012-04-12 14:01:06 +00:00
Marius Strobl
0fec3e2d81 Fix !SMP build after r234074.
Reviewed by:	attilio, jhb
2012-04-10 16:08:46 +00:00
Attilio Rao
79257559ee BSP is not added to the mask of valid target CPUs for interrupts
in set_apic_interrupt_ids(). Besides, set_apic_interrupts_ids() is not
called in the !SMP case too.
Fix this by:
- Adding the BSP as an interrupt target directly in cpu_startup().
- Remove an obsolete optimization where the BSP are skipped in
  set_apic_interrupt_ids().

Reported by:	jh
Reviewed by:	jhb
MFC after:	3 days
X-MFC:		r233961
Pointy hat to:	me
2012-04-09 22:41:19 +00:00
John Baldwin
bcd6068179 Recognize the RDRAND instruction feature.
Submitted by:	Michael Fuckner  michael fuckner net
MFC after:	3 days
2012-04-09 15:20:16 +00:00
John Baldwin
20b5d3bf40 Add descriptions after the 'device' line for several NICs to match the
existing style.
2012-04-04 13:49:22 +00:00
John Baldwin
f2e3bfc074 Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible.  Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after:	2 weeks
2012-04-02 15:07:22 +00:00
John Baldwin
435803f3c7 Move the legacy(4) driver to x86. 2012-03-30 19:10:14 +00:00
John Baldwin
0d95597ca9 Use a more proper fix for enabling HT MSI mapping windows on Host-PCI
bridges.  Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.

To implement this, each x86 Host-PCI bridge driver has to be able to
locate it's actual backing device on bus 0.  For ACPI, use the _ADR
method to find the slot and function of the device.  For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices.  Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.

This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.

Tested by:	bapt
MFC after:	1 week
2012-03-29 19:03:22 +00:00
John Baldwin
1f22be4547 - Rename VM_MEMATTR_UNCACHED to VM_MEMATTR_WEAK_UNCACHEABLE on x86 to
be less ambiguous and more clearly identify what it means.  This
  attribute is what Intel refers to as UC-, and it's only difference
  relative to normal UC memory is that a WC MTRR will override a UC-
  PAT entry causing the memory to be treated as WC, whereas a UC PAT
  entry will always override the MTRR.
- Remove the VM_MEMATTR_UNCACHED alias from powerpc.
2012-03-29 16:51:22 +00:00
Fabien Thomas
f5f9340b98 Add software PMC support.
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.

Sponsored by: NETASQ
MFC after:	1 month
2012-03-28 20:58:30 +00:00
Alan Cox
5d4c773b32 Disable detailed PV entry accounting by default. Add a config option
to enable it.

MFC after:	1 week
2012-03-24 19:43:49 +00:00
Marius Strobl
b78ebd64b2 Add cas(4), gem(4) and hme(4) to x86 GENERICs as suggested by netchild@ in
<20120222095239.Horde.0hpYHJjmRSRPRKzXsoFRbYk@webmail.leidinger.net>.
According to some private emails received, it apparently is not unpopular
to use at least Quad GigaSwift cards driven by cas(4) in x86 machines.

MFC after:	1 week
2012-03-24 18:08:28 +00:00
Joel Dahl
f38f12f287 Add snd_cmi, snd_csa and snd_emu10kx to GENERIC on i386 and amd64.
The GPL infected parts which were blocking the inclusion of snd_csa
and snd_emu10kx in GENERIC have recently been removed from the tree.
I'm also adding snd_cmi to GENERIC, which I originally intended to
add when we enabled sound support by default.

Discussed with:	jhb, pfg, Yuriy Tsibizov <yuriy.tsibizov@gfk.ru>
Approved by:	jhb
2012-03-22 16:19:04 +00:00
Alan Cox
5730afc9b6 Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB.  In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB.  This is exactly as
recommended in AMD's documentation.  In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry.  In short, this saves us from
having to perform potentially costly TLB flushes.  In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry.  Usually, such spurious page faults are handled
by vm_fault() without incident.  However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault().  This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with:	kib
Tested by:		gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after:	1 week
2012-03-22 04:52:51 +00:00
Ed Schouten
92396a3174 Remove pty(4) from our kernel configurations.
As of FreeBSD 8, this driver should not be used. Applications that use
posix_openpt(2) and openpty(3) use the pts(4) that is built into the
kernel unconditionally. If it turns out high profile depend on the
pty(4) module anyway, I'd rather get those fixed. So please report any
issues to me.

The pty(4) module is still available as a kernel module of course, so a
simple `kldload pty' can be used to run old-style pseudo-terminals.
2012-03-21 08:38:42 +00:00
Jung-uk Kim
4c52cad2f9 Merge ACPICA 20120320. 2012-03-20 21:37:52 +00:00
Tijl Coosemans
dfb1c11345 Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace
amd64/i386/pc98 sysarch.h with stubs.
2012-03-19 21:57:31 +00:00
Tijl Coosemans
2c7879ea84 Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace
amd64/i386/pc98 specialreg.h with stubs.
2012-03-19 21:34:11 +00:00
Tijl Coosemans
68156ad982 Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs. 2012-03-19 21:29:57 +00:00
Tijl Coosemans
bcde3b9f67 Move userland bits (and some common kernel bits) from amd64 and i386
segments.h to a new x86 segments.h.

Add __packed attribute to some structs (just to be sure).
Also make it clear that i386 GDT and LDT entries are used in ia64 code.
2012-03-19 21:24:50 +00:00
Konstantin Belousov
fc6e32fb62 If we ever allow for managed fictitious pages, the pages shall be
excluded from superpage promotions.  At least one of the reason is
that pv_table is sized for non-fictitious pages only.

Consistently check for the page to be non-fictitious before accesing
superpage pv list.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	2 weeks
2012-03-19 09:34:22 +00:00
Tijl Coosemans
01cd19680d Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98
reg.h with stubs.

The tREGISTER macros are only made visible on i386. These macros are
deprecated and should not be available on amd64.

The i386 and amd64 versions of struct reg have been renamed to struct
__reg32 and struct __reg64. During compilation either __reg32 or __reg64
is defined as reg depending on the machine architecture. On amd64 the i386
struct is also available as struct reg32 which is used in COMPAT_FREEBSD32
code.

Most of compat/ia32/ia32_reg.h is now IA64 only.

Reviewed by:	kib (previous version)
2012-03-18 19:06:38 +00:00
Tijl Coosemans
23341c174c Use exact width integer types in amd64/i386 reg.h to prepare for a merge.
The only real change is replacing long with int on i386.
2012-03-18 18:44:42 +00:00
Tijl Coosemans
786645078b Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.
Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed.
Create machine/npx.h on amd64 to allow compiling i386 code that uses
this header.

The original npx.h and fpu.h define struct envxmm differently. Both
definitions have been included in the new x86 header as struct __envxmm32
and struct __envxmm64. During compilation either __envxmm32 or __envxmm64
is defined as envxmm depending on machine architecture. On amd64 the i386
struct is also available as struct envxmm32.

Reviewed by:	kib
2012-03-16 20:24:30 +00:00
Tijl Coosemans
545193ce59 Use exact width integer types instead of long in struct env87 in
preparation to merge with amd64.

Reviewed by:	kib
2012-03-16 19:42:39 +00:00
Yoshihiro Takahashi
dff207f860 - Fix to build a native i386 kernel without the SMP and atpic.
- Merge r232744 changes to pc98.
  (Allow a kernel to be built with 'nodevice atpic'.)
- Move ICU related defines from x86/isa/atpic.c to x86/isa/icu.h and
  use them in x86/x86/intr_machdep.c.

Reviewed by:	jhb
2012-03-16 12:13:44 +00:00
Tijl Coosemans
b827337ee6 Remove prototypes of _amd64_get_fsbase et al. The functions were removed
in r145571.
2012-03-16 10:10:17 +00:00
Alan Cox
9437b8d495 Simplify the error checking in one branch of trap_pfault() and update
the nearby comment.

Correct the style of two return statements in trap_pfault().

Merge a comment from amd64's trap_pfault().
2012-03-12 05:28:02 +00:00
Alexander Leidinger
2676e6799d regen 2012-03-10 23:11:21 +00:00
Alexander Leidinger
048e874f54 - add comments to syscalls.master and linux(32)_dummy about which linux
kernel version introduced the sysctl (based upon a linux man-page)
- add comments to sscalls.master regarding some names of sysctls which are
  different than the linux-names (based upon the linux unistd.h)
- add some dummy sysctls
- name an unimplemented sysctl

MFC after:	1 month
2012-03-10 23:10:18 +00:00
John Baldwin
646af7c6af Move i386's intr_machdep.c to the x86 tree and share it with amd64. 2012-03-09 20:43:29 +00:00
John Baldwin
ad47abd20c Allow a native i386 kernel to be built with 'nodevice atpic'. Just as on
amd64, if 'device isa' is present quiesce the 8259A's during boot and
resume from suspend.

While here, be more selective on amd64 about which kernel configurations
need elcr.c.

MFC after:	2 weeks
2012-03-09 19:42:48 +00:00
John Baldwin
5e9fcac6f4 MFamd64:
- Return failure for a suspend attempt if we have no wake address.
- Use intr_disable()/intr_restore() instead of ACPI_DISABLE_IRQS().
- Invoke intr_suspend() earlier and call intr_resume() if suspend
  fails.
- Use pause in the loop waiting for CPU to suspend.
- Restore PAT MSR, switchtime, switchticks, and MTRRs on resume.

Reviewed by:	jkim (earlier version)
MFC after:	2 weeks
2012-03-09 19:20:19 +00:00
Attilio Rao
9c170fd168 Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported
platforms.
This will make every attempt to mount a non-mpsafe filesystem to the
kernel forbidden, unless it is expressely compiled with
VFS_ALLOW_NONMPSAFE option.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.
2012-03-06 20:01:25 +00:00
Bjoern A. Zeeb
0566170f70 Provide wbwd(4), a driver for the watchdog timer found on various
Winbond Super I/O chips.

With minor efforts it should be possible the extend the driver to support
further chips/revisions available from Winbond.  In the simplest case
only new IDs need to be added, while different chipsets might require
their own function to enter extended function mode, etc.

Sponsored by:	Sandvine Incorporated ULC (in 2011)
Reviewed by:	emaste, brueffer
MFC after:	2 weeks
2012-03-06 18:44:52 +00:00
Jung-uk Kim
e883bb1ae6 Fix few style nits. 2012-03-05 18:47:42 +00:00
Robert Millan
a65f78bf2e Exclude USB drivers (except umass and ukbd) from main kernel image on i386
and amd64.

Reviewed by:	hselasky, arch, usb
Approved by:	kib (mentor)
2012-03-04 21:31:13 +00:00
Tijl Coosemans
d8a023328d Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace
amd64/i386/pc98 ptrace.h with stubs.

For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the
i386 values. The old values are still supported but should no longer be
used.

Reviewed by:	kib
2012-03-04 20:24:28 +00:00
Tijl Coosemans
8b4a1ed0de Copy amd64 trap.h to x86 and replace amd64/i386/pc98 trap.h with stubs. 2012-03-04 14:12:57 +00:00
Tijl Coosemans
ee0d5ab989 Copy amd64 float.h to x86 and merge with i386 float.h. Replace
amd64/i386/pc98 float.h with stubs.
2012-03-04 14:00:32 +00:00
Jung-uk Kim
62953748f5 Add VESA option to GENERIC for amd64 and i386.
MFC after:	1 month
2012-03-03 00:11:46 +00:00
Tijl Coosemans
5b2a5decd1 Copy amd64 stdarg.h to x86 and replace amd64/i386/pc98 stdarg.h with stubs. 2012-02-28 22:30:58 +00:00
Tijl Coosemans
f85ac30a3d Copy amd64 setjmp.h to x86 and replace amd64/i386/pc98 setjmp.h with stubs. 2012-02-28 22:17:52 +00:00
Tijl Coosemans
95b1d16df5 Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
amd64/i386/pc98 endian.h with stubs.

In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
resolved by reimplementing the macro in terms of __bswap32(x). As a side
effect __bswap64_var(x) is now implemented using two bswap instructions on
i386 and should be much faster. __bswap32_const(x) has been reimplemented
in terms of __bswap16(x) for consistency.
2012-02-28 19:39:54 +00:00
Tijl Coosemans
8770e9db97 Copy amd64 _stdint.h to x86 and merge with i386 _stdint.h. Replace
amd64/i386/pc98 _stdint.h with stubs.
2012-02-28 18:38:33 +00:00
Tijl Coosemans
8cfa93e4be Copy amd64 _limits.h to x86 and merge with i386 _limits.h. Replace
amd64/i386/pc98 _limits.h with stubs.
2012-02-28 18:24:28 +00:00
Tijl Coosemans
8f77be2b4c Copy amd64 _types.h to x86 and merge with i386 _types.h. Replace existing
amd64/i386/pc98 _types.h with stubs.
2012-02-28 18:15:28 +00:00
John Baldwin
9e30e6dee9 MFamd64: Don't whine about interrupts being disabled for an NMI. 2012-02-27 17:31:38 +00:00
John Baldwin
45096d8c43 Remove completely duplicate '#ifdef XEN' section. 2012-02-27 17:30:21 +00:00
John Baldwin
c7e8722ca0 Resort the IDT_DTRACE_RET constant after it was changed to be less than
IDT_SYSCALL.
2012-02-27 17:29:37 +00:00
Konstantin Belousov
aa10345311 Do not write to the user address directly, use suword().
Reported by:	Bengt Ahlgren <bengta sics se>
MFC after:	1 week
2012-02-25 01:33:39 +00:00
Konstantin Belousov
3494f31ad2 Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode.  Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by:   alc
MFC after:     2 weeks
2012-02-17 23:47:16 +00:00
Robert Millan
6a443d5308 Move WITHOUT_SOURCELESS_* files to sys/conf/ in order to avoid "universe"
target processing them as if they were standalone kernel config files.

Approved by:	kib (mentor)
MFC after:	5 days
2012-02-12 14:55:27 +00:00
Robert Millan
b10dbcfd65 Add "nodevice adw" to WITHOUT_SOURCELESS_UCODE.
Approved by:	kib (mentor)
MFC after:	13 days
2012-02-04 13:45:39 +00:00
Robert Millan
4a47892c81 Add MK_SOURCELESS build option. Setting MK_SOURCELESS to "no" will disable
kernel modules that include binary-only code.

More fine-grained control is provided via MK_SOURCELESS_HOST (for native code
that runs on host CPU) and MK_SOURCELESS_UCODE (for microcode).

Reviewed by:	julian, delphij, freebsd-arch
Approved by:	kib (mentor)
MFC after:	2 weeks
2012-02-04 00:54:43 +00:00
Kenneth D. Merry
048a50f354 Fix the netback driver build for i386.
netback.c:	Add missing VM includes.

xen/xenvar.h,
xen/xenpmap.h:	Move some XENHVM macros from <machine/xen/xenpmap.h> to
		<machine/xen/xenvar.h> on i386 to match the amd64 headers.

conf/files:	Add netback to the build.

Submitted by:	jhb
MFC after:	3 days
2012-02-02 17:54:35 +00:00
Jim Harris
f11c7f6305 Add isci(4) driver for amd64 and i386 targets.
The isci driver is for the integrated SAS controller in the Intel C600
(Patsburg) chipset.  Source files in sys/dev/isci directory are
FreeBSD-specific, and sys/dev/isci/scil subdirectory contains
an OS-agnostic library (SCIL) published by Intel to control the SAS
controller.  This library is used primarily as-is in this driver, with
some post-processing to better integrate into the kernel build
environment.

isci.4 and a README in the sys/dev/isci directory contain a few
additional details.

This driver is only built for amd64 and i386 targets.

Sponsored by: Intel
Reviewed by: scottl
Approved by: scottl
2012-01-31 19:38:18 +00:00
Konstantin Belousov
62c625fdd2 Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit
and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported,
and some variants of powerpc32.

MFC after:	2 months (if ever)
2012-01-30 07:56:00 +00:00
Konstantin Belousov
a045432a58 Synchronize the struct sigcontext definitions on x86 with mcontext_t.
Pointed out by:	bde
MFC after:	1 month
2012-01-30 07:51:52 +00:00
David Schultz
2ee7b1d4ae Add C11 macros describing subnormal numbers to float.h.
Reviewed by:	bde
2012-01-23 06:36:41 +00:00
Konstantin Belousov
8c6f8f3d5b Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs.  As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
  and the required size of FPU save area. The hw.use_xsave tunable is
  provided for disabling XSAVE, and hw.xsave_mask may be used to
  select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
  (run-time sized) user save area on the top of the kernel stack,
  right above the PCB. Reorganize the thread0 PCB initialization to
  postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
  well. FPU state is only useful for suspend, where it is saved in
  dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
  enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
  mcontext_t has a valid pointer to out-of-struct extended FPU
  state. Signal handlers are supplied with stack-allocated fpu
  state. The sigreturn(2) and setcontext(2) syscall honour the flag,
  allowing the signal handlers to inspect and manipilate extended
  state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
  place in the fixed-sized mcontext_t to place variable-sized save
  area. And, since mcontext_t is embedded into ucontext_t, makes it
  impossible to fix in a reasonable way.  Instead of extending
  getcontext(2) syscall, provide a sysarch(2) facility to query
  extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
  there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
  consumers, making it opaque. Internally, struct fpu_kern_ctx now
  contains a space for the extended state. Convert in-kernel consumers
  of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by:	pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after:	1 month
2012-01-21 17:45:27 +00:00
Konstantin Belousov
6db9cf559f Add definitions for the FPU extended state header, legacy extended
state and AVX state.

MFC after:	1 week
2012-01-17 17:07:13 +00:00
Konstantin Belousov
79937651ef Add definitions related to XCR0.
MFC after:	1 week
2012-01-17 07:23:43 +00:00
Colin Percival
e69346d668 s/amd64/i386/ in comment. 2012-01-16 02:42:41 +00:00
Colin Percival
1934cbcc1a Copy XENHVM config file from amd64, now that i386+XENHVM works. 2012-01-16 02:42:16 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Kenneth D. Merry
130f4520cb Add the CAM Target Layer (CTL).
CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003.  It has been shipping in
Copan (now SGI) products since 2005.

It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license.  The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.

Some CTL features:

 - Disk and processor device emulation.
 - Tagged queueing
 - SCSI task attribute support (ordered, head of queue, simple tags)
 - SCSI implicit command ordering support.  (e.g. if a read follows a mode
   select, the read will be blocked until the mode select completes.)
 - Full task management support (abort, LUN reset, target reset, etc.)
 - Support for multiple ports
 - Support for multiple simultaneous initiators
 - Support for multiple simultaneous backing stores
 - Persistent reservation support
 - Mode sense/select support
 - Error injection support
 - High Availability support (1)
 - All I/O handled in-kernel, no userland context switch overhead.

(1) HA Support is just an API stub, and needs much more to be fully
    functional.

ctl.c:			The core of CTL.  Command handlers and processing,
			character driver, and HA support are here.

ctl.h:			Basic function declarations and data structures.

ctl_backend.c,
ctl_backend.h:		The basic CTL backend API.

ctl_backend_block.c,
ctl_backend_block.h:	The block and file backend.  This allows for using
			a disk or a file as the backing store for a LUN.
			Multiple threads are started to do I/O to the
			backing device, primarily because the VFS API
			requires that to get any concurrency.

ctl_backend_ramdisk.c:	A "fake" ramdisk backend.  It only allocates a
			small amount of memory to act as a source and sink
			for reads and writes from an initiator.  Therefore
			it cannot be used for any real data, but it can be
			used to test for throughput.  It can also be used
			to test initiators' support for extremely large LUNs.

ctl_cmd_table.c:	This is a table with all 256 possible SCSI opcodes,
			and command handler functions defined for supported
			opcodes.

ctl_debug.h:		Debugging support.

ctl_error.c,
ctl_error.h:		CTL-specific wrappers around the CAM sense building
			functions.

ctl_frontend.c,
ctl_frontend.h:		These files define the basic CTL frontend port API.

ctl_frontend_cam_sim.c:	This is a CTL frontend port that is also a CAM SIM.
			This frontend allows for using CTL without any
			target-capable hardware.  So any LUNs you create in
			CTL are visible in CAM via this port.

ctl_frontend_internal.c,
ctl_frontend_internal.h:
			This is a frontend port written for Copan to do
			some system-specific tasks that required sending
			commands into CTL from inside the kernel.  This
			isn't entirely relevant to FreeBSD in general,
			but can perhaps be repurposed.

ctl_ha.h:		This is a stubbed-out High Availability API.  Much
			more is needed for full HA support.  See the
			comments in the header and the description of what
			is needed in the README.ctl.txt file for more
			details.

ctl_io.h:		This defines most of the core CTL I/O structures.
			union ctl_io is conceptually very similar to CAM's
			union ccb.

ctl_ioctl.h:		This defines all ioctls available through the CTL
			character device, and the data structures needed
			for those ioctls.

ctl_mem_pool.c,
ctl_mem_pool.h:		Generic memory pool implementation used by the
			internal frontend.

ctl_private.h:		Private data structres (e.g. CTL softc) and
			function prototypes.  This also includes the SCSI
			vendor and product names used by CTL.

ctl_scsi_all.c,
ctl_scsi_all.h:		CTL wrappers around CAM sense printing functions.

ctl_ser_table.c:	Command serialization table.  This defines what
			happens when one type of command is followed by
			another type of command.

ctl_util.c,
ctl_util.h:		CTL utility functions, primarily designed to be
			used from userland.  See ctladm for the primary
			consumer of these functions.  These include CDB
			building functions.

scsi_ctl.c:		CAM target peripheral driver and CTL frontend port.
			This is the path into CTL for commands from
			target-capable hardware/SIMs.

README.ctl.txt:		CTL code features, roadmap, to-do list.

usr.sbin/Makefile:	Add ctladm.

ctladm/Makefile,
ctladm/ctladm.8,
ctladm/ctladm.c,
ctladm/ctladm.h,
ctladm/util.c:		ctladm(8) is the CTL management utility.
			It fills a role similar to camcontrol(8).
			It allow configuring LUNs, issuing commands,
			injecting errors and various other control
			functions.

usr.bin/Makefile:	Add ctlstat.

ctlstat/Makefile
ctlstat/ctlstat.8,
ctlstat/ctlstat.c:	ctlstat(8) fills a role similar to iostat(8).
			It reports I/O statistics for CTL.

sys/conf/files:		Add CTL files.

sys/conf/NOTES:		Add device ctl.

sys/cam/scsi_all.h:	To conform to more recent specs, the inquiry CDB
			length field is now 2 bytes long.

			Add several mode page definitions for CTL.

sys/cam/scsi_all.c:	Handle the new 2 byte inquiry length.

sys/dev/ciss/ciss.c,
sys/dev/ata/atapi-cam.c,
sys/cam/scsi/scsi_targ_bh.c,
scsi_target/scsi_cmds.c,
mlxcontrol/interface.c:	Update for 2 byte inquiry length field.

scsi_da.h:		Add versions of the format and rigid disk pages
			that are in a more reasonable format for CTL.

amd64/conf/GENERIC,
i386/conf/GENERIC,
ia64/conf/GENERIC,
sparc64/conf/GENERIC:	Add device ctl.

i386/conf/PAE:		The CTL frontend SIM at least does not compile
			cleanly on PAE.

Sponsored by:	Copan Systems, SGI and Spectra Logic
MFC after:	1 month
2012-01-12 00:34:33 +00:00
Adrian Chadd
38ec0ca81c Fix the broken module build I introduced earlier. 2012-01-07 19:38:26 +00:00
Ed Schouten
30b42655cf Also import WEAK_ALIAS() from the MIPS code. 2012-01-05 08:51:06 +00:00
Ed Schouten
766992d738 Add support for strong aliasing of symbols in i386 assembly.
This macro is a literal copy from the MIPS version of <machine/asm.h>.
2012-01-03 07:06:35 +00:00
Ed Schouten
dc15eac046 Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
2012-01-02 12:12:10 +00:00
Konstantin Belousov
36fd83b613 Make the comment in i386/include/ucontext.h identical to the one in
amd64/include/ucontext.h. The later is better worded.

Requested by:	deischen
MFC after:	3 days
2011-12-31 14:44:42 +00:00
Gavin Atkinson
c1cbd9ab53 Default to not performing the early-boot memory tests when we detect we
are booting inside a VM.  There are three reasons to disable this:

o  It causes the VM host to believe that all the tested pages or RAM are
   in use.  This in turn may force the host to page out pages of RAM
   belonging to other VMs, or otherwise cause problems with fair resource
   sharing on the VM cluster.
o  It adds significant time to the boot process (around 1 second/Gig in
   testing)
o  It is unnecessary - the host should have already verified that the
   memory is functional etc.

Note that this simply changes the default when in a VM - it can still be
overridden using the hw.memtest.tests tunable.

MFC after:	4 weeks
2011-12-31 13:24:53 +00:00
Alan Cox
c65205a6e2 Merge r216333 and r216555 from the native pmap
When r207410 eliminated the acquisition and release of the page queues
  lock from pmap_extract_and_hold(), it didn't take into account that
  pmap_pte_quick() sometimes requires the page queues lock to be held.
  This change reimplements pmap_extract_and_hold() such that it no
  longer uses pmap_pte_quick(), and thus never requires the page queues
  lock.

Merge r177525 from the native pmap
  Prevent the overflow in the calculation of the next page directory.
  The overflow causes the wraparound with consequent corruption of the
  (almost) whole address space mapping.

Strictly speaking, r177525 is not required by the Xen pmap because the
hypervisor steals the uppermost region of the normal kernel address
space.  I am nonetheless merging it in order to reduce the number of
unnecessary differences between the native and Xen pmap implementations.

Tested by:	sbruno
2011-12-30 18:16:15 +00:00
Robert Watson
009d2032af Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel
configurations for various architectures in FreeBSD 10.x.  This allows
basic Capsicum functionality to be used in the default FreeBSD
configuration on non-embedded architectures; process descriptors are not
yet enabled by default.

MFC after:	3 months
Sponsored by:	Google, Inc
2011-12-29 22:48:36 +00:00
John Baldwin
b494482f39 Use curthread rather than PCPU_GET(curthread). 'curthread' uses
special-case optimizations on several platforms and is preferred.

Reported by:	dim (indirectly)
MFC after:	2 weeks
2011-12-29 16:40:54 +00:00
John Baldwin
4eda7b08af Regen. 2011-12-29 15:35:47 +00:00
John Baldwin
dd01579cde Implement linux_fadvise64() and linux_fadvise64_64() using
kern_posix_fadvise().

Reviewed by:	silence on emulation@
MFC after:	2 weeks
2011-12-29 15:34:59 +00:00
Xin LI
81966bce06 Import the first release of HighPoint RocketRAID 27xx SAS 6Gb/s HBA card
driver.  This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms.

Many thanks to HighPoint for providing this driver.

MFC after:	2 weeks
2011-12-28 23:26:58 +00:00
Alan Cox
fe8b9971a8 Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold():
If the page lock acquisition is retried, then the underlying thread is
not unpinned.

Wrap nearby lines that exceed 80 columns.
2011-12-28 19:59:54 +00:00
Alan Cox
9800a50f2d Eliminate many of the unnecessary differences between the native and
paravirtualized pmap implementations for i386.  This includes some
style fixes to the native pmap and several bug fixes that were not
previously applied to the paravirtualized pmap.

Tested by:	sbruno
MFC after:	3 weeks
2011-12-27 23:53:00 +00:00
Alan Cox
7e77373c83 The size passed to kmem functions should be in terms of bytes and not
pages.

Avoid an out-of-bounds array access.

Reviewed by:	cperciva
2011-12-20 20:29:45 +00:00
Alan Cox
971238ae48 The Xen pmap doesn't support superpages. So, there is no point in it
initializing structures, like the pv table, that are only used to
implement superpages.  In fact, some of the unnecessary code in
pmap_init() was actually doing harm.  It was preventing the kernel from
booting on virtual machines with more than 768 MB of memory.

Tested by:	sbruno
2011-12-20 20:16:12 +00:00
Xin LI
25841e912f Add comments in NOTES to say what viawd is. 2011-12-20 00:16:52 +00:00
Alan Cox
725e839b9f Simplify the implementation of the identity mapping in start_all_aps().
Since mpboot.s enables processor support for PG_PS before enabling
paging, there is no reason that the identity must use 4 KB page mappings.

Discussed with:	jhb
2011-12-15 17:54:23 +00:00
Alan Cox
3b03ca3bbe Eliminate vestiges of page coloring. 2011-12-15 05:07:16 +00:00
Alan Cox
894b2848d3 Create large page mappings in pmap_map().
MFC after:	6 weeks
2011-12-14 23:57:47 +00:00
Ed Schouten
53627e400f Replace __signed by signed.
The signed keyword is an integral part of the C syntax. There's no need
to use __signed.
2011-12-13 13:38:03 +00:00
Fabien Thomas
61af1d1393 Add watchdog support for VIA south bridge chipset.
Tested on VT8251, VX900 but CX700, VX800, VX855 should works.

MFC after:	1 month
Sponsored by: NETASQ
2011-12-12 09:50:33 +00:00
Alan Cox
c5ecbfb410 Avoid the possibility of integer overflow in the calculation of
VM_KMEM_SIZE_MAX.  Specifically, if the user/kernel address space split
was changed such that the kernel address space was greater than or equal
to 2 GB, then overflow would occur.

PR:		161721
MFC after:	3 weeks
2011-12-10 18:42:00 +00:00
Marius Strobl
e6b42236cf Remove some more occurrences of amd(4) missed in r227982. 2011-11-26 18:02:39 +00:00
Marius Strobl
4b7ec27007 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
Lawrence Stewart
cf13a58510 - Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate()
system calls to provide feed-forward clock management capabilities to
  userspace processes. ffclock_getcounter() returns the current value of the
  kernel's feed-forward clock counter. ffclock_getestimate() returns the current
  feed-forward clock parameter estimates and ffclock_setestimate() updates the
  feed-forward clock parameter estimates.

- Document the syscalls in the ffclock.2 man page.

- Regenerate the script-derived syscall related files.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 01:26:10 +00:00
Ed Schouten
3d402cb52e Regenerate system call tables. 2011-11-19 07:20:20 +00:00
Ed Schouten
767a32641c Make the Linux *at() calls a bit more complete.
Properly support:

- AT_EACCESS for faccessat(),
- AT_SYMLINK_FOLLOW for linkat().
2011-11-19 07:19:37 +00:00
Ed Schouten
51cfb9474f Regenerate system call tables. 2011-11-19 06:36:11 +00:00
Ed Schouten
d3a993d46b Improve *access*() parameter name consistency.
The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function as a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.
2011-11-19 06:35:15 +00:00
Konstantin Belousov
63b7742fbb Weaken the part of assertions added in the r227394. Only check that the
process state is stopped.

MFC after:	1 week
2011-11-11 04:10:36 +00:00
Ryan Stone
493b584dbd Correct the types of the arguments to return probes of the syscall
provider.  Previously we were erroneously supplying the argument types of
the corresponding entry probe.

Reviewed by:	rpaulo
MFC after:	1 week
2011-11-11 03:49:42 +00:00
Konstantin Belousov
e9862e9b9e Attempt to improve formatting and content of several comments for
amd64 and i386 MD code.

Based on suggestions by:	bde
MFC after:	1 week
2011-11-09 18:25:50 +00:00
Konstantin Belousov
2bb663c043 Stopped process may legitimately have some threads sleeping and not
suspended, if the sleep is uninterruptible.

Reported and tested by:	pho
MFC after:	1 week
2011-11-09 17:25:43 +00:00
Attilio Rao
ed1f6dc235 Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by:	gianni
Reviewed by:	kib
2011-11-08 10:18:07 +00:00
Kevin Lo
966d0ed18f Enable PCI MMC/SD support by default on i386 and amd64 2011-11-08 08:29:05 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ryan Stone
166808c625 Fix the DTrace pid return trap interrupt vector. Previously we were using
31, but that vector is reserved.

Without this fix, running dtrace -p <pid> would either cause the target
process to crash or the kernel to page fault.

Obtained from:	rpaulo
MFC after:	3days
2011-11-07 01:53:25 +00:00
Marius Strobl
a9ab459b31 Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).

PR:		124667
Obtained from:	NetBSD (based on)
MFC after:	3 days
2011-11-01 21:26:57 +00:00
Marcel Moolenaar
b2f1a8f2b3 Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.
2011-10-30 02:19:39 +00:00
Marcel Moolenaar
056f0ec755 Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates porting to other architectures.
2011-10-29 01:26:36 +00:00
Alan Cox
703dec68bf Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc().  While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.
2011-10-27 16:39:17 +00:00
Ken Smith
6168545a11 Adjust the debugger options slightly. This should help me do the right
thing when changing the debugging options as part of head becoming a new
stable branch.  It may also help people who for one reason or another want
to run head but don't want it slowed down by the debugging support.

Reviewed by:	kib
2011-10-27 13:07:49 +00:00
David Schultz
a50079b7ff People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one.  To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware.  Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.
2011-10-21 06:41:46 +00:00
Ken Smith
7042aba738 Add a warning about why sbp(4) is commented out so that curious folks
are forewarned they might wind up with a hole in their foot if they
decide to give it a try.

Suggested by:	dougb
2011-10-19 21:55:20 +00:00
Ken Smith
4c0ba9b742 Comment out the sbp(4) driver for architectures that support it.
As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112)
but was left alone in head so people could work on fixing an issue that
caused boot failure on some motherboards.  Apparently nobody has worked
on it and we are getting reports of boot failure with the 9.0 test builds.
So this time I'll comment out the driver in head (still hoping someone
will work on it) and MFC to stable/9.

Submitted by:	Alberto Villa <avilla at FreeBSD dot org>
2011-10-18 13:45:16 +00:00
Dag-Erling Smørgrav
a417d4a46b Trace attempts to call restricted MD syscalls. 2011-10-18 07:39:27 +00:00
Konstantin Belousov
6bfe4c78c8 Remove unused define.
MFC after:	1 month
2011-10-07 16:09:44 +00:00
Xin LI
db1fda10b4 Add the 9750 SATA+SAS 6Gb/s RAID controller card driver, tws(4). Many
thanks for their contiued support to FreeBSD.

This is version 10.80.00.003 from codeset 10.2.1 [1]

Obtained from:	LSI http://kb.lsi.com/Download16574.aspx [1]
2011-10-04 21:40:25 +00:00
Konstantin Belousov
c06f5f6cea Do not allow the kernel to access usermode pages without installed
fault handler. Panic immediately in such situation, on i386 and amd64.

Reviewed by:	avg, jhb
MFC after:	1 week
2011-10-03 17:01:31 +00:00
Attilio Rao
8d79dfca55 Add some improvements in the idle table callbacks:
- Replace instances of manual assembly instruction "hlt" call
  with halt() function calling.
- In cpu_idle_mwait() avoid races in check to sched_runnable() using
  the same pattern used in cpu_idle_hlt() with the 'hlt' instruction.
- Add comments explaining the logic behind the pattern used in
  cpu_idle_hlt() and other idle callbacks.

In collabouration with:	jhb, mav
Reviewed by:	adri, kib
MFC after:	3 weeks
2011-10-03 14:23:00 +00:00
Kip Macy
9eca9361f9 Auto-generated code from sys_ prefixing makesyscalls.sh change
Approved by:	re(bz)
2011-09-16 14:04:14 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Christian Brueffer
b48f7c4c8d Fix a zyd(4) comment typo that was copy+pasted into most kernel config files.
PR:		160276
Submitted by:	MATSUMIYA Ryo <matsumiya@mma.club.uec.ac.jp>
Approved by:	re (kib)
MFC after:	1 week
2011-09-11 17:39:51 +00:00
Konstantin Belousov
26ccf4f10f Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by:	jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-11 16:05:09 +00:00
Konstantin Belousov
3407fefef6 Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
John Baldwin
3a3ba1b069 Enable the puc(4) driver on amd64 and i386 in GENERIC. This allows
devices supported by puc(4) to work "out of the box" since puc.ko does
not work "out of the box".

Reviewed by:	marcel
Approved by:	re (kib)
MFC after:	1 week
2011-08-26 21:22:34 +00:00
Bjoern A. Zeeb
61bc18a327 In HEAD when doing no further checkes there is no reason use the
temporary variable and check with if as TUNABLE_*_FETCH do not
alter values unless successfully found the tunable.

Reported by:	jhb, bde
MFC after:	3 days
X-MFC with:	r224516
Approved by:	re (kib)
2011-08-20 19:21:46 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Konstantin Belousov
d98d0ce27a - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code now can use use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
Rick Macklem
88c037e26a Change all the sample kernel configurations to use
NFSCL, NFSD instead of NFSCLIENT, NFSSERVER since
NFSCL and NFSD are now the defaults. The client change is
needed for diskless configurations, so that the root
mount works for fstype nfs.
Reported by seanbru at yahoo-inc.com for i386/XEN.

Approved by:	re (hrs)
2011-08-07 20:16:46 +00:00
Konstantin Belousov
eb46c93fa3 Corrections for the iBCS2 support that seems to regressed from 4.x times.
In particular:
- fix format specifiers in the DPRINTFs;
- do not use kernel_map for temporal mapping backed by the vnode, this
  cannot work since kernel map is a system map. Use exec_map instead.
- ignore error code from an attempt to insert the hole. If supposed hole
  is located at the region already populated by .bss, it is not an error.
- correctly translate vm error codes to errno, when appropriate.

Reported and tested by:	Rich Naill <rich enterprisesystems net>
Approved by:	re (kensmith)
MFC after:	1 week
2011-08-02 18:12:19 +00:00
Bjoern A. Zeeb
0a5f264d60 Introduce a tunable to disable the time consuming parts of bootup
memtesting, which can easily save seconds to minutes of boot time.
The tunable name is kept general to allow reusing the code in
alternate frameworks.

Requested by:	many
Discussed on:	arch (a while a go)
Obtained from:	Sandvine Incorporated
Reviewed by:	sbruno
Approved by:	re (kib)
MFC after:	2 weeks
2011-07-30 13:33:05 +00:00
Attilio Rao
68b739cd6f Add the possibility to specify from kernel configs MAXCPU value.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is previewed for this patch.

Tested by:	pluknet
Approved by:	re (kib)
2011-07-19 00:37:24 +00:00
Attilio Rao
521ea19d1c - Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
  tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
  This area can be widely improved by applying the same to other
  architectures and likely finding an unified approach among them and
  move the whole code to be MI. More work in this area is expected to
  happen fairly soon.

No MFC is previewed for this patch.

Tested by:	pluknet
Reviewed by:	jhb
Approved by:	re (kib)
2011-07-18 15:19:40 +00:00
Ed Schouten
78d4d8eeb2 Restore binary compatibility for GIO_KEYMAP and PIO_KEYMAP.
Back in 2009 I changed the ABI of the GIO_KEYMAP and PIO_KEYMAP ioctls
to support wide characters. I created a patch to add ABI compatibility
for the old calls, but I didn't get any feedback to that.

It seems now people are upgrading from 8 to 9 they experience this
issue, so add it anyway.
2011-07-17 08:19:19 +00:00
John Baldwin
4089603c38 Don't include mptable_pci.c in Xen kernels. It is only meant for systems
that truly have an MPTable.  The MPTable code in Xen is really a Xen
specific CPU enumerator and probably shouldn't be using the mptable name
at all.
2011-07-17 01:23:50 +00:00
John Baldwin
c6e81a1626 Fix build with NEW_PCIB defined. 2011-07-16 14:06:02 +00:00
Kirk McKusick
ba4579a7b9 Delete duplicate tags entry I introduced in -r223901.
Submitted-by:	John Baldwin
2011-07-15 17:27:26 +00:00
Kirk McKusick
b115b0e28f Update tags build script 2011-07-10 00:53:04 +00:00
Jung-uk Kim
f0b28f005e Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take
%rcx as "extensions" in long mode.  If any unused bit is set in %rcx, these
instructions cause general protection fault.  Fix style nits and synchronize
i386 with amd64.
2011-07-05 18:42:10 +00:00
Attilio Rao
470107b2f1 MFC 2011-07-04 11:13:00 +00:00
Alan Cox
80788b2a27 When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by:	kib
2011-07-02 23:42:04 +00:00
Jonathan Anderson
12bc222e57 Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)
2011-06-30 10:56:02 +00:00
Attilio Rao
7b744f6b01 MFC 2011-06-30 10:19:43 +00:00
Alan Cox
6bbee8e28a Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
Jonathan Anderson
24c1c3bf71 We may split today's CAPABILITIES into CAPABILITY_MODE (which has
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.

Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.

Approved by: rwatson
Sponsored by: Google UK Ltd
2011-06-29 13:03:05 +00:00
Attilio Rao
d16f8274a6 Remove pc_cpumask usage from i386 and XEN.
Tested by:	pluknet
2011-06-28 13:13:06 +00:00
Attilio Rao
de138ec703 MFC 2011-06-24 16:35:40 +00:00
John Baldwin
1368987ae4 Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to
the x86 tree.  The $PIR code is still only enabled on i386 and not amd64.
While here, make the qpi(4) driver on conditional on 'device pci'.
2011-06-22 21:04:13 +00:00
Attilio Rao
250a44f6a2 Remove pc_other_cpus usage from i386 and XEN.
Tested by:	pluknet
2011-06-22 20:04:39 +00:00
John Baldwin
e8f40e32eb Oops, missed these in 223424.
Reported by:	jkim
2011-06-22 18:48:07 +00:00
John Baldwin
3bf59bd14f Use uintXX_t instead of u_intXX_t. 2011-06-22 17:55:16 +00:00
John Baldwin
38d7a61ba4 Add a helper routine to conditionally modify the start address of a
resource allocation from an x86 Host-PCI bridge driver so that it can be
reused by the ACPI Host-PCI bridge driver (and eventually the MPTable
Host-PCI bridge driver) instead of duplicating the same logic.  Note that
this means that hw.acpi.host_mem_start is now replaced with the
hw.pci.host_mem_start tunable that was already used in the non-ACPI case.
This also removes hw.acpi.host_mem_start on ia64 where it was not
applicable (the implementation was very x86-specific).

While here, adjust the logic to apply the new start address on any
"wildcard" allocation even if that allocation comes from a subset of
the allowable address range.

Reviewed by:	imp (1)
2011-06-22 16:15:15 +00:00
Hans Petter Selasky
144b716627 Enable USB 3.0 support by default in i386 and amd64 GENERIC kernels.
Discussed with:	joel @ and thompsa @
MFC after:	7 days
2011-06-14 20:30:49 +00:00
Joel Dahl
701b698b6f Enable sound support by default on i386 and amd64.
The generic sound driver has been added, along with enough
device-specific drivers to support the most common audio
chipsets.

We've discussed enabling it from time to time over the years
and we've received numerous requests from users, so we decided
that shipping 9.0 with working audio by default would be the
best thing to do.

Bug reports should be sent to the multimedia@ mailing list, as
usual.

Approved by:    mav
No objection:   re
2011-06-11 09:08:46 +00:00
John Baldwin
049dc0d1ff Implement BUS_ADJUST_RESOURCE() for the x86 drivers that sit between the
Host-PCI bridge drivers and nexus.
2011-06-10 12:30:16 +00:00
Andriy Gapon
234dab4a82 remove code for dynamic offlining/onlining of CPUs on x86
The code has definitely been broken for SCHED_ULE, which is a default
scheduler.  It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt devilery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup.  See the UPDATING entry for details.

Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all availble CPUs and then
the disabled CPUs should be "subtracted" from it.  That doesn't work
well if the resulting topology becomes non-uniform.

This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.

PR:		kern/145385
Discussed with:	gcooper, ambrisko, mdf, sbruno
Reviewed by:	attilio
Tested by:	pho, pluknet
X-MFC after:	never
2011-06-08 08:12:15 +00:00
Attilio Rao
81c02539f1 MFC 2011-06-06 21:38:39 +00:00
Andriy Gapon
ecee337a8c don't use cpuid level 4 in x86 cpu topology detection if it's not supported
This regression was introduced in r213323.
There are probably no Intel cpus that support amd64 mode, but do not
support cpuid level 4, but it's better to keep i386 and amd64 versions
of this code in sync.

Discovered by:	pho
Tested by:	pho
MFC after:	2 weeks
2011-06-06 14:23:13 +00:00
Attilio Rao
61b926921f MFC 2011-05-31 21:22:44 +00:00
Nathan Whitehorn
d098f93019 On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
Kevin Lo
a92e80be3f Bring back r222275. runfw(4) will statically link in rt2870.fw.uu
to the kernel, though I have MODULES_OVERRIDE="" in GENERIC.

Spotted by:	thompsa
2011-05-25 10:04:13 +00:00
Kevin Lo
6d5ee6cd7f run(4) needs firmware loaded to work 2011-05-25 04:46:48 +00:00
Attilio Rao
d7eb69e19c - Fix a misusage of cpuset_t objects
- Fix a typo

Reported by:	pluknet
2011-05-24 15:47:40 +00:00
Attilio Rao
d30e0db53a Add a "safety belt" check for lsb setting.
I don't think it is really necessary because the cpumask is known to be
!= 0, but it is just in case.

Requested by:	kib
2011-05-22 20:24:36 +00:00
Attilio Rao
b2b45cca93 Reintroduce the lazypmap infrastructure and convert it to using
cpuset_t.

Requested by:	alc
2011-05-20 14:53:16 +00:00
Attilio Rao
3a0318e055 Merge part of r221322 from largeSMP project:
Sync XEN support with i386 about the usage of ipi_send_cpu()

Tested by:	pluknet
MFC after:	2 weeks
2011-05-18 16:07:30 +00:00
Attilio Rao
5f6b159db7 MFC 2011-05-18 16:01:29 +00:00
Jung-uk Kim
2b052e43be Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features.
Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel
AVX instructions.  The SSE5 bit was recycled as XOP extended instruction
bit, CVT16 was deprecated in favor of F16C (half-precision float conversion
instructions for AVX), and the remaining FMA4 (4-operand FMA instructions)
gained a separate CPUID bit.  Replace non-existent references with today's
CPUID specifications.
2011-05-17 22:36:16 +00:00
Attilio Rao
179efac924 Remove an unused typedef.
Tested by:	sbruno, pluknet
2011-05-17 22:15:53 +00:00
Attilio Rao
447274a88b MFC 2011-05-15 15:47:16 +00:00
Henrik Brix Andersen
149d1c897e Add I2C bus driver for the AMD Geode LX series CS5536 Companion
Device.

Reviewed by:    jhb (newbus bits only), adrian
2011-05-15 14:01:23 +00:00
Attilio Rao
b2aa562e7b MFC 2011-05-13 20:58:48 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Attilio Rao
739e31f6d7 MFC 2011-05-13 15:20:57 +00:00
Alexander Motin
167aee3895 Refactor Xen PV code to use new event timers subsystem. That uses one-shot
Xen timer and time counter to provide one-shot and periodic time events.

On my tests this reduces idle interruts rate down to about 30Hz, and accor-
ding to Xen VM Manager reduces host CPU load by three times comparing to
the previous periodic 100Hz clock. Also now, when needed, it is possible to
increase HZ rate without useless CPU burning during idle periods.

Now only ia64 and some ARMs left not migrated to the new event timers.
2011-05-13 12:39:37 +00:00
Attilio Rao
ef607a6aa3 MFC 2011-05-12 14:01:40 +00:00
Jung-uk Kim
00c885e181 Add SC_PIXEL_MODE to GENERIC for amd64 and i386.
Requested by:	many
2011-05-10 16:44:16 +00:00
Attilio Rao
bd55ede060 MFC 2011-05-09 18:53:13 +00:00
Jung-uk Kim
65e7d70b09 Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs.  For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting.  For the latter case, we
may set it if it passes the test to notify the user that it may be usable.
2011-05-09 17:34:00 +00:00
Attilio Rao
b9f714be9f MFC 2011-05-07 23:34:14 +00:00
Alexander Motin
d96fd07637 Don't use MWAIT for short sleeps under XEN, as it was before r212541.
This fixes panic during boot in PV mode on Xen 3.2.
2011-05-07 12:27:25 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
Andriy Gapon
fdf30d59a6 prepare code that does topology detection for amd cpus for bulldozer
This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by:	Gary Jennejohn <gljennjohn@googlemail.com>
		(with pre-10h hardware)
MFC after:	2 weeks
2011-05-06 13:51:54 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Attilio Rao
8c0ef2464e Revert md_assert_preempt() introduction.
Discussed with:	jeff, jhb
2011-05-04 20:29:40 +00:00
Attilio Rao
94ebcddde3 MFC 2011-05-03 18:57:46 +00:00
John Baldwin
6162795be0 Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can
be disabled via 'nooptions NEW_PCIB'.
2011-05-03 18:23:11 +00:00
John Baldwin
83c41143ca Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests.  Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices.  The
suballocations are managed by creating an rman for each I/O window.  The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus.  Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus.  If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free.  From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed.  The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows).  If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window.  This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.  This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now.  Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR:		kern/143874 kern/149306
Tested by:	mav
2011-05-03 17:37:24 +00:00
Attilio Rao
171c7d9bf6 MFC 2011-05-02 22:03:30 +00:00
Bernhard Schmidt
13c98eb780 All PCI based wireless drivers seem to be explicitly removed from the
PAE kernel config, do that also for those added to GENERIC lately.
2011-05-02 16:51:02 +00:00
Attilio Rao
7be8a2de4f MFC @ r221324 2011-05-02 14:23:36 +00:00
John Baldwin
d2c9344ff9 Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
2011-05-02 14:13:12 +00:00
Attilio Rao
ef6146b9a3 - Merge a fix fixup for the last lazyfix removal
- Sync xen with i386 about the ipi_send_cpu() usage
2011-05-02 13:56:47 +00:00
Bernhard Schmidt
d1f25d5dcb Add the remaining wireless drivers.
Discussed with:	joel
2011-05-01 13:26:34 +00:00
Attilio Rao
a4823f2d0c Remove unnused typedef. 2011-05-01 00:08:13 +00:00
Attilio Rao
f1edea81ac Add the function md_assert_nopreempt(), which is a very consistent
function on the possibility of a thread to not preempt.

As this function is very tied to x86 (interrupts disabled checkings)
it is not intended to be used in MI code.
2011-04-30 23:12:37 +00:00
Attilio Rao
9734077245 Remove the support for lazy cr3 switching from i386.
amd64 has already this micro-optimization removed.

Submitted by:	kib
2011-04-30 23:02:17 +00:00
Kevin Lo
5aaea65247 Add urtw(4) 2011-04-29 06:36:39 +00:00
Jung-uk Kim
c34e9dbee1 Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation.  Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after:	3 days
2011-04-28 22:23:39 +00:00
Attilio Rao
2be767e069 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
Rick Macklem
4309e17add This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
2011-04-27 17:51:51 +00:00
Alexander Motin
0d307e0905 - Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
 - To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
 - Add some more details to UPDATING.
2011-04-26 17:01:49 +00:00
Rick Macklem
7c208ed659 Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by:	jhb
MFC after:	2 weeks
2011-04-25 22:22:51 +00:00
Alexander Motin
97b53e3634 Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
2011-04-24 08:58:58 +00:00
Jung-uk Kim
51268821a9 Do not invoke resume event handlers if suspend was successful.
Pointy hat to:	jkim
2011-04-19 16:30:17 +00:00
Jung-uk Kim
ba40504144 Add suspend/resume event handlers for apm(4) as well. 2011-04-19 16:20:55 +00:00
Konstantin Belousov
3136faa59d Make pmap_invalidate_cache_range() available for consumption on amd64.
Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2011-04-18 21:24:42 +00:00
Jung-uk Kim
0e72764232 Add a function rdtsc32() to read lower 32 bits from TSC and discard upper
32 bits.  Some times compiler inserts unnecessary instructions to preserve
unused upper 32 bits even when it is casted to a 32-bit value.  It reduces
such compiler mistakes where every cycle counts.
2011-04-14 16:53:32 +00:00
Jung-uk Kim
4854ae249c Consistently use __volatile as the rest of this file. 2011-04-14 16:19:41 +00:00
Jung-uk Kim
f54c13ea44 Consistently use C99 standard integers as the rest of this file. 2011-04-14 16:02:52 +00:00
Jung-uk Kim
a7817c7ae5 Reduce errors in effective frequency calculation. 2011-04-12 23:49:07 +00:00
Jung-uk Kim
b9e4376214 Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and
MPERF MSRs are available.  It was disabled in r216443.  Remove the earlier
hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little
bit more reliable now.
2011-04-12 23:04:01 +00:00
Jung-uk Kim
dd3e254ebd Add forgotten declarations for tsc_perf_stat from the previous commit. 2011-04-12 22:22:01 +00:00
Jung-uk Kim
155094d77a Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
2011-04-12 22:15:46 +00:00
Jung-uk Kim
3731174954 Add definitions for CPUID instruction 6, ECX information. 2011-04-12 22:12:23 +00:00
Ryan Stone
7d6a0bf373 Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by:	emaste (mentor)
MFC after:	1 week
2011-04-08 14:39:41 +00:00
Jung-uk Kim
3453537fa5 Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
Jung-uk Kim
d521c6b9c4 Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These
functions are implemented with CMPXCHG8B instruction where it is available,
i. e., all Pentium-class and later processors.  Note this instruction is
also used for atomic_store_rel_64() because a simple XCHG-like instruction
for 64-bit memory access does not exist, unfortunately.  If the processor
lacks the instruction, i. e., 80486-class CPUs, two 32-bit load/store are
performed with interrupt temporarily disabled, assuming it does not support
SMP.  Although this assumption may be little naive, it is true in reality.
This implementation is inspired by Linux.
2011-04-06 23:59:59 +00:00
Edward Tomasz Napierala
1ba5ad4210 Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
Jung-uk Kim
57af65d401 Use cpu_ticks() for get_cyclecount(9) rather than checking existence of TSC
at run-time on i386.  cpu_ticks() is set to use RDTSC early enough on i386
where it is available.  Otherwise, cpu_ticks() is driven by the current
timecounter hardware as binuptime(9) does.  This also avoids unnecessary
namespace pollution from <machine/cputypes.h>.
2011-04-04 22:56:33 +00:00
Andriy Gapon
a930718af1 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
Adrian Chadd
dba9c85977 Break out the ath PCI logic into a separate device/module.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
2011-03-31 08:07:13 +00:00
Andriy Gapon
01a9e1a11b linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
Andriy Gapon
931f0826ea linux compat: add non-dummy capget and capset system calls, regenerate
And drop dummy definitions for those system calls.
This may transiently break the build.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:59:24 +00:00
Andriy Gapon
1f4ec5a3ba linux compat: add non-dummy capget and capset system calls
PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:51:56 +00:00
Dmitry Chagin
acface683e Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.

MFC after:	1 Week
2011-03-26 09:25:35 +00:00
Jung-uk Kim
cd45fec044 Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs.  These CPUs need explicit MSR configuration to expose ceratin CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software.  Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net.  Clean up
and simplify VIA PadLock detection while I am in the neighborhood.
2011-03-26 02:02:07 +00:00
Alan Cox
e9a3f7852d Modestly increase the maximum allowed size of the kmem map on i386.
Also, express this new maximum as a fraction of the kernel's address
space size rather than a constant so that increasing KVA_PAGES will
automatically increase this maximum.  As a side-effect of this change,
kern.maxvnodes will automatically increase by a proportional amount.

While I'm here ensure that this change doesn't result in an unintended
increase in maxpipekva on i386.  Calculate maxpipekva based upon the
size of the kernel address space and the amount of physical memory
instead of the size of the kmem map.  The memory backing pipes is not
allocated from the kmem map.  It is allocated from its own submap of
the kernel map.  In short, it has no real connection to the kmem map.
(In fact, the commit messages for the maxpipekva auto-sizing talk
about using the kernel map size, cf. r117325 and r117391, even though
the implementation actually used the kmem map size.)  Although the
calculation is now done differently, the resulting value for
maxpipekva should remain almost the same on i386.  However, on amd64,
the value will be reduced by 2/3.  This is intentional.  The recent
change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had
the unnecessary side-effect of increasing maxpipekva.  This change is
effectively restoring maxpipekva on amd64 to its prior value.

Eliminate init_param3() since it is no longer used.
2011-03-23 16:38:29 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb
d2b74735b8 For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it.  LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases.  In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with:	qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR:		kern/148018, kern/155604, kern/144917, kern/146792
MFC after:	2 weeks
2011-03-19 15:50:34 +00:00
Jung-uk Kim
2ffa4044e9 Rework r219679. Always check CPU class at run-time to make it predictable.
Unfortunately, it pulls in <machine/cputypes.h> but it is small enough and
namespace pollution is minimal, I hope.

Pointed out by:	bde
Pointy hat:	jkim
2011-03-16 16:09:08 +00:00
Jung-uk Kim
1f5cdd5a99 Partially revert r219672. After r198295, kernel need to seed randomness as
soon as possible for stack protector.  However, dummy timecounter does not
have enough entropy and we don't need to sacrifice Pentium class and later.

Pointed out by:	Maxim Dounin (mdounin at mdounin dot ru)
2011-03-15 21:45:10 +00:00
Jung-uk Kim
b2b9331c44 Remove tsc_present from this file, really. 2011-03-15 18:09:29 +00:00
Jung-uk Kim
38b8542ca9 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
Jung-uk Kim
d8ea2a492e Unconditionally use binuptime(9) for get_cyclecount(9) on i386. Since this
function is almost exclusively used for random harvesting, there is no need
for micro-optimization.  Adjust the manual page accordingly.
2011-03-15 17:14:26 +00:00
Jung-uk Kim
eb14346a8e Make get_cyclecount(9) little bit more useful where binuptime(9) is used. 2011-03-14 23:30:14 +00:00
David Christensen
dd46ab31de - Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE.
(BCM57710, BCM57711, BCM57711E)

MFC after:	One month
2011-03-14 22:42:41 +00:00
Dmitry Chagin
8f1e49a638 Enable shared page use for amd64/linux32 and i386/linux binaries.
Move signal trampoline code from the top of the stack to the shared page.

MFC after:	2 Weeks
2011-03-13 14:58:02 +00:00
Andriy Gapon
d549ef5638 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
Regenerate system call and systrace support files.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (earlier version)
MFC after:	3 weeks
2011-03-12 08:58:19 +00:00
Andriy Gapon
56ede1074e add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
This commits makes necessary changes in syscall/sysent generation
infrastructure.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (ealier version)
MFC after:	3 weeks
2011-03-12 08:51:43 +00:00
Jung-uk Kim
79422085d4 Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker.  Note tsc_present does not change by this tunable.
2011-03-11 00:44:32 +00:00
Jung-uk Kim
cf0d2bb216 Detect NSC/AMD Geode SC1100 properly, not just Stepping 0. Although it is
unclear that "TSC stops ticking with HLT instruction" problem is present
with other steppings, it is limited to Stepping 0 for now.
2011-03-10 22:20:11 +00:00
Jung-uk Kim
bc34c87e81 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
Julian Elischer
a8066a9d3b Add a small change to the comment in the GENRIC config files that include udbp
Submitted by:	Chris Forgron, cforgeron at acsi dot ca
MFC after:	1 week
2011-03-09 17:15:11 +00:00
Dmitry Chagin
e5d81ef1b5 Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
Robert Watson
74b5505e5d Continue to introduce Capsicum capability mode:
White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here.  Sysarch is a morass similar to ioctl, in many senses.

Submitted by:	anderson
Discussed with:	benl, kris, pjd
Sponsored by:	Google, Inc.
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:35:48 +00:00
John Baldwin
4cef260f42 Fix whitespace nit. 2011-02-22 14:58:14 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Dmitry Chagin
dc4f0a9e11 To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.
2011-02-16 17:50:21 +00:00
Dmitry Chagin
09d6cb0a23 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
Dmitry Chagin
fde6316272 Sort include files in the alphabetical order. 2011-02-13 19:07:48 +00:00
Dmitry Chagin
222198ab0b Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
Dmitry Chagin
c8d6845e9e In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().
2011-02-12 16:33:00 +00:00
Dmitry Chagin
2c7660ba3e In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.
2011-02-12 15:50:21 +00:00
Dmitry Chagin
9bd9b52478 Regen for r218610. 2011-02-12 15:36:25 +00:00
Dmitry Chagin
f91ea2518b The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one. 2011-02-12 15:33:25 +00:00
Alan Cox
02e5228ca0 Setting VV_TEXT here is redundant. It is already set by do_execve().
Reviewed by:	kib
2011-02-09 18:45:33 +00:00
Konstantin Belousov
b17ef03604 Fix linking of the kernel without device npx.
MFC after:	2 weeks
2011-02-05 15:37:10 +00:00
Konstantin Belousov
6f9ec5aab0 Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with:	bde
MFC after:	2 weeks
2011-02-05 15:10:27 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Dmitry Chagin
77192fddeb Regen for r218101.
MFC after:	1 Month.
2011-01-30 20:38:26 +00:00
Dmitry Chagin
8d73c2bfd1 Change linux futex syscall definition to match actual linux one.
MFC after:	1 Month.
2011-01-30 20:31:43 +00:00
Dmitry Chagin
9adaae9403 The kern_wait() code already removes the SIGCHLD signal for the waited
process. Removing other SIGCHLD signals is not needed and may cause
problems.

Pointed out by:	jilles

MFC after:	1 Month.
2011-01-30 18:17:38 +00:00
Dmitry Chagin
adc7ece00a Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after:	1 Month.
2011-01-28 18:47:07 +00:00
Dmitry Chagin
a5c1afadeb Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
Matthew D Fleming
f89f7ada8d Set td_kstack_pages for thread0. This was already being done for most
architectures, but i386 and amd64 were missing it.

Submitted by:	Mohd Fahadullah <mfahadullah AT isilon DOT com>
2011-01-26 17:06:13 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Jung-uk Kim
fd240d6d9f Fix yet another fallout from r208833. VM86 BIOS call may cause page fault
when FPU is in use.

Reported by:	Marc UBM Bocklet (ubm dot freebsd at googlemail dot com)
Tested by:	b. f. (bf1783 at googlemail dot com)
MFC after:	3 days
2011-01-19 17:09:07 +00:00
Konstantin Belousov
55aabb7fd1 For architectures not using direct map , and requiring real KVA page for
sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about free buffer.

sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART despite the thread was selected for wakeup_one(). As
result, we loose a wakeup, and some other waiter will not be woken up.

Reported and tested by:	az
Reviewed by:	alc, jhb
MFC after:	1 week
2011-01-18 21:57:02 +00:00
John Baldwin
6bd823f334 - Remove some always-true checks (checking for unsigned < 0).
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
  when descriptors are provided.  Specifically, allow the 'start == 0'
  and 'num == 0' special case used to free all LDT entries that previously
  failed with EINVAL.

Submitted by:	clang via rdivacky (some of 1)
Reviewed by:	kib
2011-01-18 16:43:01 +00:00
Jung-uk Kim
2fea643112 Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init().  Consistently define mem_range_softc from
mem.c for all platforms.  Add missing #include guards for machine/memdev.h
and sys/memrange.h.  Clean up some nearby style(9) nits.

MFC after:	1 month
2011-01-17 22:58:28 +00:00
Jung-uk Kim
df74996c3d Avoid preemption while manipulating CRs and MTRRs.
Tested by:	ariff
2011-01-17 17:30:35 +00:00
John Baldwin
072e9838e2 If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared.  This causes the local APIC interrupt routine
to fail to find an interrupt to service.  Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC.  If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by:	steve
2011-01-13 17:00:22 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
Tijl Coosemans
d22e78d6b9 Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by:	kib (mentor)
2011-01-08 18:09:48 +00:00
Tijl Coosemans
a56e818f29 On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by:	bde [1]
Approved by:	kib (mentor)
2011-01-08 12:43:05 +00:00
Tijl Coosemans
d942996baf On 32 bit architectures define (u)int64_t as (unsigned) long long instead
of (unsigned) int __attribute__((__mode__(__DI__))). This aligns better
with macros such as (U)INT64_C, (U)INT64_MAX, etc. which assume (u)int64_t
has type (unsigned) long long.

The mode attribute was used because long long wasn't standardised until
C99. Nowadays compilers should support long long and use of the mode
attribute is discouraged according to GCC Internals documentation.

The type definition has to be marked with __extension__ to support
compilation with "-std=c89 -pedantic".

Discussed with:	bde
Approved by:	kib (mentor)
2011-01-08 11:47:55 +00:00
Tijl Coosemans
9858863cd4 Fix types of some values in machine/_limits.h.
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by:	bde
Approved by:	kib (mentor)
2011-01-08 11:13:34 +00:00
Tijl Coosemans
911127a0d6 Remove unused support for 64 bit long on 32 bit architectures.
It was used mainly to discover and fix some 64-bit portability problems
before 64-bit arches were widely available.

Discussed with:	bde
Approved by:	kib (mentor)
2011-01-07 22:57:31 +00:00
Konstantin Belousov
39198f15ee Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.
2011-01-07 14:22:34 +00:00
John Baldwin
c305730dc0 Remove bogus usage of INTR_FAST. "Fast" interrupts are now indicated by
registering a filter handler rather than a threaded handler.  Also remove
a bogus use of INTR_MPSAFE for a filter.
2011-01-06 21:08:06 +00:00
Colin Percival
b5e61aab00 Spell CRITICAL_ASSERT correctly.
Submitted by:	jhb
MFC with:	r216944
2011-01-04 16:29:07 +00:00
Colin Percival
aa829cef6a Add hamfisted locking to the Xen/PV pmap code: Only allow one thread to
be in {pmap_pinit, pmap_copy, pmap_release} at a time.

This reduces the rate of panics when running 'make index' from ~0.6/hour
to ~0.02/hour (p < 10^-30).

At a later date this locking will be removed, and for this reason, it is
wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in
place with the intention of shipping somewhat-stable Xen bits in FreeBSD
8.2-RELEASE.

PR:		kern/153672
MFC after:	3 days
2011-01-04 15:55:15 +00:00
Robert Watson
2913e88c91 Make "options XENHVM" compile for i386, not just amd64 -- a largely
mechanical change.  This opens the door for using PV device drivers
under Xen HVM on i386, as well as more general harmonisation of i386
and amd64 Xen support in FreeBSD.

Reviewed by:    cperciva
MFC after:      3 weeks
2011-01-04 14:49:54 +00:00
Colin Percival
d01b2dad73 Adjust the critical section protecting _xen_flush_queue to cover the
entire range where the page mapping request queue needs to be atomically
examined and modified.

Oddly, while this doesn't seem to affect the overall rate of panics
(running 'make index' on EC2 t1.micro instances, there are 0.6 +/- 0.1
panics per hour, both before and after this change), it eliminates
vm_fault from panic backtraces, leaving only backtraces going through
vmspace_fork.
2011-01-04 00:16:38 +00:00
Colin Percival
aaaf607148 Make i386_set_ldt work on i386/XEN, step 5/5.
When cleaning up a thread, reset its LDT to the default LDT.

Note: Casting the LDT pointer to an int and storing it in pc_currentldt is
wildly bogus, but is harmless since pc_currentldt is a write-only variable.

MFC after:	3 days
2010-12-31 17:42:25 +00:00
Colin Percival
698cc19d6b Make i386_set_ldt work on i386/XEN, step 4/5.
Use xen_update_descriptor to update the LDT rather than bcopy.  Under Xen,
pages used for holding LDTs must be read-only, so we can't make the change
ourselves.

Ths obvious alternative of "remap the page read-write, make the change, then
map it read-only again" doesn't work since Xen won't allow an LDT page to be
remapped as R/W.  An arguably better solution is used by NetBSD: They don't
modify LDTs in-place at all, but instead copy the entire LDT, modify the new
version, then atomically swap.

MFC after:	3 days
2010-12-31 17:41:14 +00:00
Colin Percival
de187b8df2 Make i386_set_ldt work on i386/XEN, step 3/5.
Synchronize reality with comment: The user_ldt_alloc function is supposed to
return with dt_lock held.  Due to broken locking in i386/xen/pmap.c, we drop
dt_lock during the call to pmap_map_readonly and then pick it up again; this
can be removed once the Xen pmap locking is fixed.

MFC after:	3 days
2010-12-31 17:40:30 +00:00
Colin Percival
90b7d33458 Make i386_set_ldt work on i386/XEN, step 2/5.
Don't map physical to machine page numbers in pte_load_store, since it uses
PT_SET_VA (which takes a physical page number and converts it to a machine
page number).

MFC after:	3 days
2010-12-31 17:39:58 +00:00
Colin Percival
d262f2dcfc Make i386_set_ldt work on i386/XEN, step 1/5.
Lock the vm page queue mutex around calls to pte_store.  As with many other
uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs
to be replaced at some future date by a spin lock dedicated to protecting
the queue of pending xen page mapping hypervisor calls.  (But for now, bogus
locking is better than a panic.)

MFC after:	3 days
2010-12-31 17:39:31 +00:00
Pyun YongHyeon
2608aefc0b Add driver for DM&P Vortex86 RDC R6040 Fast Ethernet.
The controller is commonly found on DM&P Vortex86 x86 SoC.  The
driver supports all hardware features except flow control.  The
flow control was intentionally disabled due to silicon bug.

DM&P Electronics, Inc. provided all necessary information including
sample board to write driver and answered many questions I had.
Many thanks for their support of FreeBSD.

H/W donated by:	DM&P Electronics, Inc.
2010-12-31 00:21:41 +00:00
Warner Losh
714cf6c0df Revert r216777, per jhb@ 2010-12-28 22:45:29 +00:00
Warner Losh
1977f3f168 Comment out npx and isa from NOTES file. We don't need them here
since DEFAULTS already pulls them in.
2010-12-28 21:22:08 +00:00
Warner Losh
78b92d19e0 Remove mem, io, isa and npx since they are duplicative of the entries
in DEFAULTS.  Saves 8 lines of warnings when we build XBOX.
2010-12-28 21:20:58 +00:00
Colin Percival
4a416f8375 Remove a "not strictly correct" (and panic-inducing) workaround for a bug
which doesn't seem to exist.

PR:		kern/141328
MFC after:	3 days
2010-12-28 14:36:32 +00:00
Colin Percival
76c9650713 Build the modules which can be built. The excluded modules fall into two
categories: Those which can't build with PAE because they attempt to cast
a pointer to a bus_addr_t (mostly scsi drivers); and those which can't be
built with XEN because they conflict with something in xen-os.h (e.g., in
cxgb there is a conflicting definition of test_and_clear_bit).

MFC after:	1 week
2010-12-27 23:59:27 +00:00
Colin Percival
8ea0b3bb2f Lock the vm page queue mutex in pmap_pte_release around the call
to PMAP_SET_VA; this fixes a mutex-not-held panic when a process
which called mlock(2) exits, and parallels a change made in
pmap_pte 10 months ago (svn r204160).

Note: The locking in this code is utterly broken.  We should not
be using the VM page queue mutex to protect the queue of pending
Xen page mapping hypervisor calls.  Even if it made sense to do
so, this commit and r204160 introduce LORs between the vm page
queue mutex and PMAP2mutex.

(However, a possible deadlock is better than a guaranteed panic,
and this change will hopefully make life easier for whoever fixes
the Xen pmap locking in the future.)

PR:		kern/140313
MFC after:	3 days
2010-12-26 13:05:43 +00:00
Tijl Coosemans
81bd5041a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
Alan Cox
9d555e459c Redo some parts of r216333, specifically, the locking changes to
pmap_extract_and_hold(), and undo the rest.  In particular, I forgot
that PG_PS and PG_PTE_PAT are the same bit.
2010-12-19 07:31:56 +00:00
Konstantin Belousov
7222d2fbee Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-18 16:41:11 +00:00
Konstantin Belousov
a9b31c256e In pmap_extract(), unlock pmap lock earlier. The calculation does not need
the lock when operating on local variables.

Reviewed by:	alc
2010-12-18 11:31:32 +00:00
Jung-uk Kim
e1c9d39ebe Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case.  Now we support cpu_get_nominal_mhz()
for the case, instead.  Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.
2010-12-14 20:07:51 +00:00
Konstantin Belousov
60c7c84e85 In fpudna()/npxdna(), mark FPU context initialized and optionally
mark user FPU context initialized, if current context is user context.
It was reversed in r215865, by inadequate change of this code fragment
to a call to fpuuserinited()/npxuserinited().

The issue is only relevant for in-kernel users of FPU.

Reported by:	Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net>
Tested by:	Mike Tancsa
MFC after:	3 days
2010-12-12 16:16:39 +00:00
Colin Percival
20d1a304b3 Reduce the Xen timecounter from 1GHz to 2^-9 GHz, thereby increasing the
timecounter period from 2^32 ns (~4.3s) to 2^41 ns (~36m39s).  Some time
sharing systems can skip clock interrupts for a few seconds when under
load (e.g., if we've recently used more than our fair share of CPU and
someone else wants a burst of CPU) and we were losing time in quanta of
2^32 ns due to timecounter wrapping.

Increasing the timecounter period up to 2^41 ns is definitely overkill,
but we still have microsecond timecounter precision, and anyone using
paravirtualized hardware when they need submicrosecond timing is crazy.
2010-12-11 22:33:33 +00:00
Colin Percival
0f30ed5bc6 Make the machdep.independent_wallclock sysctl do what it says on the box. 2010-12-11 20:12:42 +00:00
Alan Cox
d1cf854b5d When r207410 eliminated the acquisition and release of the page queues
lock from pmap_extract_and_hold(), it didn't take into account that
pmap_pte_quick() sometimes requires the page queues lock to be held.
This change reimplements pmap_extract_and_hold() such that it no
longer uses pmap_pte_quick(), and thus never requires the page queues
lock.

For consistency, adopt the same idiom as used by the new
implementation of pmap_extract_and_hold() in pmap_extract() and
pmap_mincore().  It also happens to make these functions shorter.

Fix a style error in pmap_pte().

Reviewed by:	kib@
2010-12-09 20:16:00 +00:00
Colin Percival
91ff9dc058 Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.
2010-12-09 06:41:50 +00:00
Jung-uk Kim
71e0b05797 Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC.
Remove a confusing comment about converting to MHz as we never did.
2010-12-08 23:40:41 +00:00
Colin Percival
af60888734 On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192,
while on i386 we have MAX_BPAGES=512.  Implement this difference via
'#ifdef __i386__'.

With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.
2010-12-08 20:20:10 +00:00
Jung-uk Kim
dd7d207dcb Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with:	avg
2010-12-08 00:09:24 +00:00
Jung-uk Kim
61d14101dd Use int for 'tsc_present' instead of u_int. It is just a boolean. 2010-12-07 23:19:49 +00:00
Jung-uk Kim
7214d5d75b Remove stale comments about P-state invariant TSC and fix style(9) nits. 2010-12-07 22:43:25 +00:00
Jung-uk Kim
1bcc28295b Do not register a event handler for CPU freqency changes when it is found
P-state invariant.  This is continuation of r216274.
2010-12-07 22:34:51 +00:00
Jung-uk Kim
4a9c4056dc Now the P-state invariant TSC is probed early enough, do not register event
handlers for CPU freqency changes when it is found P-state invariant.
Adjust a comment about non-existent tsc_freq_max() while I am here.
2010-12-07 22:23:26 +00:00
Jung-uk Kim
78a661bbaa Probe P-state invariant TSC from rightful place. 2010-12-07 22:12:02 +00:00
Colin Percival
716d203d6b MFamd64 r204214: Enforce stronger alignment semantics (require that the
end of segments be aligned, not just the start of segments) in order to
allow Xen's blkfront driver to operate correctly.

PR:		kern/152818
MFC after:	3 days
2010-12-05 03:20:55 +00:00
Colin Percival
a39dc31fca Remove gratuitous i386/amd64 inconsistency in favour of the less verbose
version of declaring a variable initialized to zero.
2010-12-04 23:36:40 +00:00
Colin Percival
5c5590862f Remove unnecessary #includes which seem to have been accidentally added
as part of CVS r1.76 (in January 2006).
2010-12-04 23:24:35 +00:00
Jung-uk Kim
2f7ab7e85d Revert r216161. It is not necessary because we zero-fill BSS anyway.
Requested by:	jhb
2010-12-03 22:27:51 +00:00
Jung-uk Kim
b14fe63392 Explicitly initialize TSC frequency. To calibrate TSC frequency, we use
DELAY(9) and it may use TSC in turn if TSC frequency is non-zero.

MFC after:	3 days
2010-12-03 21:54:10 +00:00
Jung-uk Kim
e391a266ed Do not change CPU ticker frequency if TSC is P-state invariant. Note this
change was meant to be committed with r184102 (and its subsequent MFCs) but
it fell off somehow.

Pointyhat to:	jkim
MFC after:	3 days
2010-12-03 21:06:30 +00:00
Rebecca Cran
c90f7d9b44 Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.
2010-12-03 07:09:23 +00:00
Rebecca Cran
15b4888a24 Disallow passing in a count of zero bytes to the bus_space(9) functions.
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR:	kern/80980
Discussed with:	jhb
2010-12-02 22:19:30 +00:00
Colin Percival
d42446149f Fix bug introduced by r194784: Under XEN, the page(s) allocated to dpcpu
for CPU #0 weren't being properly reserved.  Under VM pressure this would
cause problems when the dpcpu structures were overwritten by arbitrary
data; the most common symptom was a panic when netisr attempted to lock a
mutex.

For some reason the XEN code keeps track of the start of available memory
in the variables 'first', 'physfree', and 'init_first'; as far as I can
tell, we always have first == physfree == init_first * PAGE_SIZE.  The
earlier commit adjusted 'first' (which, on !XEN, is the only variable
which tracks this value) but not the other two variables.

Exercise for reader: Eliminate two of these three variables.
2010-11-29 06:50:30 +00:00
Konstantin Belousov
c6fb218c3c Calling fill_fpregs() for curthread is legitimate, and ELF coredump
does this.

Reported and tested by:	pho
MFC after:	5 days
2010-11-28 17:56:34 +00:00
Konstantin Belousov
5c6eb03790 Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by:	bde
Tested by:    pho (i386 and amd64)
MFC after:    1 week
2010-11-26 14:50:42 +00:00
Tijl Coosemans
ce4ec51dbe Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by:	imp, jhb
Approved by:	kib (mentor)
2010-11-26 10:59:20 +00:00
Ulrich Spörlein
02604cd4f4 Remove kernel support for BB profiling, now that kernbb(8) is gone, too.
PR:		bin/83558
Reviewed by:	jkim
2010-11-26 08:11:43 +00:00
Colin Percival
5c0ab2fa8b Revert r215819 and fix the bug properly. In pmap_qremove, paging table
updates were being queued by pmap_kremove, but the queue wasn't being
flushed; as a result, the updates didn't happen until *after* the call
to pmap_invalidate_range, and old entries could stick around in the TLB.
Adding a PT_UPDATES_FLUSH() call immediately before pmap_invalidate_range
ensures that after the invalidation the TLB will be repopulated with the
correct new entries.

Thanks to:	kib, avg, alc
2010-11-25 22:06:07 +00:00
Dimitry Andric
079d7e43ca Use unambiguous inline assembly to load a float variable. GNU as
silently converts 'fld' to 'flds', without taking the actual variable
type into account (!), but clang's integrated assembler rightfully
complains about it.

Discussed with:	cperciva
2010-11-25 18:14:18 +00:00
John Baldwin
9d76324839 Add device IDs for two more ServerWorks Host-PCI bridges so that we can
read their starting PCI bus number for older systems that do not support
ACPI (or have a broken _BBN method).

PR:		kern/148108
MFC after:	1 week
2010-11-25 15:42:33 +00:00
Colin Percival
98702b3990 Work around paging bug. Somehow we seem to be ending up with entries in
the TLB which don't correspond to ptes with PG_V set; prior to this commit
I'm sometimes getting the wrong data when pages are loaded into the buffer
cache (they're being loaded, but the missing TLB invalidation is causing
the wrong data to be visible).
2010-11-25 15:41:34 +00:00
Colin Percival
1a3b2b87de Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
_HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
succeeded.
2010-11-25 15:05:21 +00:00
Colin Percival
0bd7a92067 Remove vestigal debugging code which, in fork-heavy workloads, can cause
a 30x slowdown.
2010-11-25 04:45:31 +00:00
Jung-uk Kim
d2d0fda841 Remove a stale tunable introduced in r215703. 2010-11-23 17:28:23 +00:00
Andriy Gapon
9b984feb3d specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX
CPUID.6 is defined as Thermal and Power Management Leaf by both Intel
and AMD.

Reviewed by:	jhb
MFC after:	7 days
2010-11-23 13:55:30 +00:00
Jung-uk Kim
7dd052c1d9 - Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual.  Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-).  Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC.  Program 5 as
WP (from default WT) and 6 as WC (from default UC-).  Leave 4 and 7 at the
default of WB and UC.  This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency.  Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks.  With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more.  Now this function is
identical on both amd64 and i386.

Reviewed by:	jhb
Tested by:	RM (reuf_m at hotmail dot com)
		Ryszard Czekaj (rychoo at freeshell dot net)
		army.of.root (army dot of dot root at googlemail dot com)
MFC after:	3 days
2010-11-22 19:52:44 +00:00
Colin Percival
c3f128981e In xen_get_timecount, return the full ns-precision time rather than
rounding to 1/HZ precision.

I have no idea why the rounding was introduced in the first place, but
it makes FreeBSD unhappy.
2010-11-22 09:04:29 +00:00
Colin Percival
61381fcf2d Unifdef XEN. This file is only compiled with the XEN kernel option set,
and the !XEN bits get in the way of understanding the code.
2010-11-20 21:36:12 +00:00
Colin Percival
8cdabbaf32 Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses.
Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and
converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...).  In
a few places we take advantage of the fact that xpmap_ptom can commute with
setting PG_* flags.

This commit should have no net effect save to improve the readability of
this code.
2010-11-20 20:04:29 +00:00
Colin Percival
ad520892d7 Make pmap_release consistent with pmap_pinit with respect to unpinning
pages.  The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm
not quite sure why...) and this commit adds a corresponding #if 0 in
pmap_release to avoid unpinning those pages.

Some versions of Xen seem to silently ignore requests to unpin pages
which were never pinned in the first place, but some return an error
(causing FreeBSD to panic) prior to this commit.
2010-11-19 15:12:19 +00:00
Andriy Gapon
b43d292565 specialreg.h: add definitions for MPERF/APERF pair of MSRs
These MSRs can be used to determine actual (average) performance as
compared to a maximum defined performance.
Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both
Intel and AMD processors.

MFC after:	5 days
2010-11-19 15:07:36 +00:00
Andriy Gapon
7af7c7624a specialreg.h: add AMD-specific "Hardware Configuration Register" MSR
It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after:	5 days
2010-11-19 15:00:20 +00:00
Andriy Gapon
8fd6d51347 specialreg.h: add definition for AMD Core Performance Boost bit
This bit indicates availability of the feature.

MFC after:	4 days
2010-11-19 14:46:17 +00:00
Colin Percival
f4c884f95a Make pmap_release match pmap_pinit by invoking pmap_qremove(pmap->pm_pdpt)
to match pmap_pinit's pmap_qenter(pmap->pm_pdpt) call in the case of PAE.
2010-11-18 21:29:43 +00:00
Colin Percival
f86f965ef8 Don't KASSERT in pmap_release that
xpmap_ptom(VM_PAGE_TO_PHYS(m)) == (pmap->pm_pdpt[i] & PG_FRAME)
for i = NPGPTD, since pmap->pm_pdpt[i] is only initialized for
0 <= i < NPGPTD.

This fixes an inevitable panic with XEN && PAE && INVARIANTS when
pmap_release is called (e.g., when /sbin/init is launched).
2010-11-18 21:02:40 +00:00
Jung-uk Kim
816b3bd1b0 Restore CR0 after MTRR initialization for correctness sakes. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases.  Remove another pointless optimization for CR4.PGE bit
while I am here.
2010-11-16 23:26:02 +00:00
Jung-uk Kim
50083a5624 Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invlidation of entire TLBs.  Since these are very rare events, this
micro-optimization seems useless.

Reviewed by:	jhb
2010-11-16 22:44:58 +00:00
Konstantin Belousov
7022f954c3 Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by:	    swell.k gmail com
MFC after:   1 week
2010-11-14 21:59:11 +00:00
Konstantin Belousov
94bce4535d Use symbolic names instead of hardcoding values for magic p_osrel constants.
MFC after:   1 week
2010-11-14 18:24:12 +00:00
Jung-uk Kim
a3c464fb3c MFamd64: (based on) r209957
Move logic of building ACPI headers for acpi_wakeup.c into better places,
remove intermediate makefile and shell script, and reduce diff between i386
and amd64.
2010-11-12 20:55:14 +00:00
Jung-uk Kim
19da400c64 Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
Jung-uk Kim
926ad40ff9 Add compat shim for apm(4) to translate APM BIOS function numbers from i386
to PC98-specific ones.  Any binaries using apm ioctl(4) commands but built
for i386 should also work on PC98 now.

Reviewed by:	imp, nyan
2010-11-11 19:20:33 +00:00
Jung-uk Kim
93a8847473 Make APM emulation look more closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.
2010-11-10 18:50:12 +00:00
Jung-uk Kim
7c2bf852d7 Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.
2010-11-10 01:29:56 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00