Commit Graph

11846 Commits

Author SHA1 Message Date
Matthew D Fleming
d7854da193 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
Alan Cox
a14a949872 The interpreter name should no longer be treated as a buffer that can be
overwritten.  (This change should have been included in r210545.)

Submitted by:	kib
2010-07-28 04:47:40 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfulling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Jung-uk Kim
172754036a Simplify fldcw() macro. There is no reason to use pointer here. No object
file change after this commit (verified with md5).
2010-07-26 23:20:55 +00:00
Jung-uk Kim
8b019a8887 Remove an unused macro since r189418. 2010-07-26 22:55:14 +00:00
Jung-uk Kim
30402401a7 Reduce diff against fenv.h:
Mark all inline asms as volatile for safety.  No object file change after
this commit (verified with md5).
2010-07-26 22:16:36 +00:00
Jung-uk Kim
2e50fa36a5 FNSTSW instruction can use AX register as an operand.
Obtained from:	fenv.h
2010-07-26 21:24:52 +00:00
Rui Paulo
daef39e7ae Remove the acpi_aiboost driver. It has been replaced by aibs(4). 2010-07-25 17:55:57 +00:00
Rui Paulo
2b95672852 MFamd64:
Add USD_GETBASE(), USD_SETBASE(), USD_GETLIMIT() and USD_SETLIMIT().
2010-07-21 18:47:52 +00:00
Tijl Coosemans
3245ecbe92 Store fsbase and gsbase in the right fields of the mcontext. They were
switched.

PR:		i386/148344
Approved by:	kib (mentor)
MFC after:	1 week
2010-07-20 12:36:36 +00:00
Alexander Motin
060d7431b5 Add hints for i8254 timer on i386 and amd64. Some people report about
systems with PnP/ACPI not reporting i8254 timer. In some cases it can be
fatal, as i8254 can be the only available time counter hardware. From other
side we are now heavily depend on i8254 timer and till the last time it's
init/usage was completely hardcoded. So this change just restores previous
behavior in more regular fashion.
2010-07-16 23:21:46 +00:00
Alexander Motin
fcc06be1b2 Move functions declaration to MI code, following implementation. 2010-07-15 17:49:35 +00:00
Bernhard Schmidt
774f94f14c - Update 6000 firmware to 9.221.4.1
- Add 6050 firmware

MFC after:	2 weeks
2010-07-15 11:26:07 +00:00
Warner Losh
1003cfe94d Remove obsolete undef of COPY_SIGCODE. It appears to have not been
used in FreeBSD in quite some time (maybe since before 4.4-lite :)

Submitted by:	bde
2010-07-13 15:06:13 +00:00
Alan Cox
8155e5d561 Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown.  Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns.  The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown.  With this change, that is no longer so.

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after:	3 weeks
2010-07-10 18:22:44 +00:00
Konstantin Belousov
b543e91ba5 Fix spacing.
Noted by:	pgollucci
MFC after:	3 weeks
2010-07-09 21:27:42 +00:00
Konstantin Belousov
2680dac9e1 For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
  dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by:	alc [1]
Reviewed by:	alc
MFC after:	3 weeks
2010-07-09 20:05:56 +00:00
Alexander Motin
af565edaaa Revert r209638. After commit, there appeared to be more people who liked
previous name of stray interrupt counters, then responded to the list.
2010-07-02 17:22:15 +00:00
Alexander Motin
b7bc6aa726 Make stray irq counters have format alike to other counters. Unified format
makes string processing (for example by `systat -vm`) easier.
2010-07-01 21:58:46 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00
Konstantin Belousov
13cedde2cb Regenerate 2010-06-28 18:17:21 +00:00
Rui Paulo
bbe4a97d41 Import the acpi_aibs(4) driver written by Constantine A. Murenin.
It has more features than acpi_aiboost(4) and it will eventually replace
acpi_aiboost(4).

Submitted by:	Constantine A. Murenin <cnst at FreeBSD.org>
Reviewed by:	freebsd-acpi, imp
MFC after:	1 month
2010-06-25 15:32:46 +00:00
Konstantin Belousov
595473a587 Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after:	1 week
2010-06-23 20:44:07 +00:00
Konstantin Belousov
692add74d8 Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() for
get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also,
note that usercontext is not initialized anymore in fpstate_drop().

Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.

Noted by:	bde
2010-06-23 12:17:13 +00:00
Konstantin Belousov
1060a94fb5 After the FPU use requires #MF working due to INT13 FPU exception handling
removal, MFi386 r209198:
    Use critical sections instead of disabling local interrupts to ensure
    the consistency between PCPU fpcurthread and the state of FPU.

Reviewed by:	bde
Tested by:	pho
2010-06-23 11:21:19 +00:00
Konstantin Belousov
699d648aab Remove the support for int13 FPU exception reporting on i386. It is
believed that all 486-class CPUs FreeBSD is capable to run on, either
have no FPU and cannot use external coprocessor, or have FPU on the
package and can use #MF.

Reviewed by:	bde
Tested by:	pho (previous version)
2010-06-23 11:12:58 +00:00
Konstantin Belousov
95882b9865 Remove unused i586 optimized bcopy/bzero/etc implementations that utilize
FPU registers for copying. Remove the switch table and jumps from
bcopy/bzero/... to the actual implementation.
As a side-effect, i486-optimized bzero is removed.

Reviewed by:	bde
Tested by:	pho (previous version)
2010-06-23 10:40:28 +00:00
Alexander Motin
25eb1b8c15 Some style fixes for r209371.
Submitted by:	jhb@
2010-06-22 16:20:10 +00:00
Alexander Motin
875b8844be Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
 LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
 HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
 i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
 RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
2010-06-20 21:33:29 +00:00
Konstantin Belousov
9e3e64e797 Only enable kdtrace hook in the LINT on the architectures that implement it. 2010-06-18 18:51:09 +00:00
Alexander Motin
d364638110 Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by:	jhb@ (previous version)
2010-06-17 11:54:49 +00:00
John Baldwin
61d3f0bab2 Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by:	jkim
MFC after:	2 weeks
2010-06-15 18:51:41 +00:00
Alexander Motin
2f9fc3899b Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu()
for pre-bound IRQs during boot, submit there LAPIC ID, same as in other
places, not CPU ID.
2010-06-14 07:38:53 +00:00
Alexander Motin
1440de203f Check general TSC presence before doing more specific checks and printfs. 2010-06-12 13:10:03 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alexander Kabaev
60743cbd22 Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.

Discussed with:	jhb
MFC after:	1 week
2010-06-10 17:59:47 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
John Baldwin
b9cd2f771a Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.
2010-06-08 18:36:03 +00:00
John Baldwin
2465e30f0c Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by:	alc
2010-06-08 18:04:07 +00:00
John Baldwin
53a908cb07 Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.
2010-06-08 17:51:21 +00:00
John Baldwin
bfc7a4fc48 - Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
  before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers.  This
  closes a race where ioapic_program_intpin() could use a stale value of
  the masked state to compute the masked bit in the register.

Reviewed by:	mav
MFC after:	2 weeks
2010-06-08 17:08:13 +00:00
Konstantin Belousov
6cf9a08d2c Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches.  There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with:	fabient
Tested by:    pho, fabient
Hardware provided by:	Sentex Communications
MFC after:    1 month
2010-06-05 15:59:59 +00:00
Alan Cox
966898be68 In the unlikely event that pmap_ts_referenced() demoted five superpage
mappings to the same underlying physical page, the calling thread would be
left forever pinned to the same processor.

MFC after:	3 days
2010-06-03 03:55:22 +00:00
John Baldwin
9c72429312 MFamd64: Add a new macro PCPU_XEN_FIELDS to hold XEN-specific per-CPU
fields that is always included in PCPU_MD_FIELDS.  The macro is empty for
non-XEN kernels.  This avoids duplicating non-XEN per-CPU fields in two
places.  While here, remove several unused fields from the XEN-specific
structure.

Reviewed by:	kmacy, gibbs
MFC after:	1 month
2010-06-02 15:09:36 +00:00
Alan Cox
b2830a9649 Eliminate a stale comment. 2010-05-31 06:06:10 +00:00
Alan Cox
72dc3eb65b Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.
2010-05-30 18:48:41 +00:00
Alan Cox
a1192299b3 Merge various changes from i386/i386/pmap.c:
The remaining, unmerged portions of r175404
  Retire PMAP_DIAGNOSTIC.  Any useful diagnostics that were conditionally
  compiled under PMAP_DIAGNOSTIC are now KASSERT()s.  (Note: The kernel
  option DIAGNOSTIC still disables inlining of certain pmap functions.)

  Eliminate dead code from pmap_enter().  This code implemented an assertion.
  On i386, an equivalent check is already implemented.  However, on amd64,
  a small change is required to implement an equivalent check.

  Eliminate \n from a nearby panic string.

  Use KASSERT() to reimplement pmap_copy()'s two assertions.

Merge portions of r177659
  To date, we have assumed that the TLB will only set the PG_M bit in a
  PTE if that PTE has the PG_RW bit set.  However, this assumption does
  not hold on recent processors from Intel.  For example, consider a PTE
  that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
  is cached in the TLB and later the PG_RW bit is cleared in the PTE,
  but the corresponding TLB entry is not (yet) invalidated.
  Historically, upon a write access using this (stale) TLB entry, the
  TLB would observe that the PG_RW bit had been cleared and initiate a
  page fault, aborting the setting of the PG_M bit in the PTE.  Now,
  however, P4- and Core2-family processors will set the PG_M bit before
  observing that the PG_RW bit is clear and initiating a page fault.  In
  other words, the write does not occur but the PG_M bit is still set.

  The real impact of this difference is not that great.  Specifically,
  we should no longer assert that any PTE with the PG_M bit set must
  also have the PG_RW bit set, and we should ignore the state of the
  PG_M bit unless the PG_RW bit is set.

r208609
  Defer freeing any page table pages in pmap_remove_all() until after the
  page queues lock is released.  This may reduce the amount of time that the
  page queues lock is held by pmap_remove_all().

r208645
  When I pushed down the page queues lock into pmap_is_modified(), I created
  an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
  vm_page_dirty() must perform the call first.  Otherwise, pmap_is_modified()
  could return FALSE without acquiring the page queues lock because the page
  is not (currently) writeable, and the caller to pmap_is_modified() might
  believe that the page's dirty field is clear because it has not seen the
  effect of the vm_page_dirty() call.

  When I pushed down the page queues lock into pmap_is_modified(), I
  overlooked one place where this ordering dependence is violated:
  pmap_enter().  In a rare situation pmap_enter() can be called to replace a
  dirty mapping to one page with a mapping to another page.  (I say rare
  because replacements generally occur as a result of a copy-on-write fault,
  and so the old page is not dirty.)  This change delays clearing PG_WRITEABLE
  until after vm_page_dirty() has been called.

  Fixing the ordering dependency also makes it easy to introduce a small
  optimization: When pmap_enter() used to replace a mapping to one page with a
  mapping to another page, it freed the pv entry for the first mapping and
  later called the pv entry allocator for the new mapping.  Now, pmap_enter()
  attempts to recycle the old pv entry, saving two calls to the pv entry
  allocator.

  There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
  Update a comment to reflect this.

  Tidy up the variable declarations at the start of pmap_enter().
2010-05-30 04:44:32 +00:00
Alan Cox
8f0d5d3b9f When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first.  Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter().  In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page.  (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.)  This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping.  Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().
2010-05-29 17:10:45 +00:00
John Baldwin
0c86af8162 Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after:	1 month
2010-05-28 17:50:24 +00:00