Commit Graph

2026 Commits

Author SHA1 Message Date
Scott Long
60ad8150c7 Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code.  Replace its use with smp_started.  There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Discussed with: peter, jhb
MFC after: 3 days
Obtained from: Netflix
2014-04-26 20:27:54 +00:00
Tijl Coosemans
0a4c54d606 Rename __wchar_t so it no longer conflicts with __wchar_t from clang 3.4
-fms-extensions.

MFC after:	2 weeks
2014-04-01 14:46:11 +00:00
Ed Maste
df5dd82a32 Fix missed efi.h header change in r263815
Pointy hat to:	emaste
2014-03-28 11:46:54 +00:00
Ed Maste
35d75b3087 Move ia64 efi.h to sys in preparation for amd64 UEFI support
Prototypes specific to ia64 have been left in this file for now, under
__ia64__, rather than moving them to a new header under sys/ia64.
I anticipate that (some of) the corresponding functions will be shared
by the amd64, arm64, i386, and ia64 architectures, and we can adjust
this as EFI support on other than ia64 continues to develop.

Sponsored by:	The FreeBSD Foundation
2014-03-27 13:57:00 +00:00
Bryan Drewery
44f1c91610 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
Marcel Moolenaar
a8691d0792 Add KTR events for the PMAP interface functions
1.  move unmapped_buf_allowed to machdep.c.
2.  map both pmap_mapbios() and pmap_mapdev() to pmap_mapdev_attr()
    and have the actual work done by pmap_mapdev_priv() (renamed from
    pmap_mapdev()). Use pmap_mapdev_priv() to map the I/O port space
    because we can't use CTR() that early.
3.  add pmap_pinit_common() that's used by both pmap_pinit0() and
    pmap_pinit(). Previously pmap_pinit0() would call pmap_pinit(),
    but that would create 2 KTR events. While here, use pmap_t instead
    of "struct pmap *".
4.  fix pmap_kenter() to use vm_paddr_t as the type for the physical.
5.  various white-space adjustments for consistency.
6.  use C99 and KNF for function definitions where appropriate.
7.  slightly re-order prototypes and defines in <machine/pmap.h>

No functional change (other than the creation of KTR_PMAP events).
2014-03-19 21:30:10 +00:00
Marcel Moolenaar
ac5c8b7c5b Fix and improve exception tracing:
1.  Name the kernel option XTRACE instead of EXCEPTION_TRACING
2.  Put support functions in ia64/ia64/xtrace.c
3.  Make it work with SMP by giving each CPU its own buffer
4.  Save 16 key registers in the buffer for every exception
5.  In ia64_handle_intr() and trap() transfer the trace record
    to the KTR trace buffer using CTRx() and with some basic
    information for now
6.  Use a tunable to anble tracing and stop tracing as soon as
    we enter the debugger

Room for improvements:
1.  Transferring exception-relevant information to KTR
2.  Add a sysctl to enable/disable tracing
2014-03-18 23:51:34 +00:00
Warner Losh
d4f95c889d In kernel config files, it is supposed to be 'options<space><tab>' not
'options<tab><tab>', per long standing (but recently not so strictly
enforced) convention.
2014-03-18 14:41:18 +00:00
Marcel Moolenaar
47c6266afd In intr_event_handle() we already save and set td_intr_frame, so
don't do it also in ia64_handle_intr(). With ia64_handle_intr()
not saving and setting td_intr_frame, make sure to do it for
timer interrupts in ia64_ih_hardclock().
2014-03-17 04:38:10 +00:00
Marcel Moolenaar
c8afe4038e Move the implementation of kdb_cpu_trap() from <machine/kdb.h> to
machdep.c. This makes it easier to add conditional code based on
options.
2014-03-16 22:56:22 +00:00
Marcel Moolenaar
2b567b38b4 Don't use the ITC as the faulting address for external interrupts.
We only use it for tracing and the KTR infrastructure will use ITC
for the time-stamp.
2014-03-16 21:57:05 +00:00
Marcel Moolenaar
f32ac833eb In intr_event_handle() we already save and set td_intr_frame, so
don't do it also in ia64_handle_intr(). With ia64_handle_intr()
not saving and setting td_intr_frame, make sure to do it for
clock interrupts in ia64_ih_clock().
2014-03-16 20:21:40 +00:00
Warner Losh
22b8ff24b5 Delete stray clause 3 (Advertising clause) and renumber while i'm
here.

Approved by: alc@
2014-03-11 23:41:35 +00:00
Marcel Moolenaar
dd3468cad5 When reading physical memory, make sure to access it using the right
memory attributes. The same applies to the mmap(2) interface. Not
doing so results in machine checks.

We find the memory attributes in the EFI memory map, as queried by
mem_phys2virt().
2014-03-04 03:19:36 +00:00
Marcel Moolenaar
229d894543 In pmap_set_pte(), make sure to enforce ordering by inserting a memory
fence. Under system load, the CPU has been found to change the order
by which the stores are made visible. When the tag is made visible
before the other TLB values, other CPUs may use the invalid TLB values
and do bad things.

While here (i.e. not a fix) don't return errors from pmap_remove_vhpt()
to callers of pmap_remove_pte(). Those callers don't check the return
value and as such don't do what is needed to keep a consistent state.
More importantly, pmap_remove_vhpt() can't really have an error without
it indicating something unintended. Using KASSERT is therefore better.

PR:		182999, 183227
2014-01-20 18:37:35 +00:00
Marcel Moolenaar
b5a9d8b5a7 Enable vt. This brings VGA-based console and terminal support to
ia64 for the very first time. Only 9 years in the making...

Note that the vt/vga driver does not actually make sure there's
VGA hardware at the standard/legacy VGA I/O port and memory I/O
addresses. This can cause machine checks if the H/W does not have
a VGA controller.
2014-01-19 04:45:52 +00:00
Marcel Moolenaar
79c3bcf42c In the nested TLB fault handler, for a direct-mapped address, make
sure to clear the lower 12 bits. We're adding the translation
attributes to the physical address and non-zero bits in the first
12 bits would give us something unexpected, including invalid bit
values. Those trigger nested general protection faults.
We do not have to clear the region bits, because they are ignored
anyway, so we can replace an existing dep instruction with the one
we need.

This fixes GP faults for the swapper thread, as it's the only thread
that has a direct-mapped stack. Since the bug is in the nested TLB
fault handler, the frequency of hitting the GP is in the order of
hours/days under load.
2014-01-15 03:57:41 +00:00
Marcel Moolenaar
37ed8bfa35 Implement atomic_swap_<type>.
The operation was documented and implemented partially (both from a
type and architecture perspective) on 2013-08-21 and got used in
ZFS with revision 260150 (zfeature.c) and since ZFS is supported on
ia64, the lack of having atomic_swap became problem.
2014-01-01 22:51:19 +00:00
Marcel Moolenaar
e10d245f6c Add a virt_foreach() that does the same as what phys_foreach() does and
change virt_size(), virt_dumphdrs() and virt_dumpdata() into its callback
functions.
In virt_foreach() we iterate over all the virtual memory regions that we
want in the minidump. For now, just start with the PBVM (= kernel text
and data plus preloaded modules). The core file this produces can already
be used to work out the libkvm changes that need to be made to support it.
In parallel, we can flesh out the in-kernel bits to dump more of what we
need in a minidump without changing the core file structure.
2013-12-28 19:54:19 +00:00
Marcel Moolenaar
dfc586a324 Add the scaffolding for minidumps. They're just like physical dumps,
except the chunks aren't physical memory regions but virtual memory
regions. In both cases, the core file is an ELF file and flags in
the header allow libkvm to distinguish one from the other.
2013-12-27 19:51:17 +00:00
Marcel Moolenaar
6930130109 Allow pmap_remove_pages() to be called for physical maps not
associated with the current thread.

Obtained from: alc@
2013-12-12 03:04:00 +00:00
Pawel Jakub Dawidek
f2b525e6b9 Make process descriptors standard part of the kernel. rwhod(8) already
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.

MFC after:	3 days
2013-11-30 15:08:35 +00:00
Marcel Moolenaar
a36032c3d7 Don't enable interrupts before we call sched_throw(). Interrupts
are expected to be disabled by virtue of md_spinlock_count=1.
2013-11-10 04:22:40 +00:00
Alan Cox
c70af4875e As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other
words, every architecture is now auto-sizing the kmem arena.  This revision
changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes
mandatory and the definition of VM_KMEM_SIZE becomes optional.

Replace or eliminate all existing definitions of VM_KMEM_SIZE.  With
auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling
for VM_KMEM_SIZE_MIN on most architectures.  Use VM_KMEM_SIZE_MIN for
clarity.

Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to
that of setting the tunable vm.kmem_size.  Whereas the macros
VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables
vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size
have been distinct.  In particular, whereas VM_KMEM_SIZE was overridden by
VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size
was not.  Remedy this inconsistency.  Now, VM_KMEM_SIZE can be used to set
the size of the kmem arena at compile-time without that value being
overridden by auto-sizing.

Update the nearby comments to reflect the kmem submap being replaced by the
kmem arena.  Stop duplicating the auto-sizing formula in every machine-
dependent vmparam.h and place it in kmeminit() where auto-sizing takes
place.

Reviewed by:	kib (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-11-08 16:25:00 +00:00
Marcel Moolenaar
e3eb23c502 Use LOG2_ID_PAGE_SIZE again for the identity mapping in regions 6 & 7.
Make the default translation size the same as the PBVM page size to
avoid inserting overlapping translations in the TC. That generally is
very bad.
2013-11-01 01:32:01 +00:00
Marcel Moolenaar
fde9169e11 The PAL_PTCE_INFO function returns the counts and strides of the
outer and inner loop as 32-bit integers mux'd in 64-bit return
values. Change our data types for the count and stride to match
and simplify/adjust accordingly.
Note that with this change the defaults of the ptc.e parameters
have changed. Since all hardware is supposed to support the PAL
call, there should be no impact. Even ski is unaffected, because
the TC is re-initialized without considering the virtual address.
So, as long as we call ptc.e at least once, we're good. That's
what the new defaults do.
Most processor implementations need but a single ptc.e to purge
the entire TC anyway...
2013-11-01 00:21:38 +00:00
Marcel Moolenaar
3598d9555a Purge the translation cache of APs before we unleash them. To that
end, make pmap_invalidate_all() global and have it only handle the
local CPU -- i.e. no rendezvous. We do not use pmap_invalidate_all
other than during initialization.
Note that the BSP already purges its TC -- it was missing for APs
only. Nonetheless, this so far seems to eliminate random problems.
2013-10-31 23:06:04 +00:00
Marcel Moolenaar
ec9b648ef8 Respect the kern.smp.disabled tunable. When we're scanning the MADT in
ia64_probe_sapics(), we also create PCPU structures for any Local SAPICs
we encounter. When SMP is disabled, this leaves us with partially setup
PCPU structures, which typically results in panics when we're iterating
over CPUs. When SMP is disabled, we now prevent the creation of the
PCPU structures.
2013-10-31 22:46:03 +00:00
Konstantin Belousov
80938e75f0 Add bus_dmamap_load_ma() function to load map with the array of
vm_pages.  Provide trivial implementation which forwards the load to
_bus_dmamap_load_phys() page by page.  Right now all architectures use
bus_dmamap_load_ma_triv().

Tested by:	pho (as part of the functional patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2013-10-27 21:39:16 +00:00
Alan Cox
deb179bb4c The pmap function pmap_clear_reference() is no longer used. Remove it.
pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8.  Now, that call no
longer exists.

Approved by:	re (kib)
Sponsored by:	EMC / Isilon Storage Division
2013-09-20 04:30:18 +00:00
John Baldwin
edb572a38c Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
Gleb Smirnoff
e16477e8d9 On those machines, where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely
empty functions.

Reviewed by:	alc, kib, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2013-09-06 05:37:49 +00:00
Alan Cox
51321f7c31 Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE).  Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE.  Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference().  Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2).  A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact.  For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures.  I will commit implementations for the
remaining architectures after further testing.  For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by:      pho (i386), marcel (ia64)
Sponsored by:   EMC / Isilon Storage Division
2013-08-29 15:49:05 +00:00
Konstantin Belousov
e68c64f0ba Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone.  Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by:	alc (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-22 18:12:24 +00:00
Pawel Jakub Dawidek
417ffc66fa Add process descriptors support to the GENERIC kernel. It is already being
used by the tools in base systems and with sandboxing more and more tools
the usage should only increase.

Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by:	Google Summer of Code 2013
MFC after:	1 month
2013-08-18 10:21:29 +00:00
Jung-uk Kim
3bd12ca8f1 Tidy up global locks for ACPICA. There is no functional change. 2013-08-13 21:34:03 +00:00
Attilio Rao
c7aebda8a1 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
Andriy Gapon
9ba0691bdd follow up to r254051
- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
  jhb

X-MFC after:	never (change specific to head)
2013-08-09 08:11:09 +00:00
Andriy Gapon
818d282e7b enable KDB_TRACE in GENERICs
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after:	never (change specific to head)
2013-08-07 08:03:50 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
David E. O'Brien
0e6a0799a9 Back out r253779 & r253786. 2013-07-31 17:21:18 +00:00
David E. O'Brien
99ff83da74 Decouple yarrow from random(4) device.
* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
  The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*.  Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
  random_adaptor is basically an adapter that plugs in to random(4).
  random_adaptor can only be plugged in to random(4) very early in bootup.
  Unplugging random_adaptor from random(4) is not supported, and is probably a
  bad idea anyway, due to potential loss of entropy pools.
  We currently have 3 random_adaptors:
  + yarrow
  + rdrand (ivy.c)
  + nehemeiah

* Remove platform dependent logic from probe.c, and move it into
  corresponding registration routines of each random_adaptor provider.
  probe.c doesn't do anything other than picking a specific random_adaptor
  from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
  creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
  system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien
2013-07-29 20:26:27 +00:00
Andriy Gapon
a29cc9a34b Revert r253748,253749
This WIP should not have been committed yet.

Pointyhat to:	avg
2013-07-28 18:44:17 +00:00
Andriy Gapon
366d8bfb7b put contents of cpu.h under _KERNEL
no userland-serviceable parts inside

MFC after:	20 days
2013-07-28 18:32:27 +00:00
Marcel Moolenaar
ae9742be10 In pci_cfgregread() and pci_cfgregwrite(), multiplex the domain and
bus number into the bus argument. The bus number occupies the least
significant 8 bits. The PCI domain occupies the most significant 24
bits.

On the Altix 350, the PCI domain is a required parameter, but
changing the prototype of the pci_cfgreg*() functions to include a
separate domain argument has wide-spread consequences across the
supported architectures. We'd be changing a known interface.

Multiplexing is an acceptable kluge to give us what we need with
manageable impact. Note that the PCI bus number fits in 8 bits,
so the multiplexing of the domain is a backward compatible change.
2013-07-23 03:03:17 +00:00
Marcel Moolenaar
a992fcc26c In ia64_mca_init(), don't limit the allocation of the info block to
fall within the first 256MB of memory. The origin/reason for that
limitation is not known, but it's not believed to be required for
proper initialization. What is known is that the Altix 350 does not
have physical memory at that address (by virtue of the address space
bits).

Keep the boundary at 256MB so that the info block will be covered
by a single direct-mapped translation.

While here, change the flags to M_NOWAIT to eliminate confusion. It
does not change the behaviour of contigmalloc(). What is does is
makes the flags argument explicitly say what the actual behaviour
is.
2013-07-23 02:38:23 +00:00
Marcel Moolenaar
57e65e9bda In pmap_mapdev(), if the physical memory range is not covered by an EFI
memory descriptor, don't return NULL as the virtual address, return the
direct-mapped uncacheable virtual address for it. At first, this was
needed only for the Altix 350, but now even some high-end HP machines
have devices mapped to physical addresses that aren't covered by the
EFI memory map.
2013-07-23 02:11:22 +00:00
Konstantin Belousov
70a7dd5d5b Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32.  If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot.  The effect would be a counter going off by
arbitrary value after zeroing.  Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive.  On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching.  It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm).  If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by:  bde
Sponsored by:	The FreeBSD Foundation
2013-07-01 02:48:27 +00:00
Jung-uk Kim
b1ddd13145 Move definitions required by userland applications out of acpica_machdep.h. 2013-06-27 00:22:40 +00:00
Achim Leubner
dce93cd06d Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.
Approved by:	scottl (mentor)
2013-05-24 09:22:43 +00:00