Commit Graph

536 Commits

Author SHA1 Message Date
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Alexander Motin
6774542431 Fix the build after r212453. IPI_STATCLOCK declaration is still needed
for build, though not really used.

Submitted by:	andreast
2010-09-11 09:53:33 +00:00
Alexander Motin
707c2fb950 Update PowerPC event timer code to use new event timers infrastructure.
Reviewed by:	nwitehorn
Tested by:	andreast
H/W donated by:	Gheorghe Ardelean
2010-09-11 04:45:51 +00:00
Peter Grehan
d8acfb88ad - Bump MAXCPU to 4. Tested on a quad G5 with both 32 and 64-bit kernels.
A make buildkernel -j4 uses ~360% CPU.
- Bracket the AP spinup printf with a mutex to avoid garbled output.
- Enable SMP by default on powerpc64.

Reviewed by:	nwhitehorn
2010-09-03 03:56:09 +00:00
Nathan Whitehorn
b2a237be5c Restructure how reset and poweroff are handled on PowerPC systems, since
the existing code was very platform specific, and broken for SMP systems
trying to reboot from KDB.

- Add a new PLATFORM_RESET() method to the platform KOBJ interface, and
  migrate existing reset functions into platform modules.
- Modify the OF_reboot() routine to submit the request by hand to avoid
  the IPIs involved in the regular openfirmware() routine. This fixes
  reboot from KDB on SMP machines.
- Move non-KDB reset and poweroff functions on the Powermac platform
  into the relevant power control drivers (cuda, pmu, smu), instead of
  using them through the Open Firmware backdoor.
- Rename platform_chrp to platform_powermac since it has become
  increasingly Powermac specific. When we gain support for IBM systems,
  we will grow a new platform_chrp.
2010-08-31 15:27:46 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
John Baldwin
60c7b36b7a Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int.  Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.
2010-08-11 23:22:53 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
Nathan Whitehorn
3b4b38304e Improve hash coverage for kernel page table entries by modifying the kernel
ESID -> VSID map function. This makes ZFS run stably on PowerPC under
heavy loads (repeated simultaneous SVN checkouts and updates).
2010-07-31 21:35:15 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfulling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Konstantin Belousov
87d45a0392 When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by:	jhb, nwhitehorn
MFC after:	2 weeks
2010-07-22 09:13:49 +00:00
Rafal Jaworowski
4f124b977c Eliminate FDT_IMMR_VA define.
This removes platform dependencies from <machine>/fdt.h for the benfit of
portability.
2010-07-19 18:47:18 +00:00
Nathan Whitehorn
2a8d51200d Remove obsolete code that sets SHMMAXPGS to a tiny value by default
on PowerPC.
2010-07-13 23:10:55 +00:00
Nathan Whitehorn
c3e289e1ce MFppc64:
Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.
2010-07-13 05:32:19 +00:00
Nathan Whitehorn
cc81c44dd8 Unify ABI-related bits of the Book-E and AIM machdep routines
(exec_setregs, etc.) in order to simplify the addition of 64-bit support,
and possible future extension of the Book-E code to handle hard floating
point and Altivec.

MFC after:	1 month
2010-07-12 16:08:07 +00:00
Nathan Whitehorn
d83af91b99 MFppc64:
Provide ELF definitions for 64-bit PowerPC. This unbreaks the powerpc
loader build.
2010-07-12 13:38:26 +00:00
Rafal Jaworowski
d1d3233ebd Convert Freescale PowerPC platforms to FDT convention.
The following systems are affected:

  - MPC8555CDS
  - MPC8572DS

This overhaul covers the following major changes:

  - All integrated peripherals drivers for Freescale MPC85XX SoC, which are
    currently in the FreeBSD source tree are reworked and adjusted so they
    derive config data out of the device tree blob (instead of hard coded /
    tabelarized values).

  - This includes: LBC, PCI / PCI-Express, I2C, DS1553, OpenPIC, TSEC, SEC,
    QUICC, UART, CFI.

  - Thanks to the common FDT infrastrucutre (fdtbus, simplebus) we retire
    ocpbus(4) driver, which was based on hard-coded config data.

Note that world for these platforms has to be built WITH_FDT.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
2010-07-11 21:08:29 +00:00
Marcel Moolenaar
8a35d194f2 Remove the unneeded header <machine/intr.h>. 2010-07-02 02:17:39 +00:00
Marcel Moolenaar
dc54fa1d55 MFia64:
When compiling with profiling, we define PROF for userspace and GPROF
for the kernel.
2010-07-01 03:48:45 +00:00
Marcel Moolenaar
cfbcffa248 Fix profiling (part 1):
o   Functions are 4-byte aligned for Book-E.
o   We get compiled with -DPROF and not -DGPROF if profiling
    is enabled.
2010-06-29 19:07:44 +00:00
Nathan Whitehorn
08393b3efa Configure interrupts on SMP systems to be distributed among all online
CPUs by default, and provide a functional version of BUS_BIND_INTR().
While here, fix some potential concurrency problems in the interrupt
handling code.
2010-06-23 22:33:03 +00:00
Nathan Whitehorn
755a660df7 Missed commit in r209310: the IRQ number in INTR_VEC() should have
parantheses around it to allow arithmetic expressions to be passed.

Submitted by:	Andreas Tobler
2010-06-18 21:24:17 +00:00
Nathan Whitehorn
eaef5f0af8 Provide for multiple, cascaded PICs on PowerPC systems, and extend the
OFW interrupt map interface to also return the device's interrupt parent.

MFC after:	8.1-RELEASE
2010-06-18 14:06:27 +00:00
Rafal Jaworowski
04cb90189b Initial loader(8) support for Flattened Device Tree.
o This is disabled by default for now, and can be enabled using WITH_FDT at
  build time.

o Tested with ARM and PowerPC.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
2010-05-25 15:21:39 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Rafal Jaworowski
f40cf6825f Provide missing members for Book-E pmap (and fix build). 2010-05-18 21:23:51 +00:00
Nathan Whitehorn
ca2c19312e Add support for the U4 PCI-Express bridge chipset used in late-generation
Powermac G5 systems. MSI and several other things are not presently
supported.

The U3/U4 internal device support portions of this change were contributed
by Andreas Tobler.

MFC after:	1 week
2010-05-16 15:18:25 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Joel Dahl
2598954edc The NetBSD Foundation has granted permission to remove clause 3 and 4 from
their software.

Obtained from:	NetBSD
2010-03-03 17:07:02 +00:00
Nathan Whitehorn
83c01b8cf6 Close a race involving the OEA64 scratchpage. When the scratch page's
physical address is changed, there is a brief window during which its PTE
is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold
the page table lock, it was possible for another CPU to insert a new PTE
into the scratch page's PTEG slot during this interval, corrupting both
mappings.

Solve this by creating a new flag, LPTE_LOCKED, such that
moea64_pte_insert will avoid claiming locked PTEG slots even if they
are invalid. This change also incorporates some additional paranoia
added to solve things I thought might be this bug.

Reported by:	linimon
2010-02-24 00:54:37 +00:00
Nathan Whitehorn
ab73970649 Reduce KVA pressure on OEA64 systems running in bridge mode by mapping
UMA segments at their physical addresses instead of into KVA. This emulates
the direct mapping behavior of OEA32 in an ad-hoc way. To make this work
properly required sharing the entire kernel PMAP with Open Firmware, so
ofw_pmap is transformed into a stub on 64-bit CPUs.

Also implement some more tweaks to get more mileage out of our limited
amount of KVA, principally by extending KVA into segment 16 until the
beginning of the first OFW mapping.

Reported by:	linimon
2010-02-20 16:23:29 +00:00
Nathan Whitehorn
50b8e360ac Bump limits on PowerPC. This allows large executables like parts of LLVM
to function.

Reviewed by:	grehan
Obtained from:	NetBSD
MFC after:	2 weeks
2009-12-02 06:49:22 +00:00
Nathan Whitehorn
227f66048e Add a CPU features framework on PowerPC and simplify CPU setup a little
more. This provides three new sysctls to user space:
hw.cpu_features - A bitmask of available CPU features
hw.floatingpoint - Whether or not there is hardware FP support
hw.altivec - Whether or not Altivec is available

PR:		powerpc/139154
MFC after:	10 days
2009-11-28 17:33:19 +00:00
Nathan Whitehorn
570d2b25a6 Turn on NAP mode on G5 systems, and refactor the HID0 setup code a little.
This makes my G5 Xserve sound slightly less like it is filled with
howling banshees.
2009-10-24 18:33:01 +00:00
Nathan Whitehorn
999987e51a Add SMP support on U3-based G5 systems. This does not yet work perfectly:
at least on my Xserve, getting the decrementer and timebase on APs to tick
requires setting up a clock chip over I2C, which is not yet done.

While here, correct the 64-bit tlbie function to set the CPU to 64-bit
mode correctly.

Hardware donated by:	grehan
2009-10-23 03:17:02 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
Nathan Whitehorn
7f0ad28f28 Fix copy/paste typo in last revision. PMC0 control should be shifted 8
bits, not 6, on the PPC 970.
2009-06-23 04:02:36 +00:00
Nathan Whitehorn
30a2bd2f78 Teach cpu_est_clockrate() about the G5's slightly different PMC. This
allows the boot messages to include the CPU speed and makes possible
the forthcoming cpufreq support for the PPC 970.
2009-06-17 16:34:40 +00:00
Rafal Jaworowski
02b553cafc Initial version of the sec(4) driver for the integrated security engine found
in Freescale system-on-chip devices.

The following algorithms and schemes are currently supported:
  - 3DES, AES, DES
  - MD5, SHA1, SHA256, SHA384, SHA512

Reviewed by:	philip
Obtained from:	Freescale, Semihalf
2009-06-06 09:37:55 +00:00