Commit Graph

117 Commits

Author SHA1 Message Date
John Baldwin
0a4c51f423 Move declarations of invpcid_works and pmap_pcid_enabled to pmap.h.
Previously these were only declared under #ifdef SMP in <machine/smp.h>.
However, these variables are defind in pmap.c unconditionally, and efirt.c
references them unconditionally.  This fixes non-SMP kernel builds.

Discussed with:	kib
MFC after:	1 week
2016-10-31 18:37:05 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
Mateusz Guzik
c025b81442 amd64: plug redundant bootAP declaration
Reported by:	gcc5
2015-09-22 21:07:47 +00:00
Konstantin Belousov
a546448b8d Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
Konstantin Belousov
02c26f81a7 Move common code from sys/i386/i386/mp_machdep.c and
sys/amd64/amd64/mp_machdep.c, to the new common x86 source
sys/x86/x86/mp_x86.c.

Proposed and reviewed by:	jhb
Review:	https://reviews.freebsd.org/D2347
Sponsored by:	The FreeBSD Foundation
2015-04-24 16:20:56 +00:00
Neel Natu
18a2b08e65 Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when
vmm.ko is loaded.

Also relocate the 'justreturn' IPI handler to be alongside all other handlers.

Requested by:	kib
2015-03-14 02:32:08 +00:00
Roger Pau Monné
079f7ef839 xen: add a hook to perform AP startup
AP startup on PVH follows the PV method, so we need to add a hook in
order to diverge from bare metal.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Add hook for start_all_aps on native (using native_start_all_aps
   defined in mp_machdep).

amd64/amd64/mp_machdep.c:
 - Make some variables global because they will also be used by the
   Xen PVH AP startup code.
 - Use the start_all_aps hook to start APs.
 - Rename start_all_aps to native_start_all_aps.

amd64/include/smp.h:
 - Add declaration for native_start_all_aps.

x86/include/init.h:
 - Declare start_all_aps hook in init_ops.

x86/xen/pv.c:
 - Pick external declarations from mp_machdep.
 - Introduce Xen PV code to start APs on PVH.
 - Set start_all_aps init hook to use the Xen PVH implementation.
2014-03-11 10:27:57 +00:00
Roger Pau Monné
e8da1c4877 amd64/i386: switch IPI handlers to C code.
Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Reviewed by: jhb
Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/apic_vector.S:
i386/i386/apic_vector.s:
 - Remove asm coded IPI handlers and instead call the newly introduced
   C variants.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Add C coded clones to the asm IPI handlers (moved from
   x86/xen/hvm.c).

i386/include/smp.h:
amd64/include/smp.h:
 - Add prototypes for the C IPI handlers.

x86/xen/hvm.c:
 - Move the C IPI handlers to mp_machdep and call those in the Xen IPI
   handlers.

i386/xen/mp_machdep.c:
 - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't
   support SMP).
2014-03-11 10:03:29 +00:00
John Baldwin
e07ef9b0f6 Move <machine/apicvar.h> to <x86/apicvar.h>. 2014-01-23 20:10:22 +00:00
Konstantin Belousov
6aceaa3e17 Tidy up some loose ends in the PCID code:
- Restore the pre-PCID TLB shootdown handlers for whole address space
  and single page invalidation asm code, and assign the IPI handler to
  them when PCID is not supported or disabled.  Old handlers have
  linear control flow.  But, still use the common return sequence.

- Stop using pcpu for INVPCID descriptors in the invlrg handler.  It
  is enough to allocate descriptors on the stack.  As result, two
  SWAPGS instructions are shaved off from the code for Haswell+.

- Fix the reverted condition in invlrng for checking of the PCID
  support [1], also in invlrng check that pmap is kernel pmap before
  performing other tests.  For the kernel pmap, which provides global
  mappings, the INVLPG must be used for invalidation always.

- Save the pre-computed pmap' %CR3 register in the struct pmap.  This
  allows to remove several checks for pm_pcid validity when %CR3 is
  reloaded [2].

Noted by:   gibbs [1]
Discussed with:	alc [2]
Tested by:	pho, flo
Sponsored by:	The FreeBSD Foundation
2013-09-04 23:31:29 +00:00
Konstantin Belousov
37eed8419c Implement support for the process-context identifiers ('PCID') on
Intel CPUs.  The feature tags TLB entries with the Id of the address
space and allows to avoid TLB invalidation on the context switch, it
is available only in the long mode.  In the microbenchmarks, using the
PCID decreased latency of the context switches by ~30% on SandyBridge
class desktop CPUs, measured with the lat_ctx program from lmbench.

If available, use INVPCID instruction when a TLB entry in non-current
address space needs to be invalidated.  The instruction is typically
available on the Haswell.

If needed, the use of PCID can be turned off with the
vm.pmap.pcid_enabled loader tunable set to 0.  The state of the
feature is reported by the vm.pmap.pcid_enabled sysctl.  The sysctl
vm.pmap.pcid_save_cnt reports the number of context switches which
avoided invalidating the TLB; compare with the total number of context
switches, available as sysctl vm.stats.sys.v_swtch.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
Tested by:	pho, bf
2013-08-30 07:59:49 +00:00
Mitsuru IWASAKI
77c80e2e5b Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().
2012-06-12 00:14:54 +00:00
Andriy Gapon
234dab4a82 remove code for dynamic offlining/onlining of CPUs on x86
The code has definitely been broken for SCHED_ULE, which is a default
scheduler.  It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt devilery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup.  See the UPDATING entry for details.

Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all availble CPUs and then
the disabled CPUs should be "subtracted" from it.  That doesn't work
well if the resulting topology becomes non-uniform.

This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.

PR:		kern/145385
Discussed with:	gcooper, ambrisko, mdf, sbruno
Reviewed by:	attilio
Tested by:	pho, pluknet
X-MFC after:	never
2011-06-08 08:12:15 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Attilio Rao
8c0ef2464e Revert md_assert_preempt() introduction.
Discussed with:	jeff, jhb
2011-05-04 20:29:40 +00:00
Attilio Rao
f1edea81ac Add the function md_assert_nopreempt(), which is a very consistent
function on the possibility of a thread to not preempt.

As this function is very tied to x86 (interrupts disabled checkings)
it is not intended to be used in MI code.
2011-04-30 23:12:37 +00:00
Alan Cox
63740078e0 Amd64 doesn't have a lazypmap ipi. 2011-03-27 16:18:51 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
Alexander Motin
d364638110 Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by:	jhb@ (previous version)
2010-06-17 11:54:49 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
Jeff Roberson
82fcb0f192 - Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
 - Remove the cpu_cores/cpu_logical detection from identcpu.
 - Describe the layout of the system in cpu_mp_announce().

Sponsored by:   Nokia
2009-04-29 06:54:40 +00:00
Jung-uk Kim
c66d2b38c8 Initial suspend/resume support for amd64.
This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
2009-03-17 00:48:11 +00:00
Marius Strobl
6f04e7b9aa Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by:	jhb
Reviewed by:	arch, grehan, jhb
2008-09-28 18:34:14 +00:00
Jeff Roberson
81aa71755b - Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a seperate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
Attilio Rao
c8790f5d09 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
  the table, however, has been the source of a false positive LOR reporting
  with the dt_lock.  However, smp rendezvous lock would have had sched_lock
  there for older lock, so it wasn't still a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
  appropriate stub.

Addictionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let's it be
  declared as static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
Alexander Kabaev
fa298d5ea8 Include machine/pcb.hto turn extern struct pcb stoppcbs[]; construct
into the valid C.
2007-05-19 05:01:43 +00:00
John Baldwin
4c5bec1161 Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.
2007-03-06 17:16:47 +00:00
John Baldwin
4ac60df584 Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction.  This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after:	1 month
2006-05-01 21:36:47 +00:00
John Baldwin
b439e431bf Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly.  In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures.  Basically,
  all the archs were taking a trapframe and converting it into a clockframe
  one way or another.  Now they can just extract the PC and usermode values
  directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
  accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
  the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
  timecounter, call hardclock() directly.  This removes an extra
  conditional check from every clock interrupt on Alpha on the BSP.
  There is probably room for even further pruning here by changing Alpha
  to use the simplified timecounter we use on x86 with the lapic timer
  since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
  is false, so add a KASSERT() to that affect and remove a condition
  to slightly optimize the non-lapic case.
- Change prototypeof  arm_handler_execute() so that it's first arg is a
  trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on:	alpha, amd64, arm, i386, ia64, sparc64
Reviewed by:	bde (mostly)
2005-12-22 22:16:09 +00:00
John Baldwin
333b8de537 MFi386:
- Move PUSH_FRAME and POP_FRAME to asmacros.h and use PUSH_FRAME in
  atpic entry points.
- Move PCPU_* asm macros out of the middle of the asm profiling macros.
- Pass IRQ vector argument as an int rather than void * to reduce diffs
  with i386.
- EOI the lapic in C for the lapic timer handler.
- GC unused Xcpuast function.
- Split IPI_STOP handling code of ipi_nmi_handler() out into a
  cpustop_handler() function and call it from Xcpustop rather than
  duplicating all the logic in assembly.
- Fixup the list of symbols with interrupt frames in ddb traces.
  Xatpic_fastintr* have never existed on amd64, and the lapic timer
  handler and various IPI handlers were missing.
- Use trapframe instead of intrframe for interrupt entry points (on amd64
  the interrupt vector was already a separate argument, so the two frames
  were already identical) and GC intrframe.

Submitted by:	peter (3)
2005-12-08 18:33:30 +00:00
John Baldwin
58553b9925 Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
  enabled if an attempt is made to send an IPI_STOP IPI.  If the kernel
  option is enabled, there is also a sysctl to change the behavior at
  runtime (debug.stop_cpus_with_nmi which defaults to enabled).  This
  includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
  private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
  properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
  enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
  that is restarted making use of atomic_readandclear() rather than
  assuming that the BSP is always included in the set of restarted CPUs.
  Also, the NMI handler didn't clear the function pointer meaning that
  subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
  of stoppedpcbs[] and always enable it for i386 and amd64 instead of
  being dependent on KDB_STOP_NMI.  It works fine in both the NMI and
  non-NMI cases.
2005-10-24 21:04:19 +00:00
Doug White
fdc9713bf7 Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by:	ups
2005-04-30 20:01:00 +00:00
Peter Wemm
c29f1e2b3b MFi386: Bring over John's local apic timer code 2005-02-28 23:37:35 +00:00
Peter Wemm
b6e89c6d47 JumboMFi386: use bitmapped IPI handler. Update elcr and default mptable
config handler.  Tidy up various local apic initialization.
2005-01-21 06:01:20 +00:00
Warner Losh
46280ae719 Begin all license/copyright comments with /*- 2005-01-05 20:17:21 +00:00
Peter Wemm
12c1418ccf Kill the LAZYPMAP ifdefs. While they worked, they didn't do anything
to help the AMD cpus (which have a hardware tlb flush filter).  I held
off to see what the 64 bit Intel cpus did, but it doesn't seem to help
much there either.  Oh well, store it in the Attic.
2004-05-16 22:11:50 +00:00
Peter Wemm
b29fd7c4db MFi386: mp_topology(). 2004-01-28 23:51:16 +00:00
Peter Wemm
0d2a298904 Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
  (in particular, bootstrap)
- This is still a WIP.  It seems that there are some highly bogus bioses
  on nVidia nForce3-150 boards.  I can't stress how broken these boards
  are.  I have a workaround in mind, but right now the Asus SK8N is broken.
  The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE.  SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
  atpic' in addition, because they somehow managed to forget to connect the
  8254 timer to the apic, even though its in the same silicon!  ARGH!
  This directly violates the ACPI spec.
2003-11-17 08:58:16 +00:00
Peter Wemm
afa8862328 Commit MD parts of a loosely functional AMD64 port. This is based on
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from.  There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code.  pmap uses
  a variation of the PAE code in order to avoid having to worry about 4
  levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
  i386 loader.  This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
  not been translated yet, and some that I cheated and wrote dumb C
  versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
  passing, as is native on the amd64 ABI), and the 'syscall' instruction
  for syscalls.  int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
  of places (eg: to find which register they use to replace the trashed
  %rcx register in the syscall instruction).  As a result, there is not a
  lot of similarity.  I did look at NetBSD a few times while debugging to
  get some ideas about what I might have done wrong in my first attempt.
2003-05-01 01:05:25 +00:00
Peter Wemm
cc66ebe2a9 Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup.  Currently its under
options LAZY_SWITCH.  What this does is avoid %cr3 reloads for short
context switches that do not involve another user process.  ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb.  However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still.  There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option.  I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff.  This was more
meant for the kthread case, while his was for interrupts.  Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place.  This allows us to catch the silly
case of doing a cpu_switch() to the current process.  This happens
uncomfortably often.  This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle).  This has been
implemented on i386 and (thanks to jake) sparc64.  The others will come
soon.  This is actually seperate to the lazy switch stuff.

Glanced at by:  jake, jhb
2003-04-02 23:53:30 +00:00
Paul Saab
87437b0b89 Nuke options HTT infavor of machdep.hlt_logical_cpus tunable/sysctl.
This keeps the logical cpu's halted in the idle loop.  By default
the logical cpu's are halted at startup.  It is also possible to
halt any cpu in the idle loop now using machdep.hlt_cpus.

Examples of how to use this:
machdep.hlt_cpus=1	halt cpu0
machdep.hlt_cpus=2	halt cpu1
machdep.hlt_cpus=4	halt cpu2
machdep.hlt_cpus=3	halt cpu0,cpu1

Reviewed by:	jhb, peter
2003-03-26 19:49:34 +00:00
Jake Burkholder
238dd3209a Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway.  This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by:	jhb, tmm
Tested on:	i386, sparc64
2003-02-03 17:53:15 +00:00
Jim Pirzyk
b2eb172cc3 Add the !define(COMPILING_LINT)
pass the pointy hat...

Requested by: Juli Mallett <jmallett@FreeBSD.org>
2002-10-17 18:17:28 +00:00
Jim Pirzyk
c8c1cf0ca7 put an #error directive when SMP and CPU_DISABLE_CMPXCHG are set
together.

Requested by: Lars Eggart <larse@isi.edu>
Enlighted how to do it by: John Baldwin <jhb@freebsd.org>
2002-10-17 05:51:36 +00:00
Peter Wemm
f1b665c8fe Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386.  IPI's are very expensive,
  so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns.  For example, there is no sense in
  globally purging the TLB cache for where we are stealing a page from
  the local unshared process on the local cpu.  Use pm_active to track
  this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
  enough to cause vm86 bios calls to break.  vm86 depended on our existing
  bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
  pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle].  These can be done without IPI's if we
  have a hook in cpu_switch.
- The IPI handlers need some cleanup.  I have a bogus %ds load that can
  be avoided.
- APTD handling is rather bogus and appears to be a large source of
  global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative.  I'll revisit these
again over the next few days as the dust settles.

New option:  DISABLE_PG_G - In case I missed something.
2002-07-12 07:56:11 +00:00
Bruce Evans
809dbbc99b Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses.  Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
2002-03-23 15:09:35 +00:00
Alfred Perlstein
b63dc6ad47 Remove __P. 2002-03-20 05:48:58 +00:00
Peter Wemm
d1693e1701 Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels.  Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely.  Userland programs still crashed.
2002-02-27 09:51:33 +00:00