Commit Graph

1655 Commits

Author SHA1 Message Date
Alan Cox
5c0db7c71a Implement support for CPU private mappings within sf_buf_alloc(). 2005-02-13 06:23:13 +00:00
John Baldwin
e8ce55117b Use the local APIC timer to drive the various kernel clocks on SMP machines
rather than forwarding interrupts from the clock devices around using IPIs:
- Add an IDT vector that pushes a clock frame and calls
  lapic_handle_timer().
- Add functions to program the local APIC timer including setting the
  divisor, and setting up the timer to either do a periodic countdown
  or one-shot countdown.
- Add a lapic_setup_clock() function that the BSP calls from
  cpu_init_clocks() to setup the local APIC timer if it is going to be
  used.  The setup uses a one-shot countdown to calibrate the timer.  We
  then program the timer on each CPU to fire at a frequency of hz * 3.
  stathz is defined as freq / 23 (hz * 3 / 23), and profhz is defined as
  freq / 2 (hz * 3 / 2).  This gives the clocks relatively prime divisors
  while keeping a low LCM for the frequency of the clock interrupts.
  Thanks to Peter Jeremy for suggesting this approach.
- Remove the hardclock and statclock forwarding code including the two
  associated IPIs.  The bitmap IPI handler has now effectively degenerated
  to just IPI_AST.
- When the local APIC timer is used we don't turn the RTC on at all, but
  we still enable interrupts on the ISA timer 0 (i8254) for timecounting
  purposes.
2005-02-08 20:25:07 +00:00
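
As an illustration of the divisor scheme described in the commit above, here is a minimal sketch of one timer interrupt stream at hz * 3 driving three clocks through the divisors 3, 23, and 2. The struct, the flag names, and both functions are hypothetical and are not the kernel's lapic_handle_timer() code.

```c
#include <assert.h>

/* Divisors taken from the commit text: the timer fires at hz * 3. */
enum { HARD_DIV = 3, STAT_DIV = 23, PROF_DIV = 2 };

/* Bit flags reporting which clocks are due on a given interrupt. */
enum { FIRE_HARD = 1, FIRE_STAT = 2, FIRE_PROF = 4 };

/* Hypothetical per-CPU counters; not the kernel's real data structure. */
struct clock_ticks {
	unsigned hard, stat, prof;
};

/* Account for one timer interrupt and return the clocks that are due. */
static int
timer_tick(struct clock_ticks *ct)
{
	int fired = 0;

	if (++ct->hard == HARD_DIV) { ct->hard = 0; fired |= FIRE_HARD; }
	if (++ct->stat == STAT_DIV) { ct->stat = 0; fired |= FIRE_STAT; }
	if (++ct->prof == PROF_DIV) { ct->prof = 0; fired |= FIRE_PROF; }
	return (fired);
}

/* Count the interrupts, out of nticks, on which every clock in mask fires. */
static unsigned
count_fired(unsigned nticks, int mask)
{
	struct clock_ticks ct = { 0, 0, 0 };
	unsigned i, hits = 0;

	assert(mask != 0);
	for (i = 0; i < nticks; i++)
		if ((timer_tick(&ct) & mask) == mask)
			hits++;
	return (hits);
}
```

Because 2, 3, and 23 are pairwise coprime, all three clocks coincide only once every 138 interrupts (their LCM), which is the property the commit message is after.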
Maxim Sobolev
84569dff34 o Move copyin()/copyout() out of i386_{get,set}_ldt() and
i386_{get,set}_ioperm() and make those APIs visible in the kernel namespace;

o use i386_{get,set}_ldt() and i386_{get,set}_ioperm() instead of sysarch()
  in the linuxlator, which allows us to kill another two stackgaps.

MFC after:	2 weeks
2005-01-26 13:59:46 +00:00
John Baldwin
42f0ddd465 Tweak the ELCR support slightly. Explicitly probe the ELCR during boot
instead of burying that in the atpic(4) code as atpic(4) is not the only
user of elcr(4).  Change the elcr(4) code to export a global elcr_found
variable that other code can check to see if a valid ELCR was found.

MFC after:	1 month
2005-01-18 20:24:47 +00:00
Scott Long
e015dfcfd1 Introduce bus_dmamap_load_mbuf_sg(). Instead of taking a callback arg, this
cuts to the chase and fills in a provided s/g list.  This is meant to optimize
out the cost of the callback since the callback doesn't serve much purpose for
mbufs since mbuf loads will never be deferred.  This is just for amd64 and
i386 at the moment, other arches will be coming shortly.
2005-01-07 07:57:18 +00:00
Warner Losh
86cb007f9f /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 22:18:23 +00:00
Warner Losh
838d838f0b Remove left over include file from stallion driver. 2005-01-06 22:07:20 +00:00
Warner Losh
cf7fbde441 Expand indirect reference to BSD license with the current one. 2005-01-06 22:05:28 +00:00
Warner Losh
94306e4017 This doesn't seem to have been used since 386BSD days 2005-01-06 22:00:50 +00:00
Warner Losh
0027ba028a These appear to be unused in our tree, so remove them. 2005-01-05 20:50:31 +00:00
John Baldwin
e367f46738 Add some constants for the local APIC timer. 2004-12-23 20:35:07 +00:00
John Baldwin
21bc8faa44 Add a simple 'intrcnt_add' function that other MD code can use to add a
single named counter to the interrupt counts without having to fake up an
entire interrupt source.
2004-12-23 20:34:18 +00:00
John Baldwin
dfa7bc486b - Add a function to set the Task Priority Register (TPR) of the local APIC.
  Currently this is only used to initialize the TPR to 0 during initial
  setup.
- Reallocate vectors for the local APIC timer, error, and thermal LVT
  entries.  The timer entry is allocated from the top of the I/O interrupt
  range reducing the number of vectors available for hardware interrupts
  to 191.  Linux happens to use the same exact vector for its timer
  interrupt as well.  If the timer vector shared the same priority queue
  as the IPI handlers, then the frequency that the timer vector will
  eventually be firing at can interact badly with the IPIs resulting in
  the queue filling and the dreaded IPI stuck panics, hence it being located
  at the top of the previous priority queue instead.
- Fixup various minor nits in comments.
2004-12-23 19:47:59 +00:00
Stephan Uphoff
f30a4a1ced Avoid more than two pending IPI interrupt vectors per local APIC
as this may cause deadlocks.

This should fix kern/72123.

Discussed with: jhb
Tested by: Nik Azim Azam, Andy Farkas, Flack Man, Aykut KARA
           Izzet BESKARDES, Jens Binnewies, Karl Keusgen
Approved by:    sam (mentor)
2004-12-07 20:15:01 +00:00
Marcel Moolenaar
bcc5241c43 Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
   FP registers fall in that category. Passing the new register value
   by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
   the packet will have the register value in target-byte order and
   in the memory representation (modulo the fact that bytes are sent
   as 2 printable hexadecimal numbers of course). We only need to
   decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg(), as
added on i386 and amd64, can therefore be removed, and this change
does in fact remove it.

Tested on: alpha, amd64, i386, ia64, sparc64
2004-12-01 06:40:35 +00:00
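
The distinction drawn in the commit above, between treating a packet payload as a bit array and parsing it as one hexadecimal number, can be sketched as follows. This is a minimal illustration only: hex_nibble() and hex_to_bytes() are hypothetical names, not the functions used by the kernel's GDB code.

```c
#include <assert.h>
#include <stddef.h>

/* Convert one hex digit to its value, or return -1 if it is not a digit. */
static int
hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return (c - '0');
	if (c >= 'a' && c <= 'f')
		return (c - 'a' + 10);
	if (c >= 'A' && c <= 'F')
		return (c - 'A' + 10);
	return (-1);
}

/*
 * Decode a hex string two digits at a time into buf, preserving the
 * byte order in which the digits arrive.  Returns the number of bytes
 * written, or -1 on malformed input or if buf is too small.
 */
static int
hex_to_bytes(const char *hex, unsigned char *buf, size_t buflen)
{
	size_t n = 0;

	assert(hex != NULL && buf != NULL);
	while (hex[0] != '\0') {
		int hi = hex_nibble(hex[0]);
		int lo = (hi < 0) ? -1 : hex_nibble(hex[1]);

		if (hi < 0 || lo < 0 || n >= buflen)
			return (-1);
		buf[n++] = (unsigned char)((hi << 4) | lo);
		hex += 2;
	}
	return ((int)n);
}
```

The resulting buffer already holds the value in the byte order it was sent in, so a pointer to it can be handed on without any byte swapping.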
David Schultz
ab44ebf537 Remove UAREA_PAGES.
Reviewed by:	arch@
2004-11-20 02:29:50 +00:00
John Baldwin
2d68e3fb92 Initiate deorbit burn sequence for 80386 support in FreeBSD: Remove
80386 (I386_CPU) support from the kernel.
2004-11-16 20:42:32 +00:00
John Baldwin
90baa95fad Spell _KERNEL correctly so that UP kernels are actually optimized again.
Submitted by:	pjd
2004-11-12 19:18:46 +00:00
John Baldwin
bd2ed154a1 - Use the SMP style ops for atomic_load/store() in userland so that
libraries and binaries will work on both UP and SMP machines.
- Remove unnecessary gcc memory barrier from the UP atomic_store() op.

Submitted by:	bde
2004-11-12 18:40:22 +00:00
John Baldwin
57621b8b35 - Place the gcc memory barrier hint in the right place in the 80386 version
of atomic_store_rel().
- Use the 80386 versions of atomic_load_acq() and atomic_store_rel() that
  do not use serializing instructions on all UP kernels since a UP machine
  does not need to synchronize with other CPUs.  This trims lots of cycles from
  spin locks on UP kernels among other things.

Benchmarked by:	rwatson
2004-11-11 22:42:25 +00:00
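
On a uniprocessor kernel there is no other CPU to order against, so acquire and release semantics reduce to stopping the compiler from reordering memory accesses. The sketch below shows that idea with a GCC-style compiler barrier. The function names are hypothetical, the code assumes a GCC-compatible compiler, and it is not the contents of the FreeBSD atomic.h header.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier: emits no instruction, only blocks reordering. */
#define COMPILER_BARRIER()	__asm__ __volatile__("" : : : "memory")

/*
 * Release store for a UP machine: the barrier goes before the store so
 * that earlier accesses cannot be moved after it by the compiler.
 */
static void
store_rel_up(volatile unsigned *p, unsigned v)
{
	assert(p != NULL);
	COMPILER_BARRIER();
	*p = v;
}

/*
 * Acquire load for a UP machine: the barrier goes after the load so
 * that later accesses cannot be moved before it by the compiler.
 */
static unsigned
load_acq_up(volatile unsigned *p)
{
	unsigned v;

	assert(p != NULL);
	v = *p;
	COMPILER_BARRIER();
	return (v);
}
```

No serializing or locked instruction is issued here, which is where the cycle savings mentioned in the commit would come from.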
Peter Wemm
ffcb357bd1 Begin an invasion of i386-land by amd64.
Expose some of the amd64-specific sysarch functions to allow alternative
implementations of the %fs/%gs code for TLS, threads, etc.  USER_LDT does
not exist on the amd64 kernel, so we have to implement things other ways.
2004-11-06 03:23:36 +00:00
Nate Lawson
31ad3b8802 Move the code for halting the CPU (acpi_cpu_c1) into machdep files.
This removes the last MD portion of acpi_cpu.c.

MFC after:	2 weeks
2004-10-11 05:39:15 +00:00
Alan Cox
aced26ce6e Make pte_load_store() an atomic operation in all cases, not just i386 PAE.
Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors.  (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852
2004-10-08 08:23:43 +00:00
Alan Cox
0a752e9843 Prevent the unexpected deallocation of a page table page while performing
pmap_copy().  This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable.  (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep.  This change
makes this scenario impossible.)

Reviewed by: tegge@
2004-09-29 19:20:40 +00:00
Julian Elischer
def46d58a6 Fix breakpoint handling for i386.
not sure yet about 5.x... MFC if needed.
Also fixes small problems with examining some registers and
some specific gdb transfer problems.

	As the patch says:
	This is not a pretty patch and only meant as a temporary
	fix until a better solution is committed.

PR:		i386/71715
Submitted by:	Stephan Uphoff <ups@tree.com>
MFC after:	1 week
2004-09-15 23:26:49 +00:00
Scott Long
9e0c3bdf64 Double the number of kernel page tables for amd64 and for i386/PAE. The old
value was only enough for 8GB of RAM, the new value can do 16GB.  This still
isn't optimal since it doesn't scale.  Fixing this for amd64 looks to be
fairly easy, but for i386 will be quite difficult.

Reviewed by: peter
2004-09-11 01:31:26 +00:00
Scott Long
9923b511ed Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00
Julian Elischer
df3a834f7e Give up trying to make preemption dependent on SCHED_4BSD
the list of breakages was getting too long
2004-09-01 20:41:18 +00:00
Julian Elischer
6222ded017 Don't ask for this for modules. No modules need to know about preemption at the moment 2004-09-01 18:29:57 +00:00
Scott Long
f164d4148e Protect the PREEMPTION logic with #ifdef _KERNEL to fix the build. 2004-09-01 10:12:08 +00:00
Julian Elischer
02ea3bcab9 Only turn on preemption for 4bsd.
it's still poison for ULE.
2004-09-01 09:01:32 +00:00
Julian Elischer
6804a3ab6d Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after:	5 days
2004-09-01 06:42:02 +00:00
Marcel Moolenaar
0f2fe153bc Move the kernel-specific logic to adjust frompc from MI to MD. For
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
   instruction of a function's implementation. It holds the address
   of a function descriptor. Hence the user(), btrap(), eintr() and
   bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
   be laid out contiguously. This cannot be achieved on ia64 and is
   generally just bad programming.

The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This keeps the trap or interrupt handler from appearing to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.

This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...

Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386
2004-08-27 19:42:35 +00:00
David E. O'Brien
2e262ac39b Fix a bug in in_cksum_hdr w/o -O.
The C code assumes that the carry bit is always kept from the previous
operation. However, the pointer indexing requires another add operation.
Thus, the carry bit from the first operation is tromped over by the
"addl" operation that ends up following it, so the "adcl" that follows
that has no effect because the carry bit is cleared before it.
The result is checksum failure on received packets.

The larger issue is that there isn't any other way of preventing the compiler
from inserting arbitrary instructions between different __asm statements (and
that the commit message in revision 1.13 of in_cksum.h is wrong on
this point).  From
http://developer.apple.com/documentation/DeveloperTools/gcc-3.3/gcc/Extended-Asm.html
	---8<---8<---8<---
	You can't expect a sequence of volatile asm instructions to remain
	perfectly consecutive. If you want consecutive output, use a single
	asm.  Also, GCC will perform some optimizations across a volatile
	asm instruction; GCC does not "forget everything" when it encounters
	a volatile asm instruction the way some other compilers do.
	---8<---8<---8<---

This change also makes the ASM code much easier to read.

PR:		69257
Submitted by:	Mike Bristow <mike@urgle.com>, Qing Li <qing.li@bluecoat.com>
2004-08-25 18:28:15 +00:00
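
The carry chain that the adcl instructions in the commit above implement can be illustrated in portable C by summing into a wider accumulator and folding the carries back in afterwards, so nothing depends on a CPU carry flag surviving between statements. This is only a sketch of the ones-complement arithmetic: cksum16() is a hypothetical helper that works on host-order 16-bit words, and it is not the in_cksum_hdr() code, which the commit fixes by using a single asm statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones-complement checksum over nwords 16-bit words.  Carries out of
 * the low 16 bits accumulate in the upper bits of sum and are folded
 * back in at the end (the end-around carry).
 */
static uint16_t
cksum16(const uint16_t *words, size_t nwords)
{
	uint64_t sum = 0;
	size_t i;

	assert(words != NULL || nwords == 0);
	for (i = 0; i < nwords; i++)
		sum += words[i];
	while ((sum >> 16) != 0)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

A header whose checksum field is already filled in correctly sums to zero under this function, which is the usual receive-side check.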
David E. O'Brien
9c737de401 Increase the scaling of VM_KMEM_SIZE_MAX.
Submitted by:	alc
2004-08-16 08:35:22 +00:00
Robert Watson
a632deec30 Add an "options MP_WATCHDOG" to i386. This option allows one of the
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond.  A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog.  If the callout fails to reset the timer for ten
seconds, the watchdog will fire.  The sysctl allows you to select which
CPU will run the watchdog.

A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog.  On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.

This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.

On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.
2004-08-15 18:02:09 +00:00
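
The timer logic described in the commit above (a callout keeps resetting a timer, and the watchdog fires once ten seconds pass without a reset) can be sketched as below. The struct, the function names, and the use of whole seconds are assumptions made for illustration; this is not the MP_WATCHDOG implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Timeout taken from the commit text. */
#define WATCHDOG_TIMEOUT_SEC	10

/* Hypothetical watchdog state: time of the last reset, in seconds. */
struct watchdog {
	uint64_t last_reset;
};

/* Called periodically by healthy code (the callout in the commit). */
static void
watchdog_reset(struct watchdog *w, uint64_t now)
{
	w->last_reset = now;
}

/* Polled by the dedicated CPU: has the timer gone unreset too long? */
static int
watchdog_should_fire(const struct watchdog *w, uint64_t now)
{
	assert(now >= w->last_reset);
	return (now - w->last_reset >= WATCHDOG_TIMEOUT_SEC);
}
```

If the code that calls watchdog_reset() stops running, for example because a spin lock is leaked, the poll eventually returns true and the watchdog can break into the debugger or raise an NMI.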
Maxime Henrion
9f1b87f106 Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait().  It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by:	jhb
Reviewed by:	jhb
2004-08-03 18:44:27 +00:00
Doug Rabson
4d84a58d1d Add definitions for TLS relocations. 2004-08-02 19:12:17 +00:00
Scott Long
5ba0615c03 Optimize intr_execute_handlers() by combining the pic_disable_source() and
pic_eoi_source() into one call.  This halves the number of spinlock operations
and indirect function calls in the normal case of handling a normal (ithread)
interrupt.  Optimize the atpic and ioapic drivers to use inlines where
appropriate in supporting the intr_execute_handlers() change.

This knocks 900ns, or roughly 1350 cycles, off of the time spent servicing an
interrupt in the common case on my 1.5GHz P4 uniprocessor system.  SMP systems
likely won't see as much of a gain due to the ioapic being more efficient than
the atpic.  I'll investigate porting this to amd64 soon.

Reviewed by:	jhb
2004-08-02 15:31:10 +00:00
Scott Long
9352fe30a0 Turn off PREEMPTION by default while it gets debugged. It's been causing
4 weeks of problems including deadlocks and instant panics.  Note that the
real bugs are likely in the scheduler.
2004-08-01 14:31:45 +00:00
Mark Murray
8ab2f5ecc5 Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.
2004-08-01 11:40:54 +00:00
Robert Watson
1a8cfbc450 Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread.  It is called only from critical_{enter,exit}(),
which already dereferences curthread.  This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding:	jhb, bmilekic
2004-07-27 16:41:01 +00:00
David Schultz
479f8d2214 Make FLT_ROUNDS correctly reflect the dynamic rounding mode. 2004-07-19 08:17:25 +00:00
Marcel Moolenaar
37224cd3fc Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
	thread X	switch to thread X (where X is the TID),
	show threads	list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.
2004-07-10 23:47:20 +00:00
Marcel Moolenaar
6c29a22f1f Update for the KDB framework:
o  s/ddb_on_nmi/kdb_on_nmi/g
o  Rename sysctl machdep.ddb_on_nmi to machdep.kdb_on_nmi
o  Make debugging support conditional upon KDB instead of DDB.
o  Call kdb_reenter() when kdb_active is non-zero.
o  Call kdb_trap() to enter the debugger when not already active.
o  Update comments accordingly.
o  Remove misplaced prototype of kdb_trap().
2004-07-10 22:11:14 +00:00
Marcel Moolenaar
5a39cbaf69 Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.
2004-07-10 19:56:00 +00:00
Marcel Moolenaar
cbc174356c Introduce the KDB debugger frontend. The frontend provides a framework
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.
2004-07-10 18:40:12 +00:00
Marcel Moolenaar
72d44f31a6 Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o  Unified implementation with minimal MD code.
o  A simple interface for devices to register themselves as debug
   ports, ala consoles.
o  Compression by using run-length encoding.
o  Implements GDB threading support.
2004-07-10 17:47:22 +00:00
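
For readers unfamiliar with the run-length encoding used by the GDB remote protocol, the sketch below decodes it: a character followed by '*' and a count character means the preceding character repeats (count character minus 29) more times. gdb_rle_decode() is a hypothetical helper written for illustration, and whether the kernel backend encodes exactly this way is an assumption based on the protocol documentation, not on this commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expand GDB-style run-length encoding from in into out as a
 * NUL-terminated string.  Returns the decoded length, or -1 on
 * malformed input or if out is too small.
 */
static int
gdb_rle_decode(const char *in, char *out, size_t outlen)
{
	size_t n = 0;
	char prev = '\0';

	assert(in != NULL && out != NULL);
	if (outlen == 0)
		return (-1);
	for (; *in != '\0'; in++) {
		if (*in == '*') {
			int repeat;

			/* A run needs a preceding character and a count. */
			if (prev == '\0' || in[1] == '\0')
				return (-1);
			repeat = (unsigned char)in[1] - 29;
			if (repeat < 1)
				return (-1);
			while (repeat-- > 0) {
				if (n + 1 >= outlen)
					return (-1);
				out[n++] = prev;
			}
			in++;		/* skip the count character */
		} else {
			if (n + 1 >= outlen)
				return (-1);
			out[n++] = *in;
			prev = *in;
		}
	}
	out[n] = '\0';
	return ((int)n);
}
```

For example, the three characters '0', '*', and space decode to "0000", since a space is ASCII 32 and 32 - 29 = 3 extra copies.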
John Baldwin
0c0b25ae91 Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
  determine if a thread about to be added to a run queue should be
  preempted to directly.  If it is not safe to preempt or if the new
  thread does not have a high enough priority, then the function returns
  false and sched_add() adds the thread to the run queue.  If the thread
  should be preempted to but the current thread is in a nested critical
  section, then the flag TDF_OWEPREEMPT is set and the thread is added
  to the run queue.  Otherwise, mi_switch() is called immediately and the
  thread is never added to the run queue since it is switched to directly.
  When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
  then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
  setrunqueue() now does all the correct work.  This also removes the
  do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
  supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
  chance to run if the architecture supports native preemption since
  the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
  architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
  preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by:	scottl (with his re@ hat)
2004-07-02 20:21:44 +00:00
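
The decision flow described for maybe_preempt() in the commit above can be sketched as a small state machine. Everything here is hypothetical and simplified: the struct, the enum, and the function names are invented for illustration, "nested critical section" is modelled as a nesting count greater than zero, and a lower number means a higher priority. It is not the kernel's implementation.

```c
#include <assert.h>

/* What the caller (sched_add() in the commit) should do with the thread. */
enum preempt_action {
	PA_ENQUEUE,	/* no preemption: put the thread on the run queue */
	PA_ENQUEUE_OWE,	/* enqueue now, preempt when critical section ends */
	PA_SWITCH	/* switch to the thread at once, skip the run queue */
};

/* Hypothetical view of the currently running thread. */
struct cur_state {
	int pri;		/* lower value = higher priority */
	int crit_nesting;	/* > 0 while inside a critical section */
	int preempt_safe;	/* nonzero if preempting is safe right now */
	int owe_preempt;	/* stands in for TDF_OWEPREEMPT */
};

static enum preempt_action
decide_preempt(struct cur_state *cur, int new_pri)
{
	if (!cur->preempt_safe || new_pri >= cur->pri)
		return (PA_ENQUEUE);
	if (cur->crit_nesting > 0) {
		cur->owe_preempt = 1;	/* defer the switch */
		return (PA_ENQUEUE_OWE);
	}
	return (PA_SWITCH);
}

/* Returns 1 if leaving the outermost critical section owes a switch. */
static int
leave_critical(struct cur_state *cur)
{
	assert(cur->crit_nesting > 0);
	if (--cur->crit_nesting == 0 && cur->owe_preempt) {
		cur->owe_preempt = 0;
		return (1);
	}
	return (0);
}
```

The deferred case is the interesting one: the new thread sits on the run queue until the current thread leaves its outermost critical section, at which point the owed switch is performed.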
Peter Wemm
654bd0e802 Reduce the size of pv entries by 15%. This saves 1MB of KVA for mapping
pv entries per 1GB of user virtual memory.  (e.g., if a 1GB file were
mmaped into 30 processes, that would theoretically reduce the KVA demand by
30MB for pv entries.  In reality though, we limit pv entries so we don't
have that many at once.)

We used to store the vm_page_t for the page table page.  But we recently
had the pa of the ptp, or can calculate it fairly quickly.  If we wanted
to avoid the shift/mask operation in pmap_pde(), we could recover the
pa but that means we have to store it for a while.

This does not measurably change performance.

Suggested by:  alc
Tested by:  alc
2004-06-29 15:57:05 +00:00