Commit Graph

1763 Commits

Author SHA1 Message Date
jhb
d90774443d Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after:	1 week
2008-08-19 19:53:52 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
alc
d4de04e9b1 Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by:	scottl
2008-07-15 03:34:49 +00:00
delphij
cb283fcdf7 Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out
of the box.
2008-07-07 22:55:11 +00:00
marcel
1c800dbdb5 Add inline function ia64_fc_i() to abstract inline assembly.
Use the new inline function in ia64_invalidate_icache().
While there, add proper synchronization so that we know
the fc.i instructions have taken effect when we return.
2008-07-07 17:43:56 +00:00
ed
4d6a9685e8 Remove the unused major/minor numbers from iodev and memdev.
Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by:	philip (mentor)
2008-06-25 07:45:31 +00:00
marcel
c628123952 Work-around a compiler optimization bug, that broke libthr. Massive
inlining resulted in constant propagation to the extend that cmpval
was known to the compiler to be URWLOCK_WRITE_OWNER (= 0x80000000U).
Unfortunately, instead of zero-extending the unsigned constant, it
was sign-extended. As such, the cmpxchg instruction was comparing
0x0000000080000000LU to 0xffffffff80000000LU and obviously didn't
perform the exchange.
But, since the value returned by cmpxhg equalled cmpval (when zero-
extended), the _thr_rtld_lock_release() function thought the exchange
did happen and as such returned as if having released the lock. This
was not the case. Subsequent locking requests found rw_state non-zero
and the thread in question entered the kernel and block indefinitely.

The work-around is to zero-extend by casting to uint64_t.
2008-05-28 16:41:02 +00:00
marcel
431e157a7b Account for IPI_PREEMPT. We don't want to call sched_preempt() with
interrupts disabled or with td_intr_nesting_level non-zero.
2008-05-23 19:53:50 +00:00
alc
964def13e2 The VM system no longer uses setPQL2(). Remove it and its helpers. 2008-05-23 04:03:54 +00:00
marcel
ae2d712eed Create the bucket mutexes with MTX_NOWITNESS. There's now a
hard limit of 512 pending mutexes in the witness code and
we can easily have 1 million bucket mutexes initialized before
witness is up and running. Bumping the limit from 512 to 1M
is not really an option here...
2008-05-22 06:27:46 +00:00
marcel
04349514e9 We can call ia64_flush_dirty() when the corresponding process is
locked or not. As such, use PROC_LOCKED() to determine which case
it is and lock the process when not.
2008-05-21 05:15:27 +00:00
alc
a8f81206ad Retire pmap_addr_hint(). It is no longer used. 2008-05-18 04:16:57 +00:00
alc
783a45362f Add a stub for pmap_align_superpage() on machines that don't (yet)
implement pmap-level support for superpages.
2008-05-09 23:31:42 +00:00
marcel
b14c656c51 Unbreak previous commit. While here, refactor the code a bit. 2008-04-25 16:09:03 +00:00
jeff
14b586bf96 - Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
 - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
   suspended in cpu specific states.  This function can fail and cause the
   scheduler to fall back to another mechanism (ipi).
 - Implement support for mwait in cpu_idle() on i386/amd64 machines that
   support it.  mwait is a higher performance way to synchronize cpus
   as compared to hlt & ipis.
 - Allow selecting the idle routine by name via sysctl machdep.idle.  This
   replaces machdep.cpu_idle_hlt.  Only idle routines supported by the
   current machine are permitted.

Sponsored by:	Nokia
2008-04-25 05:18:50 +00:00
phk
8d647da1ed Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
phk
bbf813673e Make genclock standard on all platforms.
Thanks to: grehan & marcel for platform support on ia64 and ppc.
2008-04-21 10:09:55 +00:00
marcel
f4fb7d0eb3 Sanitize the malloc types: M_PMAP is not used in pmap.c, so don't
define it there. Don't use M_PMAP in mp_machdep.c; define M_SMP
instead.
2008-04-19 04:56:16 +00:00
marcel
710f14abdd Remove cruft we got from Alpha, which was probably inherited
from NetBSD. I.e. make it more like a FreeBSD header.
2008-04-18 02:21:11 +00:00
marcel
e920d11c2c Use genclock for RTC handling. This eliminates the MD versions for
inittodr() and resettodr(). Have nexus double as the clock device,
because it's the firmware that provides RTC services. We could
create a special (pseudo-) device for it, but that wasn't superior
enough to actually do it. Maybe later...

Requested by: phk
2008-04-15 17:02:23 +00:00
marcel
da8b8894d6 Support and switch to the ULE scheduler:
o  Implement IPI_PREEMPT,
o  Set td_lock for the thread being switched out,
o  For ULE & SMP, loop while td_lock points to blocked_lock for
   the thread being switched in,
o  Enable ULE by default in GENERIC and SKI,
2008-04-15 05:02:42 +00:00
marcel
d8346deefc Revision 1.9 changes the delivery mode from the magic constant 0
(i.e. fixed delivery) to SAPIC_DELMODE_LOWPRI. While the commit
log doesn't mention the change in behaviour, it is believed to be
deliberate. In the last 5.5 years this hasn't been a problem. Nor
do I think did it make any difference, but who knows. However, I
do know that it break SMP support for Montecito-based machines.
Switch back to fixed-CPU delivery so that SMP works again. This
gives me some time to look more closely at the problem, as well
as make sure the I-cache validation as it's implemented currently
is sufficient in SMP configurations...
2008-04-14 20:34:45 +00:00
jeff
335d8ade9b - Pass the irq and not the vector to intr_event_create().
Reviewed by:	marcel
2008-04-11 23:10:39 +00:00
jeff
8efb03d60e - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
marcel
015b99f5de Unbreak after removal of SI_SUB_MOUNT_ROOT. 2008-04-09 03:32:48 +00:00
jhb
79918c45a6 Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
  'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
  Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
  scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
  Also, don't bother unmasking the interrupt in the post_filter case if
  we never masked it.  The INTR_FILTER case had been doing this by having
  arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
  They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
  both the 'post_filter' and 'post_ithread' hooks to match what the
  non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
  'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
  designed to do even in 5.x).

Glanced at by:	piso
Reviewed by:	marius
Requested by:	marius [1], [5]
Tested on:	amd64, i386, arm, sparc64
2008-04-05 19:58:30 +00:00
marcel
f4a93f0828 Better implement I-cache invalidation. The previous implementation
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.

MFC after: 2 weeks
2008-03-30 23:09:14 +00:00
dfr
dc98ee4196 Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
2008-03-27 11:54:20 +00:00
jb
34e730ca27 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
phk
fa71439e44 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
jhb
c04bb048f6 Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
  and collapse down to one intr_event_create() routine.  The disable and
  eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
  routines since to trim one extra indirection.

Compiled on:	{arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on:	{amd64,i386} x {FILTER, !FILTER}
2008-03-17 22:42:01 +00:00
pjd
ea49d310bf Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
imp
be829c21fb BUS_DMA_ISA is left over from Alpha, and is not used in the tree at
all.  The reference in ia64 code is due to cutNpaste in its history
and can safely be removed.

Revired by: cognet, raj, marcel, jhb and maybe one other whom I'm forgetting
2008-03-15 06:44:45 +00:00
jhb
9c113163fb Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
jhb
64ab71ccbd Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines.  The existing code already handles
having two platforms: ACPI and legacy.  However, the existing approach was
rather hardcoded and difficult to extend.  These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device).  This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
  legacy platform.  It's probe routine now returns BUS_PROBE_GENERIC so it
  can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
  resource managers so that subclassed nexus(4) drivers can invoke it from
  their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
  attach routine.
- The ACPI driver no longer contains an new-bus identify method.  Instead
  it exposes a public function (acpi_identify()) which is a probe routine
  that the MD nexus(4) drivers can use to probe for ACPI.  All of the
  probe logic in acpi_probe() is now moved into acpi_identify() and
  acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
  acpi_identify() and claims the nexus0 device if the probe succeeds.  It
  then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
  This matches the previous behavior where the old acpi_identify() would
  fail to add an acpi0 device again leaving you with no devices.

Discussed with:	imp
Silence on:	arch@
2008-03-13 20:39:04 +00:00
jeff
bf86f0992a - Fix build breakage; there was a reference to a removed syscall in
a KASSERT().  Attempt to cleanup the comment to reflect reality.
2008-03-12 22:14:14 +00:00
jeff
acb93d599c Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
jeff
ad2a31513f - Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a seperate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
marcel
f051ca1feb Re-sort options. While here:
o  remove COMPAT_FREEBSD5
o  add INVARIANTS
o  add WITNESS
2008-02-16 18:30:58 +00:00
marcel
257f2d8fc2 On Montecito processors, the instruction cache is in fact not
coherent with the data caches. Implement a quick fix to allow
us to boot on Montecito, while I'm working on a better fix in
the mean time.

Commit made on Montecito-based Itanium...
2008-02-14 18:46:50 +00:00
marcel
c1a1c62b2a Allocate a stack for thread0 and switch to it before calling
mi_startup(). This frees up kstack for static PAL/SAL calls
and double-fault handling.
2008-02-04 02:21:33 +00:00
ru
910410640b Add a wrapper function that bound checks writes to the dump device. 2008-01-28 19:04:07 +00:00
jhb
c7e0e41f73 Add COMPAT_FREEBSD7 and enable it in configs that have COMPAT_FREEBSD6. 2008-01-07 21:40:11 +00:00
alc
545d26e30b Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.
2008-01-03 07:34:34 +00:00
imp
9699ca07d2 Use correct function name in panic message 2008-01-03 06:44:12 +00:00
imp
a3fe3266a0 Fix obsolete comment. pmap_remove_all is the function we're in. 2008-01-03 06:35:04 +00:00
alc
37cdbd87f5 Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.
2007-12-27 16:45:39 +00:00
rwatson
bdee30611d Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
jkoshy
39d4b4accf Add stubs to unbreak LINT. 2007-12-07 13:45:47 +00:00
marcel
5f073f2789 Add a BSD disklabel backend to g_part:
o  Disklabels can have between 8 and 20 partitions (inclusive).
o  No device special file is created for the raw partition.
o  Switch ia64 to use this backend.
o  No support for boot code yet.
2007-12-06 02:32:42 +00:00
rwatson
99285f7544 Break out stack(9) from ddb(4):
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
  definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
  defined, or also if "options DDB" is defined to provide compatibility
  with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to.  It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested:	amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested:	amd64 (rwatson), arm (cognet), i386 (rwatson)
2007-12-02 20:40:35 +00:00
jhb
7fe785218b Remove the 'needbounce' variable from the _bus_dmamap_load_buffer()
routine.  It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.

MFC after:	1 week
Reviewed by:	scottl
2007-11-27 17:28:12 +00:00
jasone
607f2953c0 Define atomic_readandclear_ptr. 2007-11-27 06:34:15 +00:00
scottl
b607c8d8ad Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step.  Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled.  Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state.  The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler.  Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example.  This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported.  Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby
2007-11-21 04:03:51 +00:00
alc
d1ab859bdc Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed.  Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed.  However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count.  This can be a mistake.  Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings.  Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed.  Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired.  The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed.  To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks
2007-11-17 22:52:29 +00:00
marcel
1e7c4f0a3f o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o  Add cpu_thread_free() which is called from thread_free()
   to counter-act cpu_thread_alloc().

i386:	Have cpu_thread_free() call cpu_thread_clean() to
	preserve behaviour.
ia64:	Have cpu_thread_free() call mtx_destroy() for the
	mutex initialized in cpu_thread_alloc().

PR: ia64/118024
2007-11-14 20:21:54 +00:00
julian
b2732e0c22 generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
kib
9ae733819b Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
marcel
28e8a1b2ed Set PTE_ACCESSED in the PTE and before inserting it in the VHPT.
This avoids back-to-back faults for all TLB misses. This can be
improved further in the future by also setting PTE_DIRTY for TLB
misses for write accesses.

MFC after: 1 week
2007-10-16 03:20:32 +00:00
marcel
70374bf52e The flushrs instruction must be the first in an instruction
group. GNU as(1) already made sure of that, but it's better
to actually have the code right.

MFC after: 1 week
2007-10-16 03:07:56 +00:00
marcel
0e76c44417 Print instruction stops to improve analysis of dependency
violations.

MFC after: 1 week
2007-10-16 02:59:03 +00:00
marcel
a1840b78b2 Fix disassembly of the invala, itc, itr and hint instructions
by fixing the opcode ordering.

MFC after: 1 week
2007-10-16 02:49:40 +00:00
brueffer
26461bf019 Use the correct expanded name for SCTP.
PR:		116496
Submitted by:	koitsu
Reviewed by:	rrs
Approved by:	re (kensmith)
2007-09-26 20:05:07 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
alc
20b10da706 It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren.  They should
be.  This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc().  I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago.  This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)
2007-09-15 18:47:02 +00:00
marcel
5f4b9d20fc Clear pending interrupts before we enable external interrupts.
Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.

While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.

Approved by: re (blanket)
2007-08-06 05:15:57 +00:00
marcel
48dc5445bf Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)
2007-08-06 05:11:01 +00:00
marcel
4ee3bb0e27 In ia64_set_rr(), don't perform data serialization. This allows
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().

Approved by: re (blanket)
2007-08-05 18:19:38 +00:00
marcel
8e03a67f7b Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.

Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.

Approved by: re (blanket)
2007-08-04 19:52:10 +00:00
marcel
aaa09dca3c Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.

Approved by: re (blanket)
2007-08-04 19:36:14 +00:00
marcel
67e460fdcb Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.

Approved by: re (blanket)
2007-08-04 19:33:27 +00:00
marcel
57b4ebc17e Add required data-serialization after writing to cr.itm and cr.itv.
Approved by: re (blanket)
2007-08-04 19:28:19 +00:00
marcel
391597776c Add ia64_srlz_d() and ia64_srlz_i() functions to aid in serialization.
Approved by: re (blanket)
2007-08-04 19:26:42 +00:00
marcel
d4ec5356ec o Switch to physical addressing before dereferencing the VHPT
bucket pointer. The virtual mapping may not be present in the
  translation cache. This will result in a nested TLB fault at
  a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
  its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
  a rfi. Behaviour is undefined otherwise.

Approved by: re (blanket)
2007-07-30 22:52:52 +00:00
marcel
a5fceab6ad Add option EXCEPTION_TRACING, which enables KTR-like functionality
for processor interruptions. This is especially useful to track
unexpected nested TLB faults.

Approved by: re (blanket)
2007-07-30 22:42:33 +00:00
marcel
7878602389 Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o  Save a pointer to the sapic structure and IRQ for every vector,
   so that we can quickly EOI, mask and unmask the interrupt.
o  Add locking to the sapic code now that we can reprogram a
   sapic on multiple CPUs at the same time.
o  Use u_int for the vector and IRQ. We only have 256 vectors, so
   using a 64-bit type for it is rather excessive.
o  Properly handle concurrent registration of a handler for the
   same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blacket)
2007-07-30 22:29:33 +00:00
marcel
8fddc91c70 Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)
2007-07-30 22:12:53 +00:00
marcel
68c4f43232 Add casts to some of the more commonly used pointer-type atomic
operations. We really should be able to make those inline functions,
but this would break its use for sx_locks.

Approved by: re (blanket)
2007-07-30 22:07:01 +00:00
dwmalone
e8276674f3 If clock_ct_to_ts fails to convert time time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by:	re
MFC after:	3 weeks
2007-07-23 09:42:32 +00:00
marcel
0ea7715cdf Restore the value of ar.rnat after the assignment to ar.bspstore.
The SDM states that writing to ar.bspstore invalidates the ar.rnat
register as a side-effect. This was interpreted as "bits in the
ar.rnat register that correspond to registers whose value is on
the stack are undefined'. Since we keep the kernel stack NaT-
aligned with the user stack (i.e. the lower 9 bits of the backing
store pointer remain unchanged when we switch to the kernel stack)
bits that need preserving would be preserved.

That interpretation is questionable. So, now, the interpretation
is more absolute: ar.rnat is undefined after writing to ar.bspstore.
As such, we write the saved value of ar.rnat back to ar.rnat after
writing to ar.bspstore.

Discussed with: christian.kandeler@hob.de
Approved by: re (kensmith)
2007-07-16 16:47:35 +00:00
marcel
15e9a30c60 dma_tag is a static structure. Testing for it being a NULL pointer
doesn't make sense. Rewrite to what was intended.

Correctly warned about by: GCC
Approved by: re (bmah)
2007-07-09 04:58:16 +00:00
delphij
c990e91fd1 Enable SCTP by default for GENERIC kernels in order to give it
more exposure.  The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still need some
work/testing on 64-bit platforms.

Approved by:	re (kensmith)
Discussed with:	rrs
2007-06-14 17:14:27 +00:00
marcel
0d73aaee3a Enable GEOM_PART_MBR by default. On ia64 this replaces GEOM_MBR. 2007-06-13 05:07:42 +00:00
alc
173b3d6d03 Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] using one of these definitions.

Approved by:	re
2007-06-10 23:39:07 +00:00
marcel
2a881a553e Work around a firmware bug in the HP rx2660, where in ACPI an I/O port
is really a memory mapped I/O address. The bug is in the GAS that
describes the address and in particular the SpaceId field. The field
should not say the address is an I/O port when it clearly is not.

With an additional check for the IA64_BUS_SPACE_IO case in the bus
access functions, and the fact that I/O ports pretty much not used
in general on ia64, make the calculation of the I/O port address a
function. This avoids inlining the work-around into every driver,
and also helps reduce overall code bloat.
2007-06-10 16:53:01 +00:00
marcel
41b2f34ed7 Synchronize the instruction cache after writing to memory. This is
needed for breakpoints to work.
2007-06-09 22:15:13 +00:00
marcel
75588c5a15 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
marcel
c6fba5a928 Physical memory regions can be larger than INT_MAX. Change size1
from an int to a long to avoid printing negative byte and page
counts.
2007-06-09 01:19:08 +00:00
rwatson
4f2da72e8b Enable AUDIT by default in the GENERIC kernel, allowing security event
auditing to be turned on without a kernel recompile, just an rc.conf
option.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2007-06-08 20:29:07 +00:00
marcel
dc29a4eb64 Remove remaining references to pc_curtid missed in previous commit. 2007-06-07 18:36:58 +00:00
marcel
a137153bff Eliminate pmap_install(), which was used to wrap pmap_switch() and
grab sched_lock. This would serialize calls to pmap_switch from
cpu_switch(). With the introduction of thread_lock, this is not
possible anymore, because thread_lock is not a single lock. It
varies.  Secondly and most importantly, it's not needed at all. The
only requirement for pmap_switch() is that it's not preempted
while in the middle of updating the CPU and PCPU. In other words,
it's a critical region. No locking required.
2007-06-07 16:04:23 +00:00
davidxu
17322fae92 Fix compiling error. 2007-06-07 01:53:29 +00:00
marcel
11092d3888 Include <sys/sched.h> for sched_throw(). 2007-06-06 04:44:19 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
jeff
8297f778b9 Commit 13/14 of sched_lock decomposition.
- Add a new parameter to cpu_switch() that is used to release the lock on
   the outgoing thread and properly acquire the lock on the incoming
   thread.  This parameter is not required for schedulers that don't do
   per-cpu locking and architectures which do not support it may continue
   to use the 4BSD scheduler.  This feature is presently not supported
   on ia64

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:58:47 +00:00
jeff
be3241715a - Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:57:32 +00:00
jeff
20e0f793e8 Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
   each architecture.  This also allows the scheduler to use any locking it
   may want to.
 - Use the thread_lock() rather than sched_lock when preempting.
 - The scheduler lock is not required to synchronize release_aps.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:56:08 +00:00
attilio
e333d0ff0e Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
piso
42dfc78150 In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb
2007-05-31 19:25:35 +00:00
yongari
f53195d29a Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.

Reviewed by:	scottl
2007-05-29 06:30:26 +00:00
alc
a530caef2a Eliminate an unused definition. 2007-05-27 20:34:26 +00:00
marcel
df27a8ac99 Have the processor defer all faults and exceptions for control
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.
2007-05-27 19:02:47 +00:00
kan
4c2d706212 Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.
2007-05-22 02:22:58 +00:00
marcel
4e45a73058 When speculation fails (as determined by the chk instruction) the
processor is to jump to recovery code. This branching behaviour
may not be implemented by the processor and a Speculative Operation
fault is raised. The OS is responsible to emulate the branch.
Implement this, because GCC 4.2 uses advanced loads regularly.
2007-05-21 05:11:43 +00:00
marcel
2a4c24267a Fix GCC warning: va = va += PAGE_SIZE contains pointless operation
va = va. Fix white space in nearby lines.
2007-05-19 18:25:14 +00:00
marcel
797bdcc549 Add a level of indirection to the kernel PTE table. The old
scheme allowed for 1024 PTE pages, each containing 256 PTEs.
This yielded 2GB of KVA. This is not enough to boot a kernel
on a 16GB box and in general too low for a 64-bit machine.
By adding a level of indirection we now have 1024 2nd-level
directory pages, each capable of supporting 2GB of KVA. This
brings the grand total to 2TB of KVA.
2007-05-19 13:11:27 +00:00
marcel
a94f1d6237 Account for the fact that contigmalloc(9) can return a NULL pointer.
Fix the flags argument: M_WAITOK is not a valid flag. Its presence
leaves the indication that contigmalloc(9) will not return a NULL
pointer.

The use of contigmalloc(9) in this place is probably not a good idea
given the constraints. It's probably better to lift the constraints
and instead add a permanent mapping to the ITR. It's possible that
the first 256MB of memory is exhausted when we get here.

This fixes a kernel panic on a 16GB rx3600.
2007-05-19 12:50:12 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
alc
b34f6f7ab1 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
sepotvin
a1e73b1eaf Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
pjd
f4e110ebf2 Remove trailing '.' for consistency! 2007-04-10 21:40:13 +00:00
pjd
b159725895 Add UFS_GJOURNAL options to the GENERIC kernel.
Approved by:	re (kensmith)
2007-04-10 16:49:41 +00:00
jkim
c06098a406 Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
jhb
8b3222b80b Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first.  nyan@ has already fixed all of the PC-98
drivers.
2007-03-21 15:36:38 +00:00
alc
b03ddb707b Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
mohans
a332cb00d5 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
scottl
32acf7e446 Don't increment total_bounced when doing no-op dmamap_sync ops. 2007-03-06 18:28:43 +00:00
piso
b44ce1c4db Updated ia64 isa support with the new bus_setup_intr() syntax.
Approved by: re (implicit?)
2007-02-24 16:56:22 +00:00
piso
6a2ffa86e5 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
alc
989e3abb2c Change pmap_protect() so that execute access can be removed without
simultaneously removing write access.
2007-02-21 06:00:46 +00:00
alc
cc7fb68847 Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.
2007-02-18 06:33:02 +00:00
marcel
e43343208a Now that the free page queue mutex is a sleep mutex, we cannot call
vm_page_alloc() from within a critical section in pmap_growkernel().
Since the need for a critical section may never have existed in the
first place, simply get rid of it.

Discussed with: alc@
2007-02-11 02:52:54 +00:00
brooks
beaea8e48e Include GEOM_LABEL in GENERIC. It's very useful and not well publicized
enough.

Approved by:	pjd
2007-02-09 19:03:18 +00:00
marcel
0245423ad8 Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
2007-02-07 18:55:31 +00:00
imp
9109b1ceb8 Remove 3rd clause, renumber, ok per email 2007-01-12 07:26:21 +00:00
davidxu
5a984630fa Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
    kern.threads.umtx_dflt_spins
        if the sysctl value is non-zero, a zero umutex.m_spincount will
        cause the sysctl value to be used a spin cycle count.
    kern.threads.umtx_max_spins
        the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
2006-12-20 04:40:39 +00:00
julian
396ed947f6 Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
marcel
aee58e2da0 Since printf also has at least one critical section, we need to
initialize pc_curthread. While here, rename early_pcpu to pcpu0
to be conistent (compare thread0 and proc0).
2006-11-18 23:15:25 +00:00
marcel
a76d88c55c Now that printf() needs the PCPU, set it up before we call printf().
Change the pc_pcb field from a pointer to struct pcb to struct pcb
so that sizeof(struct pcb) includes the PCB we use for IPI_STOP.
Statically declare early_pcb so that we don't have to allocate the
PCB for thread0. This way we can setup the PCPU before cninit()
and thus before we use printf().
2006-11-18 21:52:26 +00:00
marcel
08b7cebe65 Revert previous commit. PC_CONS_BUFR is not used nor needed by
assembly.
2006-11-18 21:48:13 +00:00
ru
8beeb4382c Fix a comment. 2006-11-13 06:26:57 +00:00
alc
6093953d36 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
rwatson
2e2a6d6a66 Add missing includes of priv.h. 2006-11-06 17:43:10 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
jb
0ba5da19a5 Remove the KDTRACE option again because of the complaints about having
it as a default.

For the record, the KDTRACE option caused _no_ additional source files
to be compiled in; certainly no CDDL source files. All it did was to
allow existing BSD licensed kernel files to include one or more CDDL
header files.

By removing this from DEFAULTS, the onus is on a kernel builder to add
the option to the kernel config, possibly by including GENERIC and
customising from there. It means that DTrace won't be a feature
available in FreeBSD by default, which is the way I intended it to be.

Without this option, you can't load the dtrace module (which contains
the dtrace device and the DTrace framework). This is equivalent to
requiring an option in a kernel config before you can load the linux
emulation module, for example.

I think it is a mistake to have DTrace ported to FreeBSD, but not
to have it available to everyone, all the time. The only exception
to this is the companies which distribute systems with FreeBSD embedded.
Those companies will customise their systems anyway. The KDTRACE
option was intended for them, and only them.
2006-11-04 23:50:12 +00:00
jb
f7bc0a87d6 Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing because they are called from the DTtrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.
2006-11-04 04:58:10 +00:00
marcel
c4a6a4c4a7 Make sure kern_envp is never NULL. If we don't get a pointer to
the environment from the loader, use the static environment.
2006-11-03 04:06:17 +00:00
jb
d2bd807356 Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by:	scottl, jhb
MFC after:	2 weeks
2006-11-01 04:54:51 +00:00
jb
d20db01f97 Remove the KSE option now that it's in DEFAULTS on these arches/machines.
The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.

Submitted by:	scottl@
2006-10-26 22:11:35 +00:00
jb
3a250c3e16 Add 'options KSE' to the kernel config DEFAULTS on all arches/machines
except sun4v.

This change makes the transition from a default to an option more
transparent and is an attempt to head off all the compliants that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.

Submitted by:	scottl@
2006-10-26 22:05:25 +00:00
jb
f82c799735 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
ru
dcc4e06e70 Move "device splash" back to MI NOTES and "files", it's MI. 2006-10-23 13:23:14 +00:00
marcel
9f9cb15f4e o Eliminate nexus_print_resources(). Use resource_list_print_type()
instead.
o  Eliminate nexus_print_all_resources(). Inline the function body
   in nexus_print_child().
2006-10-23 00:38:58 +00:00
alc
ab1a7ca9a2 Eliminate unnecessary PG_BUSY tests. 2006-10-22 04:18:01 +00:00
des
a3f4000fda Move more MD devices and options out of MI NOTES. 2006-10-20 09:52:27 +00:00
des
5658cd6451 The VGA_DEBUG option only exists on {amd64,i386,ia64}.
Also remove 'device io' from amd64 NOTES; DEFAULTS takes care of it.
2006-10-20 08:56:26 +00:00
marcel
82d986f8fd Fix previous revision:
o  day and mday are the same. No need to subtract 1 from mday.
o  Set dow to -1 as clock_ct_to_ts() checks this field and
   returns EINVAL on any day of the week but Sunday.
2006-10-19 00:53:35 +00:00
davidxu
bb5a3880aa o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
  umtx and wait channel.
o Rename casuptr to casuword.
2006-10-17 02:24:47 +00:00
hrs
51beaea0b5 Add a newline to the printf().
Spotted by:	Peter Carah <pete@altadena.net>
MFC after:	3 days
2006-10-15 16:52:59 +00:00
marcel
a0da43241f Include freebsd32_signal.h now that signal-related definitions are
moved there.

Found by: ia64 tinderbox.
2006-10-06 19:33:44 +00:00
simon
440a3866d6 - Remove SCHED_ULE from GENERIC to better avoid foot-shooting by
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
  stronger.

Suggested and reviewed by:	dougb
Discussed on:			developers
MFC after:			3 days
2006-10-05 20:31:58 +00:00
jb
bf543444cf PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.
2006-10-04 21:37:10 +00:00
phk
f245b890e0 Use calendrical calculations from subr_clock.c instead of home-rolled. 2006-10-02 16:32:36 +00:00
phk
84cc0f277f Second part of a little cleanup in the calendar/timezone/RTC handling.
Split subr_clock.c in two parts (by repo-copy):
   subr_clock.c contains generic RTC and calendaric stuff. etc.
   subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c.  They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.
2006-10-02 15:42:02 +00:00
phk
50c81b8a9a First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
ru
d4abbeb0fe Added COMPAT_FREEBSD6 option. 2006-09-26 12:36:34 +00:00
kan
c9b2659ee8 Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and  __builtin_va_start was present in all GCC version 3.1 and
later.
2006-09-21 01:37:02 +00:00
rwatson
76eda1318a Add audit hooks for ppc, ia64 system call paths.
Reviewed by:	marcel (ia64)
Obtained from:	TrustedBSD Project
MFC after:	3 days
2006-09-16 17:03:02 +00:00
davidxu
87b5aa08ee Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.
2006-08-28 02:28:15 +00:00
alc
72ff1a9186 Eliminate unused definitions. (They came from NetBSD.)
Discussed with: cognet, grehan, marcel
2006-08-25 23:51:11 +00:00
jhb
ce9f8963fd First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR).  Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
  additional parameter (relative to pmap_mapdev()) specifying the cache
  mode for this mapping.  Note that on amd64 only WB mappings are done with
  the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
  mappings rather than WB.  Previously we relied on the BIOS setting up
  MTRR's to enforce memio regions being treated as UC.  This might make
  hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
  that used pmap_mapdev() to map non-device memory (such as ACPI tables)
  to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
  caching mode for a range of KVA.

Reviewed by:	alc
2006-08-11 19:22:57 +00:00
alc
a152234cf9 Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
marcel
7067faff16 Remove sio(4) and related options from MI files to amd64, i386
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.
2006-07-29 18:38:54 +00:00
jhb
3a707d012d Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.
2006-07-28 20:22:58 +00:00
jhb
c62c38439f Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
jhb
12302c47d0 Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
  section.
- Add a new explicit check before userret() to make sure we don't return
  with any locks held.  The advantage here is that we can include the
  syscall number and name in syscall() whereas that info is not available
  in userret().
- Drop the mtx_assert()'s of sched_lock and Giant.  They are replaced by
  the more general checks just added.

MFC after:	2 weeks
2006-07-27 22:32:30 +00:00
jhb
d832089727 Add KTR_SYSC tracing to the syscall() implementations that didn't have it
yet.

MFC after:	1 week
2006-07-27 21:25:50 +00:00
jhb
39705fd8c6 Add missing ptrace(2) system-call stops to various syscall()
implementations.

MFC after:	1 week
2006-07-27 19:50:16 +00:00
marcel
444a271f09 Move default GEOM classes from files.ia64, where they were marked
standard, to the DEFAULTS file.
2006-07-17 20:02:51 +00:00
jhb
a72b0bcd7f Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer).  So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt.  Also, now that it's easy to do so,
enable paging by default for all ddb commands.  Any command that wishes to
honor the quit flag can do so by checking db_pager_quit.  Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
  terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
2006-07-12 21:22:44 +00:00
mjacob
672bdf0dbc Make the firmware assist driver resident in
preparation for isp using it.
2006-07-09 16:40:31 +00:00
bde
c3398d0c39 Fixed FP_R*. fp{get_set}round() apparently never worked on ia64, since
the alpha values were used and are quite different.

Fixed some style bugs by copying from the i386 version where it is better.
2006-07-05 06:10:21 +00:00
marcel
d7fe1ba45f Partial support for branch long emulation. This only emulates the
branch long jump and not the branch long call. Support for that is
forthcoming.
2006-06-29 19:59:18 +00:00
alc
e05c27b796 Make several changes to pmap_enter_quick_locked():
1. Make the caller responsible for performing pmap_install().  This reduces
the number of times that pmap_install() is performed by
pmap_enter_object() from twice per page to twice overall.

2. Don't block if pmap_find_pte() is unable to allocate a PTE.  If it did
block, then it might wind up mapping a cache page.  Specifically, if
pmap_enter_quick_locked() slept when called from pmap_enter_object(), the
page daemon could change an active or inactive page into a cache page just
before it was to be mapped.

3. Bail out of pmap_enter_quick_locked() if pv entries aren't plentiful.
In other words, don't force the allocation of a pv entry if they aren't
readily available.

Reviewed by: marcel@
2006-06-27 05:05:05 +00:00
babkin
f0555f2de9 Backed out the change by request from rwatson.
PR:		kern/14584
2006-06-26 22:03:22 +00:00
babkin
3d8be823b0 The common UID/GID space implementation. It has been discussed on -arch
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.

PR:		kern/14584
Submitted by:	babkin
2006-06-25 18:37:44 +00:00
marcel
90ebd15455 Update to SDM 2.2:
o  Add tf (test feature) instruction,
o  Add vmsw (VM switch) instruction.

While here, update copyright.

MFC after: 1 week
2006-06-24 19:21:11 +00:00
marcel
f5d673ba84 Sync up with SDM 2.1:
o  Add nop/hint formats F16, I18, M48 and X5,
o  Add format M47 for ptc.e,
o  Add hint instruction,
o  Fix decoding of cmp8xchg16.
2006-06-24 01:19:52 +00:00
marcel
27e6a73de1 Identify the cual-core Montecito.
MFC after: 3 days
2006-06-22 00:56:58 +00:00
netchild
11681ee0b5 Remove COMPAT_43 from GENERIC (and other kernel configs). For amd64 there's
an explicit comment that it's needed for the linuxolator. This is not the
case anymore. For all other architectures there was only a "KEEP THIS".
I'm (and other people too) running a COMPAT_43-less kernel since it's not
necessary anymore for the linuxolator. Roman is running such a kernel for a
for longer time. No problems so far. And I doubt other (newer than ia32
or alpha) architectures really depend on it.

This may result in a small performance increase for some workloads.

If the removal of COMPAT_43 results in a not working program, please
recompile it and all dependencies and try again before reporting a
problem.

The only place where COMPAT_43 is needed (as in: does not compile without
it) is in the (outdated/not usable since too old) svr4 code.

Note: this does not remove the COMPAT_43TTY option.

Nagging by:	rdivacky
2006-06-15 19:58:53 +00:00
ups
b3a7439a45 Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps
2006-06-15 01:01:06 +00:00
imp
038d1db25e Add the ability to subset the devices that UART pulls in. This allows
the arm to compile without all the extras that don't appear, at least
not in the flavors of ARM I deal with.  This helps us save about 100k.

If I've botched the available devices on a platform, please let me
know and I'll correct ASAP.
2006-06-12 04:21:50 +00:00
alc
ff4adb11fe Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object.  Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@
2006-06-05 20:35:27 +00:00
imp
7a24ed8d3d EISA bus ia64 systems don't exist in reality. I'm told they may exist in
theory, but that it was OK to remove from NOTES.

OK'd by: marcel
2006-06-02 04:46:26 +00:00
alc
af1bb99b4d Correct a syntax error in the previous revision. 2006-06-01 19:23:45 +00:00
silby
89bd691dee After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter.  In the theory that other drivers will fall into
this same category, we agreed that panicing or making the allocation
fail will cause more hardship than is necessary.  The printf should
be sufficient motivation to get the driver glitch fixed.
2006-06-01 04:49:29 +00:00
mjacob
51170c3bdd Since it's to all intents and purposes identical
code to amd64 && i386, match the recent changes
to bus_dmamem_alloc here.
2006-05-31 00:38:53 +00:00
marcel
efa2062722 Unbreak after previous commit. While here, improve function naming
consistency by s/ssc/ssc_/g.
2006-05-27 17:52:08 +00:00
phk
a45f741c1f Update to new console api. 2006-05-26 18:25:34 +00:00
marius
1a141a2cee Add le(4). I could actually only test it on alpha, i386 and sparc64 but
given that this includes the more problematic platforms I see no reason
why it shouldn't also work on amd64 and ia64.
2006-05-17 20:45:45 +00:00
phk
ef310efff8 Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.
2006-05-16 14:37:58 +00:00
marcel
10f0f16487 Fix braino in previous commit: Don't redefine OID_AUTO to something
not equal to -1, or at all for that matter.
2006-05-11 22:49:31 +00:00
phk
d12dd358d4 Remove more straggling CPU_ macro references 2006-05-11 17:53:26 +00:00
phk
5d8c57a08b Clean out sysctl machdep.* related defines.
The cmos clock related stuff should really be in MI code.
2006-05-11 17:29:25 +00:00
marcel
193a6144b9 Rewrite of puc(4). Significant changes are:
o  Properly use rman(9) to manage resources. This eliminates the
   need to puc-specific hacks to rman. It also allows devinfo(8)
   to be used to find out the specific assignment of resources to
   serial/parallel ports.
o  Compress the PCI device "database" by optimizing for the common
   case and to use a procedural interface to handle the exceptions.
   The procedural interface also generalizes the need to setup the
   hardware (program chipsets, program clock frequencies).
o  Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
   default and non-serdev devices are handled by the bus.
o  Use the serdev I/F to collect interrupt status and to handle
   interrupts across ports in priority order.
o  Sync the PCI device configuration to include devices found in
   NetBSD and not yet merged to FreeBSD.
o  Add support for Quatech 2, 4 and 8 port UARTs.
o  Add support for a couple dozen Timedia serial cards as found
   in Linux.
2006-04-28 21:21:53 +00:00
marcel
35fe8d3d11 In nexus_teardown_intr(), actually remove the handler.
MFC after: 1 day
2006-04-21 16:12:28 +00:00
imp
34755358c7 Set the rid of the resource obtained from rman_reserve_resource. 2006-04-20 04:18:30 +00:00
alc
a7e3d6f83b Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap.  To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge
2006-04-12 04:22:52 +00:00