Commit Graph

5167 Commits

Author SHA1 Message Date
Alan Cox
9a8f043722 Increase the ceiling on the size of the buffer map. 2008-07-19 23:42:38 +00:00
Alan Cox
59a23cacd4 Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses are mapped.  Previously, the loop was testing the
same address every time.

Submitted by:	Magesh Dhasayyan
2008-07-18 22:05:51 +00:00
Alan Cox
53d13c6030 Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().
2008-07-18 20:07:50 +00:00
Alan Cox
36e6513df5 Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by:	scottl
2008-07-15 03:34:49 +00:00
Alan Cox
cfcbf8c6fd Handle a race between pmap_kextract() and pmap_promote_pde(). This race
caused ZFS to crash when restoring a snapshot with superpage promotion
enabled.

Reported by:	kris
2008-07-13 18:19:53 +00:00
Ed Schouten
f4d811f0b2 Make uart(4) the default serial port driver on i386 and amd64.
The uart(4) driver has the advantage of supporting a wider variety of
hardware on a greater amount of platforms. This driver has already been
the standard on platforms such as ia64, powerpc and sparc64.

I've decided not to change anything on pc98. I'd rather let people from
the pc98 team look at this.

Approved by:	philip (mentor), marcel
2008-07-13 07:20:14 +00:00
Alan Cox
8bfadfd616 Refine the changes made in SVN rev 180430. Specifically, instantiate a new
page table page only if the 2MB page mapping has been used.  Also, refactor
some assertions.
2008-07-12 21:24:42 +00:00
Alan Cox
85a0a1be91 In order to apply pmap_demote_pde() to a page directory entry (PDE) from the
direct map, the PDE must have PG_M and PG_A preset.

Noticed by: Magesh Dhasayyan
2008-07-12 18:43:57 +00:00
Alan Cox
e1cb4a353c Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.
2008-07-10 16:22:24 +00:00
Peter Wemm
401989b00b Band-aid a problem with 32 bit selector setup.
Initialize %ds, %es, and %fs during CPU startup.  Otherwise a garbage
value could leak to a 32-bit process if a process migrated to a different
CPU after exec and the new CPU had never exec'd a 32-bit process.

A more complete fix is needed, but this mitigates the most frequent
manifestations.

Obtained from:	ups
2008-07-09 19:44:37 +00:00
Alan Cox
bb7964b205 Fix lines that are too long in pmap_growkernel() by substituting shorter but
equivalent expressions.
2008-07-09 06:04:10 +00:00
Alan Cox
8136b7265f Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating
page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the
kernel's bss.  Specifically, the dependence was in pmap_growkernel()'s one-
time initialization of kernel_vm_end, not in its main body.  (I could not,
however, resist the urge to optimize the main body.)

Reduce the number of preallocated page directory pages to just those needed
to support NKPT page table pages.  (In fact, this allows me to revert a
couple of my earlier changes to create_pagetables().)
2008-07-08 22:59:17 +00:00
Alan Cox
6d2005e71a Rev 180333, ``Change create_pagetables() and pmap_init() so that many fewer
page table pages have to be preallocated ...'', violates an assumption made
by minidumpsys(): kernel_vm_end is the highest virtual address that has ever
been used by the kernel.  Now, however, the kernel code, data, and bss may
reside at addresses beyond kernel_vm_end.  This revision modifies the upper
bound on minidumpsys()'s two page table traversals to account for this
possibility.
2008-07-08 04:00:22 +00:00
Xin LI
dbd47f1592 Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out
of the box.
2008-07-07 22:55:11 +00:00
Alan Cox
cc82a18b88 In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM.  VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page.  Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues.  Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.

MFC after:	1 week
2008-07-07 17:25:09 +00:00
Alan Cox
4a7c66163b Change create_pagetables() and pmap_init() so that many fewer page table
pages have to be preallocated by create_pagetables().
2008-07-06 22:36:28 +00:00
Alan Cox
13e0058451 Increase the kernel map's size to 7GB, making room for a kmem map of size
greater than 4GB.  (Auto-sizing will set the ceiling on the kmem map size
to 4.2GB.)
2008-07-05 20:44:55 +00:00
Alan Cox
0cbeb44158 Eliminate an unused declaration. (In fact, the declaration is bogus
because the variable is defined static to pmap.c on i386.)

Found by: CScout
2008-07-04 17:36:12 +00:00
Alan Cox
db0a9105b1 Increase the ceiling on the kmem map's size to 3.6GB. Also, define the
ceiling as a fraction of the kernel map's size rather than an absolute
quantity.  Thus, scaling of the kmem map's size will be automatic with
changes to the kernel map's size.
2008-07-03 04:53:14 +00:00
Alan Cox
c4a6405c88 Eliminate an unnecessary static variable: nkpt. 2008-07-02 05:41:23 +00:00
Alan Cox
17e2138882 Document the layout of the address space, borrowing heavily from
http://lists.freebsd.org/pipermail/freebsd-amd64/2005-July/005578.html
2008-06-30 03:14:39 +00:00
Alan Cox
67ce249ac9 Compute NKPDPE from NKPT. This reduces the number of knobs that must be
turned in order to change the size of the kernel virtual address space.
2008-06-30 02:35:55 +00:00
Alan Cox
ce3cb38836 Strictly speaking, the definition of VM_MAX_KERNEL_ADDRESS is wrong. However,
in practice, the error (currently) makes no difference because the computation
performed by KVADDR() hides the error.  This revision fixes the error.

Also, eliminate a (now) unused definition.
2008-06-29 19:13:27 +00:00
Alan Cox
f4f491d095 Increase the size of the kernel virtual address space to 6GB. Until the
maximum size of the kmem map can be greater than 4GB, there is little point
in making the kernel virtual address space larger than 6GB.

Tested by:	kris@
2008-06-29 18:35:00 +00:00
Ed Schouten
721351876c Remove the unused major/minor numbers from iodev and memdev.
Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by:	philip (mentor)
2008-06-25 07:45:31 +00:00
Jung-uk Kim
b86977a5ab Emit opcodes closer to GNU as(1) generated codes and micro-optimize. 2008-06-24 20:12:12 +00:00
Jung-uk Kim
292f013c88 Rehash and clean up BPF JIT compiler macros to match AT&T notations. 2008-06-23 23:09:52 +00:00
Alan Cox
bd4328d3a6 Ensure that KERNBASE is no less than the virtual address -2GB. 2008-06-23 15:22:53 +00:00
Alan Cox
0ee1368a96 Prepare for a larger kernel virtual address space. Specifically, once
KERNBASE and VM_MIN_KERNEL_ADDRESS are no longer the same, the physical
memory allocated during bootstrap will be offset from the low-end of the
kernel's page table.
2008-06-21 19:19:09 +00:00
Alan Cox
948c5cc27e Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture.  The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space.  Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space.  Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead.  Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
That said, kris@ has tested crash dumps under the full patch that
increases the kernel virtual address space on amd64 to 6GB.

Tested by: kris@
2008-06-20 20:59:31 +00:00
Xin LI
4d52a57549 Add et(4), a port of DragonFly's Agere ET1310 10/100/Gigabit
Ethernet device driver, written by sephe@

Obtained from:	DragonFly
Sponsored by:	iXsystems
MFC after:	2 weeks
2008-06-20 19:28:33 +00:00
Alan Cox
293ab7c941 Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture.  The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space.  Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space.  Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead.  Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
2008-06-20 05:22:09 +00:00
Alan Cox
f9a4e9e4a9 Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M.  This sometimes prevents unnecessary removal of write access
from a PTE.  Overall, the net result is fewer demotions and promotion
failures.
2008-06-13 19:33:56 +00:00
Alan Cox
9d1b7fa31f Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page.  The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that hasn't
been modified (PG_M).  In general, if there are two or more such PTEs to
choose among, it is better to write protect the one nearer the high end of
the page table page rather than the low end.  This is because most programs
access memory in an ascending direction.  The net result of this change is a
sometimes significant reduction in the number of failed promotion attempts
and the number of pages that are write protected by pmap_promote_pde().
2008-06-12 05:18:09 +00:00
Alan Cox
71e26e2c0e Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space.  Specifically,
pmap_promote_pde() is only called when the page table page (PTP) that
is referenced by the given PDE has a full "use count", i.e., its
wire_count is 512.  Although this guarantees for a user address space
that all 512 PTEs in the PTP hold valid mappings, the same is not true
of the kernel's address space.  A kernel PTP always has a use count of
512 regardless of the state of the PTEs.  Therefore,
pmap_promote_pde() should not assume (or assert) that the first PTE in
the PTP is valid.
2008-06-01 07:36:59 +00:00
Pyun YongHyeon
20f99a5be4 Add jme(4) to the list of drivers supported by GENERIC kernel. 2008-05-27 02:22:32 +00:00
Bjoern A. Zeeb
2e598474fa Remove ISDN4BSD (I4B) from HEAD as it is not MPSAFE and
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.

This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation

Discussed with: rwatson, re
2008-05-26 10:40:09 +00:00
John Birrell
367f3ce5e6 Add the DTrace hooks for exception handling (Function boundary trace
-fbt- provider), cyclic clock and syscalls.
2008-05-24 06:32:26 +00:00
Alan Cox
d1fdd63483 The VM system no longer uses setPQL2(). Remove it and its helpers. 2008-05-23 04:03:54 +00:00
Pyun YongHyeon
83a17b90eb Add age(4) to the list of drivers supported by GENERIC kernel. 2008-05-19 02:30:27 +00:00
Alan Cox
1ec1304bdb Retire pmap_addr_hint(). It is no longer used. 2008-05-18 04:16:57 +00:00
Remko Lodder
6e535f6e5b Resort the if_ti driver to match the PCI Network cards instead of placing
it under the mii devices list.

PR:		kern/123147
Submitted by:	gavin
Approved by:	imp (mentor, implicit)
MFC after:	3 days
2008-05-17 23:50:00 +00:00
Attilio Rao
13d4b2b0bc Removed unused assembly offsets for structures digging. 2008-05-16 13:23:47 +00:00
Roman Divacky
7c0cc5f941 Regen.
Approved by:	kib (mentor)
2008-05-13 20:02:26 +00:00
Roman Divacky
4732e446fb Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
userspace thing which we cannot alter. Two syscalls maintain
pointer to userspace list and when process exits a routine
walks this list waking up processes sleeping on futexes
from that list.

Reviewed by:	kib (mentor)
MFC after:	1 month
2008-05-13 20:01:27 +00:00
Alan Cox
ef4d480ced Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.
2008-05-11 20:33:47 +00:00
Alan Cox
d3249b142b Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.
2008-05-09 16:48:07 +00:00
Sam Leffler
6c26723b19 enable IEEE80211_DEBUG and IEEE80211_AMPDU_AGE by default 2008-05-03 17:05:38 +00:00
Sam Leffler
3971d07be7 Intel 4965 wireless driver (derived from openbsd driver of the same name) 2008-04-29 21:36:17 +00:00
Alan Cox
26b77ff3b1 Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.
2008-04-25 16:00:39 +00:00
Jeff Roberson
6c47aaae12 - Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
 - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
   suspended in cpu specific states.  This function can fail and cause the
   scheduler to fall back to another mechanism (ipi).
 - Implement support for mwait in cpu_idle() on i386/amd64 machines that
   support it.  mwait is a higher performance way to synchronize cpus
   as compared to hlt & ipis.
 - Allow selecting the idle routine by name via sysctl machdep.idle.  This
   replaces machdep.cpu_idle_hlt.  Only idle routines supported by the
   current machine are permitted.

Sponsored by:	Nokia
2008-04-25 05:18:50 +00:00
Roman Divacky
a6d043e30d Implement linux_truncate64() syscall.
Tested by:	Aline de Freitas <aline@riseup.net>
Approved by:	kib (mentor)
2008-04-23 15:56:33 +00:00
Poul-Henning Kamp
9b4a8ab7ba Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
Sam Leffler
b032f27c36 Multi-bss (aka vap) support for 802.11 devices.
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral).  Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.

Supported by:	Hobnob and Marvell
Reviewed by:	many
Obtained from:	Atheros (some bits)
2008-04-20 20:35:46 +00:00
Sam Leffler
f446360711 move awi to the Attic; it will not make the jump to the new world order
Reviewed by:	imp
2008-04-20 19:20:39 +00:00
Peter Wemm
3bf67daaf3 Put in a real isa_irq_pending() stub in order to remove two lines of dmesg
noise from sio per unit.  sio likes to probe if interrupts are configured
correctly by looking at the pending bits of the atpic in order to put a
non-fatal warning on the console.  I think I'd rather read the pending
bits from the apics, but I'm not sure its worth the hassle.
2008-04-19 07:25:57 +00:00
Jeff Roberson
66247efa5a - Add inlines for the monitor and mwait instructions.
Sponsored by:	Nokia
2008-04-18 05:47:56 +00:00
Jung-uk Kim
01c3b1b200 Regenerate. 2008-04-16 19:27:36 +00:00
Jung-uk Kim
26833f3f9a Add stubs for syscalls introduced in Linux 2.6.17 kernel.
Some GNU libc version started using them before 2.6.17 was officially out.

MFC after:	3 days
2008-04-16 19:25:39 +00:00
Warner Losh
917ac33d4e This file is unused on amd64. 2008-04-15 02:10:14 +00:00
Poul-Henning Kamp
36bff1ebfb Convert amd64 and i386 to share the atrtc device driver. 2008-04-14 08:00:00 +00:00
Rui Paulo
6f15a9e57a Connect k8temp(4) to the build. 2008-04-12 14:20:22 +00:00
Jeff Roberson
9b33b154b5 - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
Alan Cox
f4d2c7f13e Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE.  Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page.  This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().
2008-04-10 16:04:50 +00:00
Konstantin Belousov
50ad4fc65c Regenerate 2008-04-08 09:51:19 +00:00
Konstantin Belousov
48b05c3f82 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
Alan Cox
109d493230 Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings. 2008-04-07 07:38:02 +00:00
John Baldwin
1ee1b68792 Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
  'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
  Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
  scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
  Also, don't bother unmasking the interrupt in the post_filter case if
  we never masked it.  The INTR_FILTER case had been doing this by having
  arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
  They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
  both the 'post_filter' and 'post_ithread' hooks to match what the
  non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
  'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
  designed to do even in 5.x).

Glanced at by:	piso
Reviewed by:	marius
Requested by:	marius [1], [5]
Tested on:	amd64, i386, arm, sparc64
2008-04-05 19:58:30 +00:00
Alan Cox
2addc03d04 Eliminate an unnecessary test and its misleading comment from pmap_enter(). 2008-04-04 18:00:22 +00:00
Alan Cox
bc8a0d87bd Optimize pmap_pml4e() and pmap_pdpe() based upon two observations: The
given pmap is never NULL, and therefore pmap_pml4e() can never return
NULL.  The pervasive use of these inline functions throughout the pmap
makes these simple changes worthwhile.
2008-04-02 04:39:47 +00:00
Paul Saab
6e7534b8c8 Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by:	alc, ups
2008-03-28 04:29:27 +00:00
Doug Rabson
fa9d9930ca Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
2008-03-27 11:54:20 +00:00
John Birrell
e483943791 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
Poul-Henning Kamp
dad3b6c6fd Back in the good old days, PC's had random pieces of rock for
frequency generation and what frequency the generated was anyones
guess.

In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.

The other relevant property of those machines, is that they
typically had no more than 16MB RAM.

These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means if PLL's.

Considering that it takes on average 1.5 second to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.

If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.
2008-03-26 22:12:00 +00:00
Poul-Henning Kamp
3a995824f6 Eliminate unnecessary #includes 2008-03-26 20:26:12 +00:00
Poul-Henning Kamp
e465985885 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
Poul-Henning Kamp
ebfbcd612a Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.
2008-03-26 15:03:24 +00:00
Poul-Henning Kamp
f168bfa529 The RTC related pscnt and psdiv variables have no business being public. 2008-03-26 13:25:27 +00:00
Jung-uk Kim
cb7d38abf2 Belatedly add BPF_JITTER in NOTES for supported architectures. 2008-03-24 22:23:22 +00:00
Peter Wemm
f001eabf3a First pass at (possibly futile) microoptimizing of cpu_switch. Results
are mixed.  Some pure context switch microbenchmarks show up to 29%
improvement.  Pipe based context switch microbenchmarks show up to 7%
improvement.  Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.

Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
  non-threaded userland apps.  These typically cost 120 clock cycles each
  on an AMD cpu (less on Barcelona/Phenom).  Intel cores are probably no
  faster on this.
- The above change only helps unthreaded userland apps that tend to use
  the same value for gsbase.  Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
  prefetching a better chance of working.  Operations are now in increasing
  memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths.  Hopefully
  allowing better code density in cache lines.  This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
  realistic static branch prediction hint.  Both Intel and AMD cpus
  default to predicting branches to lower memory addresses as being
  taken, and to higher memory addresses as not being taken.  This is
  overridden by the limited dynamic branch prediction subsystem.  A trip
  through userland might overflow this.
- Futule attempt at spreading the use of the results of previous operations
  in new operations.  Hopefully this will allow the cpus to execute in
  parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
  curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)

Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.

While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.

A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)

This is still work-in-progress.
2008-03-23 23:09:06 +00:00
Alan Cox
58680920e9 Correct an error in pmap_mincore() when applied to a 2MB page mapping:
Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the
2MB physical page from the PDE.
2008-03-23 23:04:09 +00:00
Peter Wemm
22c0c6e9d3 Export TDP_KTHREAD to asm files. 2008-03-23 22:46:37 +00:00
Peter Wemm
6c73bb3557 Move pcb_flags to make trivially better use of cache lines. 2008-03-23 22:45:51 +00:00
Peter Wemm
3d60169ef4 Protect the setting of the fsbase/gsbase MSR registers and the
pcb_[fg]sbase values with a critical section, like the rest of the kernel.
2008-03-23 22:44:56 +00:00
Alan Cox
702006ff76 To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set.  However, this assumption does
not hold on recent processors from Intel.  For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE.  Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault.  In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great.  Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.  However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.

(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)

Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.

Tested by: Peter Holm
2008-03-23 20:38:01 +00:00
Konstantin Belousov
3f7905d29c Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by:	kris (i386/pmap.c:pmap_remove() part)
Reviewed by:	alc
MFC after:	1 week
2008-03-23 07:07:27 +00:00
John Baldwin
eb2b0540e5 Explicitly use spinlock_enter/exit rather than locking the icu_lock spin
lock in the 8259A drivers as these drivers are only used on UP systems.
This slightly reduces the penalty of an SMP kernel (such as GENERIC) on
a UP x86 machine.
2008-03-20 21:53:27 +00:00
John Baldwin
dcc8106854 Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ
resource to a CPU.  The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr.  A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU.  The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr().  Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.

Tested by:	gallatin
2008-03-20 21:24:32 +00:00
John Baldwin
6d2d1c044f Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
  and collapse down to one intr_event_create() routine.  The disable and
  eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
  routines since to trim one extra indirection.

Compiled on:	{arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on:	{amd64,i386} x {FILTER, !FILTER}
2008-03-17 22:42:01 +00:00
Pawel Jakub Dawidek
6eb4157ffc Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
Roman Divacky
d8653dd986 Regen. 2008-03-16 16:29:37 +00:00
Roman Divacky
5dfb688191 Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by:	jeff
Approved by:	kib (mentor)
2008-03-16 16:27:44 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
John Baldwin
eaf86d1678 Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
John Baldwin
c9107e85d9 Fix a silly bogon which prevented all the CPUs that are tagged as interrupt
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs.  In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would workaround the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers.  On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.

MFC after:	1 week
Pointy hat to:	jhb
2008-03-14 03:44:42 +00:00
John Baldwin
5217af301c Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines.  The existing code already handles
having two platforms: ACPI and legacy.  However, the existing approach was
rather hardcoded and difficult to extend.  These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device).  This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
  legacy platform.  It's probe routine now returns BUS_PROBE_GENERIC so it
  can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
  resource managers so that subclassed nexus(4) drivers can invoke it from
  their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
  attach routine.
- The ACPI driver no longer contains an new-bus identify method.  Instead
  it exposes a public function (acpi_identify()) which is a probe routine
  that the MD nexus(4) drivers can use to probe for ACPI.  All of the
  probe logic in acpi_probe() is now moved into acpi_identify() and
  acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
  acpi_identify() and claims the nexus0 device if the probe succeeds.  It
  then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
  This matches the previous behavior where the old acpi_identify() would
  fail to add an acpi0 device again leaving you with no devices.

Discussed with:	imp
Silence on:	arch@
2008-03-13 20:39:04 +00:00
Konstantin Belousov
22eca0bf45 Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by:	Aurelien Jarno <aurelien aurel32 net>
PR:	121422
Reviewed by:	jhb
MFC after:	2 weeks
2008-03-13 10:54:38 +00:00
John Baldwin
391664b110 The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU.  The old code assumed 36 bits on i386 and 40 bits on
amd64.  In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.).  The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available.  If that
feature is not available, then assume 36 PA bits.

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after:	1 week
PR:		i386/120516
Reported by:	Nokia
2008-03-12 22:09:19 +00:00
John Baldwin
f15a9cd288 Minimize diffs with i686_mem.c:
- A few whitespace changes I missed in the style(9) changes.
- Move M_MEMDESC to mem.c.
2008-03-12 21:43:50 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00