Commit Graph

5613 Commits

Author SHA1 Message Date
jkim
1e8881cf54 VIA Nano processor has a special MSR (CENT_HARDWARECTRL3) bit 32 to determine
whether TSC is P-state invariant or not.  In fact, this MSR is writable but
we just leave it at the BIOS default for now.
2009-01-22 21:04:46 +00:00
kib
a23ffa0635 The context switch to the 32bit binary does not properly restore
the fsbase value. The switch loads the fs segment register, that
invalidates the value in fsbase msr, thus value in %r9 can not be
considered the current value for fsbase anymore.

Unconditionally reload fsbase when switching to 32bit binary.

PR:	130526
MFC after:	3 weeks
2009-01-20 12:07:49 +00:00
sobomax
2f4bdc6a7f Take NTFS option out to match i386 GENERIC.
Suggested by:	phk, luigi
2009-01-19 15:33:06 +00:00
sobomax
a8c0e8bf46 asr(4) is not amd64-clean, not amr(4).
Pointy hat to:	myself
Submitted by:	scottl
2009-01-19 08:51:20 +00:00
sobomax
892160aa9e Comment amr(4) out - according to scottl it's not 64-bit clean. 2009-01-19 08:25:41 +00:00
sobomax
ba1b4a9892 Whitespace-only: reduce diff to the i386 GENERIC. 2009-01-19 07:18:32 +00:00
sobomax
bde2deb986 Add asr(4) and stge(4) from i386 GENERIC. Both drivers compile on amd64 and
there is no particular reason for them to be i386-only.

MFC after:	2 weeks
2009-01-19 07:10:11 +00:00
kib
c2da459b12 Disable interrupts, if they were enabled, before doing swapgs.
Otherwise, interrupt may happen while we run with kernel CS and usermode
gsbase.

Reviewed by:	jeff
MFC after:	1 week
2009-01-14 14:20:08 +00:00
thompsa
022a9f5ac2 MFp4: //depot/projects/usb@155990
Add USB scanner support to USB2 config files.

Submitted by: Hans Petter Selasky
2009-01-13 19:05:10 +00:00
luigi
421fd89a07 Documentation-only change:
- add a reference to the config(5) manpage;
- hopefully clarify the format of the 'env FILENAME' directive.

I am putting these notes in sys/${arch}/conf/GENERIC and not
in sys/conf/NOTES because:

1. i386/GENERIC already had reference to a similar option (hints..)
   and to documentation (handbook)

2. GENERIC is what most users look at when they have to modify or
   create a new kernel config, so having the suggestion there is
   more effective.

I am only touching i386 and amd64 because the other GENERIC files
are already out of sync, and I am not sure what is the overall plan.

MFC after:	3 days
2009-01-13 12:35:33 +00:00
jkim
7cdff4176a Add basic amd64 support for VIA Nano processors. 2009-01-12 19:17:35 +00:00
jkim
657301ea43 Add Centaur/IDT/VIA vendor ID for Nano family, which has long mode support. 2009-01-05 21:51:49 +00:00
rwatson
a5fb43e4cd Add commented out options KDTRACE_HOOKS and, for amd64, KDRACE_FRAME,
to GENERIC configuration files.  This brings what's in 8.x in sync
with what is in 7.x, but does not change any current defaults.

Possibly they should now be enabled in head by default?
2009-01-05 14:21:49 +00:00
rpaulo
114967dedd Disable USB bluetooth (needs netgraph built in) and USB audio (doesn't
compile).
2008-12-30 20:13:20 +00:00
rpaulo
72636ca358 Add a kernel config file so that users have less difficulty testing
USBng.

If it makes sense, it could be done for arm/mips too.
2008-12-30 19:46:06 +00:00
marcel
d654ea043b Make gpart the default partitioning class on all platforms.
Both ia64 and powerpc were using gpart exclusively already
so there's no change for those two.

Discussed on: arch@
2008-12-17 17:43:22 +00:00
imp
39a3668dcc AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.
Reviewed by:	peter
2008-12-17 06:56:58 +00:00
imp
e4a424be30 Remove obsolete AT_DEBUG stuff. It never should have been committed
in the first place, let alone migrated to linux emulation.

Reviewed by:	peter, rdivacky
2008-12-17 06:11:42 +00:00
jkoshy
62d6b4ab63 Bug fix: %ebx needs to be preserved in the user callchain capture
path.
2008-12-14 09:06:28 +00:00
jkoshy
57939c399f - Bug fix: prevent a thread from migrating between CPUs between the
time it is marked for user space callchain capture in the NMI
  handler and the time the callchain capture callback runs.

- Improve code and control flow clarity by invoking hwpmc(4)'s user
  space callchain capture callback directly from low-level code.

Reviewed by:	jhb (kern/subr_trap.c)
Testing (various patch revisions): gnn,
		Fabien Thomas <fabien dot thomas at netasq dot com>,
		Artem Belevich <artemb at gmail dot com>
2008-12-13 13:07:12 +00:00
jkim
424471c961 Add more CPUID bits from AMD CPUID Specification Rev. 2.28. 2008-12-12 23:17:00 +00:00
jkoshy
e2774cc573 Expose symbol `PMC_FN_USER_CALLCHAIN' to assembler code. 2008-12-12 16:09:34 +00:00
jhb
160f4765eb Add constants for fields in the local APIC error status register and a
routine to read it.
2008-12-11 15:56:30 +00:00
alc
071a9996a4 Change the default value for the flag enabling superpage mapping and
promotion to "on".

Reminded by:	jhb
Tested by:	kris
2008-12-06 19:37:52 +00:00
kib
d57e5b9d8b Improve db_backtrace() for compat ia32 on amd64. 32bit image enters
the kernel via Xint0x80_syscall().

Submitted by:	dchagin
MFC after:	1 week
2008-12-05 11:34:36 +00:00
ed
9286c815e8 Remove "[KEEP THIS!]" from COMPAT_43TTY. It's not really that important.
Sgtty is a programming interface that has been replaced by termios over
the years. In June we already removed <sgtty.h>, which exposes the
ioctl()'s that are implemented by this interface. The importance of this
flag is overrated right now.
2008-12-02 19:09:08 +00:00
ganbold
a6ba0e1fed Remove unused variable.
Found with:     Coverity Prevent(tm)
CID: 3685

Approved by: jhb
2008-12-02 14:19:53 +00:00
sam
3693ee3c32 Switch to ath hal source code. Note this removes the ath_hal
module; the ath module now brings in the hal support.  Kernel
config files are almost backwards compatible; supplying

device ath_hal

gives you the same chip support that the binary hal did but you
must also include

options AH_SUPPORT_AR5416

to enable the extended format descriptors used by 11n parts.
It is now possible to control the chip support included in a
build by specifying exactly which chips are to be supported
in the config file; consult ath_hal(4) for information.
2008-12-01 16:53:01 +00:00
kensmith
656329bc07 Adjustments to make a tags file a bit more suitable to amd64.
Reviewed by:	peter
2008-12-01 14:15:10 +00:00
mav
750de65ae4 According to "Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3B: System Programming Guide, Part 2", CPUs with family 0x6 and model
above or 0xE and CPUs with family 0xF and model above or 0x3 have invariant
TSC.
2008-11-30 00:10:55 +00:00
kib
8ffb383318 Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by:	dchagin
2008-11-29 17:14:06 +00:00
kib
1500739dba Regenerate 2008-11-29 14:57:58 +00:00
kib
bfe2c8cb22 Fix iovec32 for linux32/amd64.
Add a custom version of copyiniov() to deal with the 32-bit iovec
pointers from userland (to be used later).

Adjust prototypes for linux_readv() and linux_writev() to use new
l_iovec32 definition and to match actual linux code. In particular,
use ulong for fd (why ?).

Submitted by:	dchagin
2008-11-29 14:55:24 +00:00
jkoshy
aa86a7c59e - Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo
and Core Duo), models 0xF (Core2), model 0x17 (Core2Extreme) and
  model 0x1C (Atom).

  In these CPUs, the actual numbers, kinds and widths of PMCs present
  need to queried at run time.  Support for specific "architectural"
  events also needs to be queried at run time.

  Model 0xE CPUs support programmable PMCs, subsequent CPUs
  additionally support "fixed-function" counters.

- Use event names that are close to vendor documentation, taking in
  account that:
  - events with identical semantics on two or more CPUs in this family
    can have differing names in vendor documentation,
  - identical vendor event names may map to differing events across
    CPUs,
  - each type of CPU supports a different subset of measurable
    events.

  Fixed-function and programmable counters both use the same vendor
  names for events.  The use of a class name prefix ("iaf-" or
  "iap-" respectively) permits these to be distinguished.

- In libpmc, refactor pmc_name_of_event() into a public interface
  and an internal helper function, for use by log handling code.

- Minor code tweaks: staticize a global, freshen a few comments.

Tested by:	gnn
2008-11-27 09:00:47 +00:00
jkim
8bceebfbc2 Use newly introduced cpu_vendor_id to make invariant TSC detection more
clearer and merge r185295 to amd64.
2008-11-26 19:29:33 +00:00
jkim
d6a4501391 Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "...").
Reviewed by:	jhb, peter (early amd64 version)
2008-11-26 19:25:13 +00:00
dfr
19b6af98ec Clone Kip's Xen on stable/6 tree so that I can work on improving FreeBSD/amd64
performance in Xen's HVM mode.
2008-11-22 16:14:52 +00:00
kib
8fad2283b3 Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with:	dchagin, imp, jhb, peter
2008-11-22 12:36:15 +00:00
kmacy
9d3bb599b1 - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
kib
f5d16a4d66 In the robust futexes list head, futex_offset shall be signed,
and glibc actually supplies negative offsets. Change l_ulong to l_long.

Submitted by:	dchagin
2008-11-16 15:45:41 +00:00
yongari
8fe107d730 Add ale(4), a driver for Atheros AR8121/AR8113/AR8114 PCIe ethernet
controller. The controller is also known as L1E(AR8121) and
L2E(AR8113/AR8114). Unlike its predecessor Attansic L1,
AR8121/AR8113/AR8114 uses completely different Rx logic such that
it requires separate driver. Datasheet for AR81xx is not available
to open source driver writers but it shares large part of Tx and
PHY logic of L1. I still don't understand some part of register
meaning and some MAC statistics counters but the driver seems to
have no critical issues for performance and stability.

The AR81xx requires copy operation to pass received frames to upper
stack such that ale(4) consumes a lot of CPU cycles than that of
other controller. A couple of silicon bugs also adds more CPU
cycles to address the known hardware bug. However, if you have fast
CPU you can still saturate the link.
Currently ale(4) supports the following hardware features.
  - MSI.
  - TCP Segmentation offload.
  - Hardware VLAN tag insertion/stripping with checksum offload.
  - Tx TCP/UDP checksum offload and Rx IP/TCP/UDP checksum offload.
  - Tx/Rx interrupt moderation.
  - Hardware statistics counters.
  - Jumbo frame.
  - WOL.

AR81xx PCIe ethernet controllers are mainly found on ASUS EeePC or
P5Q series of ASUS motherboards. Special thanks to Jeremy Chadwick
who sent the hardware to me. Without his donation writing a driver
for AR81xx would never have been possible. Big thanks to all people
who reported feedback or tested patches.

HW donated by:	koitsu
Tested by:	bsam, Joao Barros <joao.barros <> gmail DOT com >
		Jan Henrik Sylvester <me <> janh DOT de >
		Ivan Brawley < ivan <> brawley DOT id DOT au >,
		CURRENT ML
2008-11-12 09:52:06 +00:00
ed
8d12469978 Several cleanups related to pipe(2).
- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
  fills an array with two descriptors.

- Remove EFAULT from the manual page. Because of the current calling
  convention, pipe(2) raises a segmentation fault when an invalid
  address is passed.

- Introduce kern_pipe() to make it easier for binary emulations to
  implement pipe(2).

- Make Linux binary emulation use kern_pipe(), which means we don't have
  to recover td_retval after calling the FreeBSD system call.

Approved by:	rdivacky
Discussed on:	arch
2008-11-11 14:55:59 +00:00
jkoshy
fdb59f927e - Separate PMC class dependent code from other kinds of machine
dependencies.  A 'struct pmc_classdep' structure describes operations
  on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep'
  structures depending on the CPU in question.

  Inside PMC class dependent code, row indices are relative to the
  PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates
  global row indices before invoking class dependent operations.

- Augment the OP_GETCPUINFO request with the number of PMCs present
  in a PMC class.

- Move code common to Intel CPUs to file "hwpmc_intel.c".

- Move TSC handling to file "hwpmc_tsc.c".
2008-11-09 17:37:54 +00:00
ed
7baae41248 Regenerate system call tables for r184789. 2008-11-09 10:48:06 +00:00
ed
9d3703b842 Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.
Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by:	rdivacky, kib
2008-11-09 10:45:13 +00:00
kib
37e83ace48 Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by:	jhb
Retested by:	pho
MFC after:	3 days (+ 176304 + 184136)
2008-10-31 10:11:35 +00:00
sobomax
c68ec05f2d Fix r184323 - set stathz to be the same as lapic_timer_hz when lapic_timer_hz
is less than 128. Remove extra {} to match existing style.
2008-10-27 21:45:18 +00:00
sobomax
c1e5c07e0b Fix division by zero panic if kern.hz less than 32.
MFC after:	1 day
2008-10-26 18:58:04 +00:00
jkim
4a27fbf6ee Simplify AMD64_CPU_MODEL() and AMD64_CPU_FAMILY() macros as the base family
should be at least 0xf00 for all supported platforms.
2008-10-22 17:36:52 +00:00
jkim
a62bf181b5 Add AMD Family 0Fh, Model 6Bh, Stepping 2 to the list of invariant TSCs
and fix i386 test.
2008-10-22 17:30:37 +00:00
jkim
d0d7e3dcb3 Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher
even if BIOS does not advertise it.
2008-10-22 00:01:53 +00:00
jkim
0b6f0646df Turn off CPU frequency change notifiers when the TSC is P-state invariant
or it is forced by setting 'kern.timecounter.invariant_tsc' tunable
to non-zero.
2008-10-21 00:38:00 +00:00
jkim
d4d906ba86 Detect Advanced Power Management Information for AMD CPUs. 2008-10-21 00:17:55 +00:00
kib
29ccf7d166 Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made
to not linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.

Reported by:	Nicolas Joly <njoly pasteur fr>
Submitted by:	dchangin
MFC after:	1 month
2008-10-19 10:02:26 +00:00
kib
98d25e143a Set PCB_32BIT and clear PCB_GS32BIT for linux32 binaries.
Tested by:	dchagin
MFC after:	3 days
2008-10-18 13:39:22 +00:00
kib
faae1c0f2f Make robust futexes work on linux32/amd64. Use PTRIN to read
user-mode pointers. Change types used in the structures definitions to
properly-sized architecture-specific types.

Submitted by:	dchagin
MFC after:	1 week
2008-10-14 07:59:23 +00:00
davidxu
8a22cb4e57 If the current thread has the trap bit set (i.e. a debugger had
single stepped the process to the system call), we need to clear
the trap flag from the new frame. Otherwise, the new thread will
receive a (likely unexpected) SIGTRAP when it executes the first
instruction after returning to userland.
2008-10-05 02:03:54 +00:00
stas
2a2b3ea928 - Add driver for Attansic L2 FastEthernet controller found on
Asus EeePC and some Asus mainboards.

Reviewed by:	yongari, rpaulo, jhb
Tested by:	many
Approved by:	kib (mentor)
MFC after:	1 week
2008-10-03 10:31:31 +00:00
peter
ed8d07f232 Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment.  Textdump magic can be passed in.
2008-10-01 22:08:53 +00:00
jhb
593ef76d8c Bump MAXCPU to 32 now that 32 CPU x86 systems exist.
Tested by:	rwatson, mdtansca
Approved by:	peter
2008-10-01 21:59:04 +00:00
marius
a1ec700ce8 Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by:	jhb
Reviewed by:	arch, grehan, jhb
2008-09-28 18:34:14 +00:00
ed
4efdef565f Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
kib
c500808674 Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from:	jhb
MFC after:	1 month
2008-09-24 10:14:37 +00:00
jhb
7dfce995e4 MFC: Update comments about the 0xcf9 register reset method.
Approved by:	re (kensmith, kib)
2008-09-18 20:20:28 +00:00
stas
f6aef556ae - Recognize SAVE and OSXSAVE extended processor features.
Approved by:	kib (mentor)
MFC after:	1 month
2008-09-18 18:51:32 +00:00
jkoshy
a9cbfb55cd Correct a callchain capture bug on the i386.
On the i386 architecture, the processor only saves the current value
of `%esp' on stack if a privilege switch is necessary when entering
the interrupt handler.   Thus, `frame->tf_esp' is only valid for
an entry from user mode.  For interrupts taken in kernel mode, we
need to determine the top-of-stack for the interrupted kernel
procedure by adding the appropriate offset to the current frame
pointer.

Reported by:	kris, Fabien Thomas
Tested by:	Fabien Thomas <fabien.thomas at netasq dot com>
2008-09-15 06:47:52 +00:00
jhb
0ae31bdab7 MFC: Adjust the handling of the various timer frequencies when using the
lapic timer to use dynamic divisors and try to keep stathz around 128.

Approved by:	re (kib)
2008-09-12 21:41:21 +00:00
jhb
1d4a561a28 Add a 'hw.pci.mcfg' tunable. It can be set to 0 to disable memory-mapped
PCI config access.
2008-09-11 21:42:11 +00:00
jhb
8470014e06 Update the comments above the 0xcf9 register reset attempt to match the
code.  We only attempt a single reset using this method (a "hard" reset),
and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to
force a reset.

MFC after:	1 week
2008-09-11 18:33:57 +00:00
jhb
ddef310111 Some K8 chipsets don't expose all of the PCI devices on bus 0 via PCIe
memory-mapped config access.  Add a workaround for these systems by
checking the first function of each slot on bus 0 using both the
memory-mapped config access and the older type 1 I/O port config access.
If we find a slot that is only visible via the type 1 I/O port config
access, we flag that slot.  Future PCI config transactions to flagged
slots on bus 0 use type 1 I/O port config access rather than memory mapped
config access.
2008-09-10 18:06:08 +00:00
kib
f444f744fa The pcb_gs32p should be per-cpu, not per-thread pointer. This is
location in GDT where the segment descriptor from pcb_gs32sd is
copied, and the location is in GDT local to CPU.

Noted and reviewed by:	peter
MFC after:	1 week
2008-09-08 09:59:05 +00:00
kib
73306a5435 Provide private per-CPU GDTs on amd64. This is required at least for the
linux CB_GS32BIT to work.

Noted by:	nox
Reviewed by:	peter
MFC after:	1 week
2008-09-08 09:55:51 +00:00
kib
396f47bf91 In linux_set_thread_area(), mark pcb as PCB_GS32BIT. This was missed
when r180992 was committed.

Reviewed by:	peter
MFC after:	1 week
2008-09-08 09:09:23 +00:00
kib
c593a271c4 Fix inconsistencies in the comments.
MFC after:	1 week
2008-09-08 08:58:29 +00:00
kib
a568a3185e Segment registers are stored in the uc_mcontext member of the struct
l_ucontext. To restore the registers content, trampoline needs to
dereference uc_mcontext instead of taking some undefined values from
l_ucontext.

Submitted by:	Dmitry Chagin <dchagin@>
MFC after:	1 week
2008-09-07 16:39:21 +00:00
simon
f095193224 - Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by:	kib [08:07], csjp [08:08], bz [08:09]
Reviewed by:	peter [08:07], jhb [08:07]
Reviewed by:	jinmei [08:09], rwatson [08:09]
Approved by:	re (SA blanket)
Approved by:	so (simon)
Security:	FreeBSD-SA-08:07.amd64
Security:	FreeBSD-SA-08:08.nmount
Security:	FreeBSD-SA-08:09.icmp6
2008-09-03 19:09:47 +00:00
kib
59a00054ac - When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386
processes, clear PCB_32BIT and PCB_GS32BIT bits [1].

- Reread the fs and gs bases from the msr unconditionally, not believing
  the values in pcb_fsbase and pcb_gsbase, since usermode may reload
  segment registers, invalidating the cache. [2].

Both problems resulted in the wrong fs base, causing wrong tls pointer
be dereferenced in the usermode.

Reported and tested by:	Vyacheslav Bocharov <adeepv at gmail com> [1]
Reported by:	Bernd Walter <ticsoat cicely7 cicely de>,
	Artem Belevich <fbsdlist at src cx>[2]
Reviewed by:	peter
MFC after:	3 days
2008-09-02 17:52:11 +00:00
jkim
e41f677c9f Move empty filter handling to MI source.
MFC after:	3 days
2008-08-26 21:06:31 +00:00
jkim
e21d933237 Fix a typo in copyrights. 2008-08-25 20:43:13 +00:00
jhb
346004ece8 Adjust the handling the various timer frequencies when using the lapic
timer.  Previously, the various divisors were fixed which meant that while
it gave somewhat reasonable stathz, etc. at hz=1000, it went off the rails
with any other hz value.  With these changes, we now pick a lapic timer hz
based on the value of hz.  If hz is >= 1500, then the lapic timer runs at
hz.  If 1500 hz >= 750, we run the lapic timer at hz * 2.  If hz < 750, we
run at hz * 4.  We compute a divider at runtime to make stathz run as close
to 128 as we can since stathz really wants to be run at something close to
that frequency.  Profiling just runs on every clock tick.  So some examples:

With hz = 100, the lapic timer now runs at 400 instead of 2000.  stathz
will be 133, and profhz = 400.  With hz = 1000 (default), the lapic timer
is still at 2000 (as it is now), stathz is at 133 (as it is now), and
profhz will be 2000 (previously 666).

MFC after:	2 weeks
2008-08-23 12:35:43 +00:00
jhb
2a48176eba Extend the support for PCI-e memory mapped configuration space access:
- Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the
  rest of the kernel.  It now also accepts parameters via function
  arguments rather than global variables.
- Add a notion of minimum and maximum bus numbers and reject requests for
  an out of range bus.
- Add more range checks on slot/func/reg/bytes parameters to the cfg reg
  read/write routines.  Don't panic on any invalid parameters, just fail
  the request (writes do nothing, reads return -1).  This matches the
  behavior of the other cfg mechanisms.
- Port the memory mapped configuration space access to amd64.  On amd64
  we simply use the direct map (via pmap_mapdev()) for the memory mapped
  window.
- During acpi_attach() just after loading the ACPI tables, check for a
  MCFG table.  If it exists, call pciereg_cfgopen() on each subtable
  (memory mapped window).  For now we only support windows for domain 0
  that start with bus 0.  This removes the need for more chipset-specific
  quirks in the MD code.
- Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets
  since these machines should all have MCFG tables via ACPI.
- Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen()
  earlier.

MFC after:	2 weeks
2008-08-22 02:14:23 +00:00
jhb
cb1f3b0dc8 MFC: Decode "exotic" instructions such as pause as well as "cmov*" on i386. 2008-08-20 18:01:59 +00:00
ed
cc3116a938 Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
jhb
d90774443d Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after:	1 week
2008-08-19 19:53:52 +00:00
jkim
9847f32c4e Correctly check unsignedness of all BPF_LD|BPF_IND instructions.
This is roughly from sys/net/bpf_filter.c r1.12 and r1.14.
2008-08-18 19:14:26 +00:00
jkim
137ba6a238 - Make these files compilable on user land.
- Update copyrights and fix style(9).
2008-08-18 18:59:33 +00:00
kib
0d74400a62 The doreti_iret_fault code is always called with gs base MSR containing
kernel gs base, because %rip is adjusted only on kernel-mode trap caused
by iretq execution. On the other hand, the stack contains (hardware
part of) trap frame from the usermode. As a consequence, checking for
frame mode and doing swapgs causes the kernel to enter trap() with
usermode gs base.

Remove the check for mode and conditional swapgs, we already have right
gs base in the MSR.

Submitted by:	Nate Eldredge <neldredge math ucsd edu>
MFC after:	3 days
2008-08-18 08:47:27 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
jkim
4385fbc2f6 Use int32_t/int16_t instead of int/short as sys/net/bpf_filter.c does. 2008-08-13 19:52:00 +00:00
jkim
39084a201d - Remove unnecessary jump instruction(s) when offset(s) is/are zero(s).
- Constantly use conditional jumps for unsigned integers.
2008-08-13 19:25:09 +00:00
jkim
ad7ec74ae4 Update copyrights and fix style(9). 2008-08-12 21:31:31 +00:00
jkim
5bf34e3e87 Replace all stack usages with registers and remove unused macros. 2008-08-12 20:10:45 +00:00
jhb
7109016dfe Decode some more "exotic" instructions including: fxsave, fxrstor, ldmxcsr,
stmxcsr, clflush, lfence, mfence, sfence, syscall, sysret, sysenter,
sysexit, pause, monitor, mwait, and swapgs (amd64 only).

MFC after:	1 week
2008-08-11 20:19:42 +00:00
alc
d53364aaab Intel describes the behavior of their processors as "undefined" if two or
more mappings to the same physical page have different memory types, i.e.,
PAT settings.  Consequently, if pmap_change_attr() is applied to a virtual
address range within the kernel map, then the corresponding ranges of the
direct map also need to be changed.  Enhance pmap_change_attr() to handle
this case automatically.

Add a comment describing what pmap_change_attr() does.

Discussed with:	jhb
2008-08-09 05:46:13 +00:00
stas
a782fc10fe - Add cpuctl(4) pseudo-device driver to provide access to some low-level
features of CPUs like reading/writing machine-specific registers,
  retrieving cpuid data, and updating microcode.
- Add cpucontrol(8) utility, that provides userland access to
  the features of cpuctl(4).
- Add subsequent manpages.

The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX
is created for each cpu present in the systems. The pseudo-device minor
number corresponds to the cpu number in the system. The cpuctl(4) pseudo-
device allows a number of ioctl to be preformed, namely RDMSR/WRMSR/CPUID
and UPDATE. The first pair alows the caller to read/write machine-specific
registers from the correspondent CPU. cpuid data could be retrieved using
the CPUID call, and microcode updates are applied via UPDATE.

The permissions are inforced based on the pseudo-device file permissions.
RDMSR/CPUID will be allowed when the caller has read access to the device
node, while WRMSR/UPDATE will be granted only when the node is opened
for writing. There're also a number of priv(9) checks.

The cpucontrol(8) utility is intened to provide userland access to
the cpuctl(4) device features. The utility also allows one to apply
cpu microcode updates.

Currently only Intel and AMD cpus are supported and were tested.

Approved by:	kib
Reviewed by:	rpaulo, cokane, Peter Jeremy
MFC after:	1 month
2008-08-08 16:26:53 +00:00
alc
a08d055d40 Introduce pmap_change_attr_locked(). 2008-08-07 04:56:29 +00:00
alc
016300862b Make pmap_kenter_attr() static. 2008-08-04 08:04:09 +00:00
ed
7237d2d9a2 Disconnect drivers that haven't been ported to MPSAFE TTY yet.
As clearly mentioned on the mailing lists, there is a list of drivers
that have not been ported to the MPSAFE TTY layer yet. Remove them from
the kernel configuration files. This means people can now still use
these drivers if they explicitly put them in their kernel configuration
file, which is good.

People should keep in mind that after August 10, these drivers will not
work anymore. Even though owners of the hardware are capable of getting
these drivers working again, I will see if I can at least get them to a
compilable state (if time permits).
2008-08-03 10:32:17 +00:00
alc
a59aadb8d6 Enhance pmap_mapdev_attr(). Take advantage of recent enhancements to
pmap_change_attr() in order to use the direct map for any cache mode, not
just write-back mode.

It is worth noting that this change also eliminates a situation in which we
have two mappings to the same physical memory with different cache modes.

Submitted by:	Magesh Dhasayyan (with some changes by me)
Discussed with:	jhb
2008-08-02 03:43:54 +00:00
jhb
972469c120 MFC: Add the optional nvram(4) device. As with 7.x, this device is off
by default but can be enabled via 'device nvram' or loading the nvram.ko
module on amd64 and i386.
2008-08-01 21:24:17 +00:00
alc
2496625e17 Enhance pmap_change_attr() with the ability to demote 1GB page mappings. 2008-08-01 04:55:38 +00:00
alc
ba620c5434 Enhance pmap_change_attr(). Specifically, avoid 2MB page demotions, cache
mode changes, and cache and TLB invalidation when some or all of the
specified range is already mapped with the specified cache mode.

Submitted by:	Magesh Dhasayyan
2008-07-31 22:45:28 +00:00
alc
9a91b0b82b Eliminate recomputation of the PDE by pmap_pde_attr(). 2008-07-31 04:42:42 +00:00
jfv
528c59435f Add igb to the default kernel
MFC after:ASAP
2008-07-30 22:27:38 +00:00
kib
19bf5e2807 Bring back the save/restore of the %ds, %es, %fs and %gs registers for
the 32bit images on amd64.

Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.

FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.

Reviewed by:	peter
MFC after:	2 weeks
2008-07-30 11:30:55 +00:00
alc
93ecf09bb7 Don't allow pmap_change_attr() to be applied to the recursive mapping. 2008-07-28 04:59:48 +00:00
alc
8c160a5c01 Add a check for 1GB page mappings to pmap_change_attr() so that it fails
gracefully.  (On K10 family processors the direct map is implemented using
1GB page mappings.)
2008-07-28 03:58:49 +00:00
yongari
7e604e1299 MFC r179347.
Add jme(4) to the list of drivers supported by GENERIC kernel.
2008-07-28 02:20:29 +00:00
alc
369f479043 Style fixes to several function definitions. 2008-07-27 18:18:50 +00:00
alc
4455d56f31 Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2MB page.  Previously, in such cases,
pmap_change_attr() gave up and returned an error.

Submitted by:	Magesh Dhasayyan
2008-07-27 17:32:36 +00:00
alc
86bb1c4f2b Increase the ceiling on the size of the buffer map. 2008-07-19 23:42:38 +00:00
alc
493038bc03 Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses are mapped.  Previously, the loop was testing the
same address every time.

Submitted by:	Magesh Dhasayyan
2008-07-18 22:05:51 +00:00
alc
07ead18715 Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().
2008-07-18 20:07:50 +00:00
alc
d4de04e9b1 Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by:	scottl
2008-07-15 03:34:49 +00:00
alc
181ba3c627 Handle a race between pmap_kextract() and pmap_promote_pde(). This race
caused ZFS to crash when restoring a snapshot with superpage promotion
enabled.

Reported by:	kris
2008-07-13 18:19:53 +00:00
ed
a8f4e95b68 Make uart(4) the default serial port driver on i386 and amd64.
The uart(4) driver has the advantage of supporting a wider variety of
hardware on a greater amount of platforms. This driver has already been
the standard on platforms such as ia64, powerpc and sparc64.

I've decided not to change anything on pc98. I'd rather let people from
the pc98 team look at this.

Approved by:	philip (mentor), marcel
2008-07-13 07:20:14 +00:00
alc
a761fe4f10 Refine the changes made in SVN rev 180430. Specifically, instantiate a new
page table page only if the 2MB page mapping has been used.  Also, refactor
some assertions.
2008-07-12 21:24:42 +00:00
alc
6dd633c3c8 In order to apply pmap_demote_pde() to a page directory entry (PDE) from the
direct map, the PDE must have PG_M and PG_A preset.

Noticed by: Magesh Dhasayyan
2008-07-12 18:43:57 +00:00
alc
bb93e65e81 Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.
2008-07-10 16:22:24 +00:00
peter
383e07b996 Band-aid a problem with 32 bit selector setup.
Initialize %ds, %es, and %fs during CPU startup.  Otherwise a garbage
value could leak to a 32-bit process if a process migrated to a different
CPU after exec and the new CPU had never exec'd a 32-bit process.

A more complete fix is needed, but this mitigates the most frequent
manifestations.

Obtained from:	ups
2008-07-09 19:44:37 +00:00
alc
1f7be5a00f Fix lines that are too long in pmap_growkernel() by substituting shorter but
equivalent expressions.
2008-07-09 06:04:10 +00:00
alc
12ebfa7cd9 Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating
page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the
kernel's bss.  Specifically, the dependence was in pmap_growkernel()'s one-
time initialization of kernel_vm_end, not in its main body.  (I could not,
however, resist the urge to optimize the main body.)

Reduce the number of preallocated page directory pages to just those needed
to support NKPT page table pages.  (In fact, this allows me to revert a
couple of my earlier changes to create_pagetables().)
2008-07-08 22:59:17 +00:00
alc
88ebbadb2e Rev 180333, ``Change create_pagetables() and pmap_init() so that many fewer
page table pages have to be preallocated ...'', violates an assumption made
by minidumpsys(): kernel_vm_end is the highest virtual address that has ever
been used by the kernel.  Now, however, the kernel code, data, and bss may
reside at addresses beyond kernel_vm_end.  This revision modifies the upper
bound on minidumpsys()'s two page table traversals to account for this
possibility.
2008-07-08 04:00:22 +00:00
delphij
cb283fcdf7 Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out
of the box.
2008-07-07 22:55:11 +00:00
alc
64ef3ee8e5 In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM.  VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page.  Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues.  Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.

MFC after:	1 week
2008-07-07 17:25:09 +00:00
alc
2e2599b5d0 Change create_pagetables() and pmap_init() so that many fewer page table
pages have to be preallocated by create_pagetables().
2008-07-06 22:36:28 +00:00
alc
12adeb8575 Increase the kernel map's size to 7GB, making room for a kmem map of size
greater than 4GB.  (Auto-sizing will set the ceiling on the kmem map size
to 4.2GB.)
2008-07-05 20:44:55 +00:00
alc
3f80ced8f0 Eliminate an unused declaration. (In fact, the declaration is bogus
because the variable is defined static to pmap.c on i386.)

Found by: CScout
2008-07-04 17:36:12 +00:00
alc
4ef7d0cd62 Increase the ceiling on the kmem map's size to 3.6GB. Also, define the
ceiling as a fraction of the kernel map's size rather than an absolute
quantity.  Thus, scaling of the kmem map's size will be automatic with
changes to the kernel map's size.
2008-07-03 04:53:14 +00:00
alc
04dc8c2e9c Eliminate an unnecessary static variable: nkpt. 2008-07-02 05:41:23 +00:00
obrien
e34ad39247 MFC: 180149 / 1.49.2.1 (which was MFC of r180109 / rev 1.53)
Document the layout of the address space.
2008-07-01 21:10:38 +00:00
alc
c5776c1b86 Document the layout of the address space, borrowing heavily from
http://lists.freebsd.org/pipermail/freebsd-amd64/2005-July/005578.html
2008-06-30 03:14:39 +00:00
alc
2ea095c0e8 Compute NKPDPE from NKPT. This reduces the number of knobs that must be
turned in order to change the size of the kernel virtual address space.
2008-06-30 02:35:55 +00:00
alc
828df4dc3d Strictly speaking, the definition of VM_MAX_KERNEL_ADDRESS is wrong. However,
in practice, the error (currently) makes no difference because the computation
performed by KVADDR() hides the error.  This revision fixes the error.

Also, eliminate a (now) unused definition.
2008-06-29 19:13:27 +00:00
alc
4a9078c9b0 Increase the size of the kernel virtual address space to 6GB. Until the
maximum size of the kmem map can be greater than 4GB, there is little point
in making the kernel virtual address space larger than 6GB.

Tested by:	kris@
2008-06-29 18:35:00 +00:00
ed
4d6a9685e8 Remove the unused major/minor numbers from iodev and memdev.
Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by:	philip (mentor)
2008-06-25 07:45:31 +00:00
jkim
4cc7195a06 Emit opcodes closer to GNU as(1) generated codes and micro-optimize. 2008-06-24 20:12:12 +00:00
jkim
6aed2b5388 Rehash and clean up BPF JIT compiler macros to match AT&T notations. 2008-06-23 23:09:52 +00:00
alc
13b5266a78 Ensure that KERNBASE is no less than the virtual address -2GB. 2008-06-23 15:22:53 +00:00
alc
2d03a1918b Prepare for a larger kernel virtual address space. Specifically, once
KERNBASE and VM_MIN_KERNEL_ADDRESS are no longer the same, the physical
memory allocated during bootstrap will be offset from the low-end of the
kernel's page table.
2008-06-21 19:19:09 +00:00
alc
8c14570d5e Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture.  The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space.  Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space.  Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead.  Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
That said, kris@ has tested crash dumps under the full patch that
increases the kernel virtual address space on amd64 to 6GB.

Tested by: kris@
2008-06-20 20:59:31 +00:00
delphij
ee624c02de Add et(4), a port of DragonFly's Agere ET1310 10/100/Gigabit
Ethernet device driver, written by sephe@

Obtained from:	DragonFly
Sponsored by:	iXsystems
MFC after:	2 weeks
2008-06-20 19:28:33 +00:00
alc
2fc7871f11 Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture.  The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space.  Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space.  Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead.  Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
2008-06-20 05:22:09 +00:00
alc
25d85147e9 Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M.  This sometimes prevents unnecessary removal of write access
from a PTE.  Overall, the net result is fewer demotions and promotion
failures.
2008-06-13 19:33:56 +00:00
alc
8ef896c2f8 Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page.  The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that hasn't
been modified (PG_M).  In general, if there are two or more such PTEs to
choose among, it is better to write protect the one nearer the high end of
the page table page rather than the low end.  This is because most programs
access memory in an ascending direction.  The net result of this change is a
sometimes significant reduction in the number of failed promotion attempts
and the number of pages that are write protected by pmap_promote_pde().
2008-06-12 05:18:09 +00:00
alc
061cc6f3ee Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space.  Specifically,
pmap_promote_pde() is only called when the page table page (PTP) that
is referenced by the given PDE has a full "use count", i.e., its
wire_count is 512.  Although this guarantees for a user address space
that all 512 PTEs in the PTP hold valid mappings, the same is not true
of the kernel's address space.  A kernel PTP always has a use count of
512 regardless of the state of the PTEs.  Therefore,
pmap_promote_pde() should not assume (or assert) that the first PTE in
the PTP is valid.
2008-06-01 07:36:59 +00:00
yongari
2643dd65cd Add jme(4) to the list of drivers supported by GENERIC kernel. 2008-05-27 02:22:32 +00:00
bz
6bba9b4244 Remove ISDN4BSD (I4B) from HEAD as it is not MPSAFE and
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.

This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation

Discussed with: rwatson, re
2008-05-26 10:40:09 +00:00
jb
3c417aadf3 Add the DTrace hooks for exception handling (Function boundary trace
-fbt- provider), cyclic clock and syscalls.
2008-05-24 06:32:26 +00:00
alc
964def13e2 The VM system no longer uses setPQL2(). Remove it and its helpers. 2008-05-23 04:03:54 +00:00
yongari
11daf29917 Add age(4) to the list of drivers supported by GENERIC kernel. 2008-05-19 02:30:27 +00:00
alc
a8f81206ad Retire pmap_addr_hint(). It is no longer used. 2008-05-18 04:16:57 +00:00
remko
91e9f2c6be Resort the if_ti driver to match the PCI Network cards instead of placing
it under the mii devices list.

PR:		kern/123147
Submitted by:	gavin
Approved by:	imp (mentor, implicit)
MFC after:	3 days
2008-05-17 23:50:00 +00:00
attilio
bc6974a05b Removed unused assembly offsets for structures digging. 2008-05-16 13:23:47 +00:00
rdivacky
faae559cb1 Regen.
Approved by:	kib (mentor)
2008-05-13 20:02:26 +00:00
rdivacky
13cbd9c97e Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
userspace thing which we cannot alter. Two syscalls maintain
pointer to userspace list and when process exits a routine
walks this list waking up processes sleeping on futexes
from that list.

Reviewed by:	kib (mentor)
MFC after:	1 month
2008-05-13 20:01:27 +00:00
alc
9fed63a445 Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.
2008-05-11 20:33:47 +00:00
alc
9e8bccea75 Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.
2008-05-09 16:48:07 +00:00
sam
39c0719a2e enable IEEE80211_DEBUG and IEEE80211_AMPDU_AGE by default 2008-05-03 17:05:38 +00:00
sam
8bf6f34fe9 Intel 4965 wireless driver (derived from openbsd driver of the same name) 2008-04-29 21:36:17 +00:00
alc
fc2ca88c2c Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.
2008-04-25 16:00:39 +00:00
jeff
14b586bf96 - Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
 - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
   suspended in cpu specific states.  This function can fail and cause the
   scheduler to fall back to another mechanism (ipi).
 - Implement support for mwait in cpu_idle() on i386/amd64 machines that
   support it.  mwait is a higher performance way to synchronize cpus
   as compared to hlt & ipis.
 - Allow selecting the idle routine by name via sysctl machdep.idle.  This
   replaces machdep.cpu_idle_hlt.  Only idle routines supported by the
   current machine are permitted.

Sponsored by:	Nokia
2008-04-25 05:18:50 +00:00
dfr
0536363c85 MFC: kernel-mode NFS lock manager. 2008-04-24 10:46:25 +00:00
rdivacky
dd1e82ea4d Implement linux_truncate64() syscall.
Tested by:	Aline de Freitas <aline@riseup.net>
Approved by:	kib (mentor)
2008-04-23 15:56:33 +00:00
phk
8d647da1ed Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
sam
3569e353ca Multi-bss (aka vap) support for 802.11 devices.
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral).  Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.

Supported by:	Hobnob and Marvell
Reviewed by:	many
Obtained from:	Atheros (some bits)
2008-04-20 20:35:46 +00:00
sam
682b4ae9be move awi to the Attic; it will not make the jump to the new world order
Reviewed by:	imp
2008-04-20 19:20:39 +00:00
peter
a2a93ed9df Put in a real isa_irq_pending() stub in order to remove two lines of dmesg
noise from sio per unit.  sio likes to probe if interrupts are configured
correctly by looking at the pending bits of the atpic in order to put a
non-fatal warning on the console.  I think I'd rather read the pending
bits from the apics, but I'm not sure its worth the hassle.
2008-04-19 07:25:57 +00:00
jeff
b82673ba40 - Add inlines for the monitor and mwait instructions.
Sponsored by:	Nokia
2008-04-18 05:47:56 +00:00
jkim
e0c673b7e2 Regenerate. 2008-04-16 19:27:36 +00:00
jkim
513781a1c1 Add stubs for syscalls introduced in Linux 2.6.17 kernel.
Some GNU libc version started using them before 2.6.17 was officially out.

MFC after:	3 days
2008-04-16 19:25:39 +00:00
imp
1bfc42b341 This file is unused on amd64. 2008-04-15 02:10:14 +00:00
phk
bd75233fb4 Convert amd64 and i386 to share the atrtc device driver. 2008-04-14 08:00:00 +00:00
rpaulo
07a3d6df55 Connect k8temp(4) to the build. 2008-04-12 14:20:22 +00:00
jeff
8efb03d60e - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
alc
062b34d1be Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE.  Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page.  This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().
2008-04-10 16:04:50 +00:00
kib
133f8f7798 Regenerate 2008-04-08 09:51:19 +00:00
kib
eb77b477b4 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
alc
f2ea5d4883 Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings. 2008-04-07 07:38:02 +00:00
jhb
79918c45a6 Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
  'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
  Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
  scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
  Also, don't bother unmasking the interrupt in the post_filter case if
  we never masked it.  The INTR_FILTER case had been doing this by having
  arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
  They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
  both the 'post_filter' and 'post_ithread' hooks to match what the
  non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
  'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
  designed to do even in 5.x).

Glanced at by:	piso
Reviewed by:	marius
Requested by:	marius [1], [5]
Tested on:	amd64, i386, arm, sparc64
2008-04-05 19:58:30 +00:00
alc
b1ed68eb9c Eliminate an unnecessary test and its misleading comment from pmap_enter(). 2008-04-04 18:00:22 +00:00
alc
18c266c7a7 Optimize pmap_pml4e() and pmap_pdpe() based upon two observations: The
given pmap is never NULL, and therefore pmap_pml4e() can never return
NULL.  The pervasive use of these inline functions throughout the pmap
makes these simple changes worthwhile.
2008-04-02 04:39:47 +00:00
ps
41d5b26ff8 Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by:	alc, ups
2008-03-28 04:29:27 +00:00
kib
6d49f1490b MFC
rev. 1.682 of sys/amd64/amd64/machdep.c
rev. 1.16  of sys/amd64/ia32/ia32_signal.c
rev. 1.33  of sys/amd64/linux32/linux32_sysvec.c
rev. 1.666 of sys/i386/i386/machdep.c
rev. 1.152 of sys/i386/linux/linux_sysvec.c
rev. 1.39  of sys/i386/svr4/svr4_machdep.c
rev. 1.402 of sys/pc98/pc98/machdep.c

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.
2008-03-27 13:53:52 +00:00
dfr
dc98ee4196 Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
2008-03-27 11:54:20 +00:00
jb
34e730ca27 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
phk
c763b22a79 Back in the good old days, PC's had random pieces of rock for
frequency generation and what frequency the generated was anyones
guess.

In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.

The other relevant property of those machines, is that they
typically had no more than 16MB RAM.

These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means if PLL's.

Considering that it takes on average 1.5 second to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.

If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.
2008-03-26 22:12:00 +00:00
phk
168398fe50 Eliminate unnecessary #includes 2008-03-26 20:26:12 +00:00
phk
fa71439e44 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
phk
632e5d39f7 Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.
2008-03-26 15:03:24 +00:00
phk
44bfb30efd The RTC related pscnt and psdiv variables have no business being public. 2008-03-26 13:25:27 +00:00
jkim
3e99f5d364 Belatedly add BPF_JITTER in NOTES for supported architectures. 2008-03-24 22:23:22 +00:00
peter
112e790f78 First pass at (possibly futile) microoptimizing of cpu_switch. Results
are mixed.  Some pure context switch microbenchmarks show up to 29%
improvement.  Pipe based context switch microbenchmarks show up to 7%
improvement.  Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.

Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
  non-threaded userland apps.  These typically cost 120 clock cycles each
  on an AMD cpu (less on Barcelona/Phenom).  Intel cores are probably no
  faster on this.
- The above change only helps unthreaded userland apps that tend to use
  the same value for gsbase.  Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
  prefetching a better chance of working.  Operations are now in increasing
  memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths.  Hopefully
  allowing better code density in cache lines.  This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
  realistic static branch prediction hint.  Both Intel and AMD cpus
  default to predicting branches to lower memory addresses as being
  taken, and to higher memory addresses as not being taken.  This is
  overridden by the limited dynamic branch prediction subsystem.  A trip
  through userland might overflow this.
- Futule attempt at spreading the use of the results of previous operations
  in new operations.  Hopefully this will allow the cpus to execute in
  parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
  curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)

Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.

While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.

A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)

This is still work-in-progress.
2008-03-23 23:09:06 +00:00
alc
f9d9755304 Correct an error in pmap_mincore() when applied to a 2MB page mapping:
Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the
2MB physical page from the PDE.
2008-03-23 23:04:09 +00:00
peter
b238ee1007 Export TDP_KTHREAD to asm files. 2008-03-23 22:46:37 +00:00
peter
1f7e9770bb Move pcb_flags to make trivially better use of cache lines. 2008-03-23 22:45:51 +00:00
peter
075e9da352 Protect the setting of the fsbase/gsbase MSR registers and the
pcb_[fg]sbase values with a critical section, like the rest of the kernel.
2008-03-23 22:44:56 +00:00
alc
e702727e2c To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set.  However, this assumption does
not hold on recent processors from Intel.  For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE.  Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault.  In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great.  Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.  However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.

(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)

Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.

Tested by: Peter Holm
2008-03-23 20:38:01 +00:00
kib
53a15ee1ea Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by:	kris (i386/pmap.c:pmap_remove() part)
Reviewed by:	alc
MFC after:	1 week
2008-03-23 07:07:27 +00:00
jhb
303513927d MFC: Add constants for the different memory types in the SMAP table and
use the SMAP types and constants from <machine/pc/bios.h> in the boot code.
2008-03-21 15:34:39 +00:00
jhb
fbea3b6403 Explicitly use spinlock_enter/exit rather than locking the icu_lock spin
lock in the 8259A drivers as these drivers are only used on UP systems.
This slightly reduces the penalty of an SMP kernel (such as GENERIC) on
a UP x86 machine.
2008-03-20 21:53:27 +00:00
jhb
6cf6d7b22b Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ
resource to a CPU.  The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr.  A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU.  The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr().  Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.

Tested by:	gallatin
2008-03-20 21:24:32 +00:00
jhb
b1769b65e6 MFC: Use a runtime mask for the PhysBase and PhysMask fields in variable
sized MTRR registers.
2008-03-19 16:37:24 +00:00
jhb
79661bdbd0 MFC: Minimize diffs with i686_mem.c. 2008-03-19 16:31:32 +00:00
jhb
aca442d36a MFC: Add constants for the various fields in MTRR registers and apply
style(9).  No functional changes.
2008-03-19 16:29:07 +00:00
jhb
c04bb048f6 Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
  and collapse down to one intr_event_create() routine.  The disable and
  eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
  routines since to trim one extra indirection.

Compiled on:	{arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on:	{amd64,i386} x {FILTER, !FILTER}
2008-03-17 22:42:01 +00:00
jhb
9c82fd4f0f MFC: Remove the 'needbounce' variable from _bus_dmamap_load_buffer(). 2008-03-17 17:33:32 +00:00
pjd
ea49d310bf Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
rdivacky
64c7931e65 Regen. 2008-03-16 16:29:37 +00:00
rdivacky
b13a84dcb7 Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by:	jeff
Approved by:	kib (mentor)
2008-03-16 16:27:44 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
jhb
9c113163fb Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
jhb
8293fe75e4 Fix a silly bogon which prevented all the CPUs that are tagged as interrupt
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs.  In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would workaround the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers.  On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.

MFC after:	1 week
Pointy hat to:	jhb
2008-03-14 03:44:42 +00:00
jhb
64ab71ccbd Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines.  The existing code already handles
having two platforms: ACPI and legacy.  However, the existing approach was
rather hardcoded and difficult to extend.  These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device).  This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
  legacy platform.  It's probe routine now returns BUS_PROBE_GENERIC so it
  can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
  resource managers so that subclassed nexus(4) drivers can invoke it from
  their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
  attach routine.
- The ACPI driver no longer contains an new-bus identify method.  Instead
  it exposes a public function (acpi_identify()) which is a probe routine
  that the MD nexus(4) drivers can use to probe for ACPI.  All of the
  probe logic in acpi_probe() is now moved into acpi_identify() and
  acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
  acpi_identify() and claims the nexus0 device if the probe succeeds.  It
  then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
  This matches the previous behavior where the old acpi_identify() would
  fail to add an acpi0 device again leaving you with no devices.

Discussed with:	imp
Silence on:	arch@
2008-03-13 20:39:04 +00:00
kib
be9c86776f Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by:	Aurelien Jarno <aurelien aurel32 net>
PR:	121422
Reviewed by:	jhb
MFC after:	2 weeks
2008-03-13 10:54:38 +00:00
jhb
4aef4283ee The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU.  The old code assumed 36 bits on i386 and 40 bits on
amd64.  In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.).  The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available.  If that
feature is not available, then assume 36 PA bits.

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after:	1 week
PR:		i386/120516
Reported by:	Nokia
2008-03-12 22:09:19 +00:00
jhb
0794c13d8e Minimize diffs with i686_mem.c:
- A few whitespace changes I missed in the style(9) changes.
- Move M_MEMDESC to mem.c.
2008-03-12 21:43:50 +00:00
jeff
acb93d599c Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
jhb
384b4a0305 Style(9) these files. No changes in the compiled code. (Verified by
diff'ing objdump -d output).
2008-03-11 21:41:36 +00:00
jhb
6ab428b54b Add constants for the various fields in MTRR registers.
MFC after:	1 week
Verified by:	md5(1)
2008-03-11 20:10:37 +00:00
jhb
18300325cb Probe CPUs after the PCI hierarchy on i386, amd64, and ia64. This allows
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
  CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
  check its device ID rather than having a bogus PCI attachment that only
  checked for the ID in probe and always failed.  As a side effect, you
  can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
  for ichss0) when doing PCI config space operations to enable SpeedStep.

MFC after:	2 weeks
Reviewed by:	njl, Andriy Gapon  avg of icyb.net.ua
2008-03-10 22:18:07 +00:00
jeff
d952a04ecf - Rather than repeating the same preemption code everywhere call the scheduler
specific sched_preempt() routine.
2008-03-10 01:32:48 +00:00
rink
699140d247 Import uslcom(4) from OpenBSD - this is a driver for Silicon Laboratories
CP2101/CP2102 based USB serial adapters.

Reviewed by:		imp, emaste
Obtained from:		OpenBSD
MFC after:		2 weeks
2008-03-05 14:13:30 +00:00
alc
d6dc62ac2d Add support for automatic promotion of 4KB page mappings to 2MB page
mappings.  Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value.  By default, automatic
promotion is disabled.  (Expect this to change.)

Reviewed by:	ups
Tested by:	kris, Peter Holm
2008-03-04 18:50:15 +00:00
jeff
ad2a31513f - Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a seperate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
ru
82432917a3 Eliminate whitespace diffs to the i386 version. 2008-02-19 06:30:49 +00:00
scottl
7163c9c1fc Teach the dump and minidump code to respect the maxioszie attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.
2008-02-15 06:26:25 +00:00
jkim
e5908a9796 MFC: sys/amd64/linux32/linux32_machdep.c 1.46
sys/i386/linux/linux_machdep.c		1.80

Fix Linux mmap with MAP_GROWSDOWN flag.
2008-02-14 18:41:00 +00:00
jhb
39865d42d5 Add missing #include for cpu_spinwait().
Pointy hat:	jhb
2008-02-13 15:46:10 +00:00
jhb
c6cf595ecb MFC: Properly handle ACPI table headers that cross a page boundary when
looking for the MADT during early boot.
2008-02-12 19:20:10 +00:00
jhb
9d35afcf34 MFC: Use cpu_spinwait() in DELAY(). 2008-02-12 19:14:01 +00:00
scottl
db8258708b If busdma is being used to realign dynamic buffers and the alignment is set to
PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't
reserve any pages.  Adjust to be correct.  Review of other architectures is
forthcoming.

Submitted by: Joseph Golio
2008-02-12 16:24:30 +00:00
jkim
3bffed0bec Fix Linux mmap with MAP_GROWSDOWN flag.
Reported by:	Andriy Gapon (avg at icyb dot net dot ua)
Tested by:	Andriy Gapon (avg at icyb dot net dot ua)
Pointyhat:	me
MFC after:	3 days
2008-02-11 19:35:03 +00:00
mav
b53943d8df MFC GET_STACK_USAGE() macro to be used for netgraph stack protection. 2008-02-06 21:02:56 +00:00
scottl
249efc9b24 Remove the rr232x driver. It has been superceded by the hptrr driver. 2008-02-03 07:07:30 +00:00
das
e0a189d4ba Add a few more CPUID feature bits while here. We don't support these
features yet.
2008-02-02 23:17:27 +00:00
das
eca0289cd9 SSE4 CPUID bits 2008-02-02 22:40:17 +00:00
jhb
9c76956524 For no good reason I had assumed that ACPI table headers would be page
aligned (or at least not cross a page boundary).  However, it turns out
that on at least one machine one table header does cross a page boundary.
This caused problems with the MADT early probe as it uses the crash dump
map to load ACPI tables by loading the RSDT/XSDT into pages 1 ... N and
loading the header of each ACPI table header into page 0 looking for the
MADT.  However, if a table header crossed a page boundary, then page 1
would get trashed resulting in a panic.  Fix this by reserving the first
2 pages for ACPI table headers (headers are less than a page in size,
so 2 pages will be sufficient) and use pages 2 .. N for the RSDT and XSDT.

Note: amd64 should probably be simplified to just use pmap_mapbios()
for all these tables which will use the direct map and not need the
crash dump hack.

MFC after:	5 days
Tested on:	i386
Reported by:	Pete French  petefrench of ticketswitch.com
2008-01-31 16:51:43 +00:00
mav
739abe292f Move GET_STACK_USAGE from MI header to i386/amd64 MD ones.
Somebody who can, please feel free to implement it for other archs
or copy this one if it suits.
2008-01-31 08:24:27 +00:00
ru
910410640b Add a wrapper function that bound checks writes to the dump device. 2008-01-28 19:04:07 +00:00
jhb
d1ed1ebf3b MFC: Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index.  Originally I had this as a spin lock
so interrupt code could use it to lookup sources.  However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.

This fixes a LOR in the most recent MSI MFC and was a part of the original
commit to HEAD that included the changes in the most recent MSI MFC.
2008-01-19 15:38:13 +00:00
jhb
a114208c34 Use cpu_spinwait() (i.e., "pause") when spinning on rdtsc during DELAY().
MFC after:	1 week
2008-01-17 18:59:38 +00:00
alc
7f5a9c7a36 Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s.  (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)

Eliminate dead code from pmap_enter().  This code implemented an assertion.
On i386, an equivalent check is already implemented.  However, on amd64,
a small change is required to implement an equivalent check.

Eliminate \n from a nearby panic string.

Use KASSERT() to reimplement pmap_copy()'s two assertions.
2008-01-17 18:25:52 +00:00
bde
691f99e98f Translate from the i386. All FP constants and operations are evaluated
in the range and precision of their type(s) on amd64, but FLT_EVAL_METHOD
said that they were evalated in the "interesting" (buggy) i387 methods.
float_t was broken compatibly with FLT_EVAL_METHOD.

These definitions seem to be broken on powerpc and possibly on arm.
float_t is float on powerpc with gcc [-notraditional] according to
glibc, and FLT_EVAL_METHOD is marked with XXX on arm.
2008-01-17 13:12:46 +00:00
alc
021a700f2a Make pmap_is_prefaultable() more TLB friendly. Specifically, make it use
the kernel's direct map instead of the pmap's recursive mapping to access
the lowest level in the page table.  The direct map is preferable for two
reasons: (1) The TLB is more likely to hold the required direct mapping
because pmap_enter() has already used the direct map to access a nearby
PTE and (2) loading a direct mapping into the TLB involves walking only 2
or 3 levels of the page table instead of 4.
2008-01-14 21:25:06 +00:00
bde
369b6ad6b5 Fix fpset*() to not trap if there is a currently unmasked exception.
Unmasked exceptions (which can be fixed up using fpset*() before they
trap) are very rare, especially on amd64 since SSE exceptions trap
synchronously, but I want to merge the faster amd64 implementations of
fpset*() back to i386 without introducing the bug on i386.

The i386 implementation has always avoided the trap automatically by
changing things using load/store of the FP environment, but this is
very slow.  Most changes only affect the control word, so they can
usually be done much more efficiently, and amd64 has always done this,
but loading the control word can trap.

This version use the fast method only in the usual case where it will
not trap.  This only costs a couple of integer instructions (including
one branch which I haven't optimized carefully yet) in the usual case,
but bloats the inlines a lot.  The inlines were already a bit too large
to handle both the FPU and SSE.
2008-01-11 17:11:32 +00:00
bde
77a62a7576 Fix some style bugs:
- fix a previous style fix: shifts should be in the correct direction even
  if they are null.
- restore a comment about namespace pollution from floatingpoint.h 1.12 and
  update it.
- remove unused namespace pollution FP_*REG.
- improve some comments.
- sort macro definitions for entry points.
- don't use underscores for macro args.
2008-01-11 14:11:46 +00:00
bde
2d90a44ac6 Simplify the ifdefs:
- fix this to compile with C++ by casting ints to enums in a few places
  and by using the correct parameter type for _fpsetprec().  Remove
  __cplusplus ifdefs which disabled the buggy code.
- remove __CC_SUPPORTS___INLINE ifdefs.  `__inline' vs `inline', and either
  of these #defined away, are supposed to be handled by very old ifdefs
  in <sys/cdefs.h>.  Thus the __CC_SUPPORTS___INLINE macro is not needed
  here (or anywhere else that it used).  It is less needed here than in
  most places, since this file is userland-only and userland is far from
  supporting INTEL_COMPILER.  The __CC_SUPPORTS___INLINE__ macro which
  was used here is even less needed.  It is to support spelling `inline'
  as `__inline__' instead of the usual spelling `__inline'.

Fix some style bugs that I missed in the previous commit (remove unused
asms and sort more variables).
2008-01-09 15:03:03 +00:00
bde
35b2cbed50 Fix some style bugs (mainly, use explicit shifts when accessing bit-fields
even if the shift count happens to be 0, sort declarations, and spell
__inline normally).
2008-01-09 13:35:31 +00:00
bde
b52cfc10cb Improve some comments. 2008-01-09 10:42:47 +00:00