Commit Graph

10985 Commits

Author SHA1 Message Date
John Birrell
1f80cd9398 Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing because they are called from the DTtrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.
2006-11-04 04:58:10 +00:00
John Birrell
3d068827c2 Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by:	scottl, jhb
MFC after:	2 weeks
2006-11-01 04:54:51 +00:00
Takanori Watanabe
0967107190 Fix Typo.
Pointed out by: ru
2006-10-31 07:22:24 +00:00
Takanori Watanabe
1cc5605910 Add conf file entries for acpi_aiboost drivers. 2006-10-30 05:51:54 +00:00
Alexander Leidinger
96ed72ac81 regen after linux_io_* backout 2006-10-29 14:12:44 +00:00
Alexander Leidinger
3680a41902 Backout the linux aio stuff. Several problems where identified and the
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.

All this will be done in P4.

Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff will reenter the tree.

Requested by:	rwatson
Discussed with:	rwatson
2006-10-29 14:02:39 +00:00
Bruce Evans
6a70163fcc Removed some SMP ifdefs so that using the TSC as a cputime clock is
not completely decided at config time.  Just don't default to using
the TSC if there are multiple active CPUs.  Also, don't default to
using the TSC if it is broken.  SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.

This only helps much for SMP kernels running on 1 CPU.  The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range.  Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.

Use the same condition for declaring perfmon stuff as for using it.
2006-10-29 09:48:44 +00:00
Alexander Leidinger
c1ea90bfd3 regen (prctl addition) 2006-10-28 11:24:38 +00:00
Bruce Evans
43f0ea0a27 i386/include/profile.h:
Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in
rev.1.36.  Apparently, this case has never been reached even by lint.

Submitted by:	stefanf

{amd64,i386}/include/profile.h:
In case the above case is actually reached, break it properly by
providing null support that will fail at link time instead of a stub
that gives wrong (null) profiling at runtime.
2006-10-28 11:03:03 +00:00
Alexander Leidinger
955d762aca MFP4:
Implement prctl().

Submitted by:	rdivacky
Tested with:	LTP
2006-10-28 10:59:59 +00:00
Bruce Evans
853b92dacf In MCOUNT_OVERHEAD(label), actually use the `label' parameter. We were
still using the global label named "profil", and this worked accidentally
because all callers use the same name.
2006-10-28 07:59:11 +00:00
Bruce Evans
3a110062fd Cleaned up includes. <machine/profile.h> was unused. <machine/timerreg.h>
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.

Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.
2006-10-28 06:38:51 +00:00
Bruce Evans
94450a83e8 Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities.  It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME().  CNAME() is easy enough to use in pure
  asm code, but using it in inline asm requires messy quoting.  The
  core pure asm code has been hacked on more and all uses of CNAME() in
  it have already gone away.  Just assume the elf convention here too.
- Removed now-uneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.
2006-10-28 06:04:29 +00:00
Bruce Evans
447647908c Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.
2006-10-27 14:17:50 +00:00
John Birrell
3750d1ecad Remove the KSE option now that it's in DEFAULTS on these arches/machines.
The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.

Submitted by:	scottl@
2006-10-26 22:11:35 +00:00
John Birrell
013d6d8cb4 Add 'options KSE' to the kernel config DEFAULTS on all arches/machines
except sun4v.

This change makes the transition from a default to an option more
transparent and is an attempt to head off all the compliants that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.

Submitted by:	scottl@
2006-10-26 22:05:25 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
Ruslan Ermilov
837f167eb2 Move "device splash" back to MI NOTES and "files", it's MI. 2006-10-23 13:23:14 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Alan Cox
43200cd3ed Eliminate unnecessary PG_BUSY tests. 2006-10-22 04:18:01 +00:00
Alexander Leidinger
0f0549587b Fix a recent regression regarding valid signals.
Submitted by:	rdivacky
2006-10-20 10:09:40 +00:00
Dag-Erling Smørgrav
c43ac89acc Move more MD devices and options out of MI NOTES. 2006-10-20 09:52:27 +00:00
Bruce Evans
045f738b58 Don't show debug registers in "show registers". Special registers should
be displayed specially, and debug registers are among of the least
interesting special registers (far behind %cr3).  The debug registers
are still accessible as variables and displayed in another bogus place
("show watches").
2006-10-20 09:44:21 +00:00
Dag-Erling Smørgrav
c276283866 The VGA_DEBUG option only exists on {amd64,i386,ia64}.
Also remove 'device io' from amd64 NOTES; DEFAULTS takes care of it.
2006-10-20 08:56:26 +00:00
Ruslan Ermilov
034f5f8e72 Add missing acpi_wakecode.o: assym.s dependency, so that if assym.s
is newer than acpi_wakecode.h, the latter is rebuilt.

Reported by:	bde
2006-10-19 05:55:09 +00:00
Warner Losh
e54ad0a189 Remove references to pccard.conf 2006-10-19 05:17:55 +00:00
David Xu
5f641fc0fb o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
  umtx and wait channel.
o Rename casuptr to casuword.
2006-10-17 02:24:47 +00:00
Alexander Leidinger
95f2da66d3 regen (linux AIO stuff) 2006-10-15 14:24:10 +00:00
Alexander Leidinger
6a1162d4cd MFP4 (with some minor changes):
Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be a ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
   "context"), but FreeBSD creates only one single AIO queue per process.
   My code maintains a request queue (STAILQ of queue(3)) per "context",
   and throws all AIO requests of all contexts owned by a process into
   the single FreeBSD per-process AIO queue.

   When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
   io_cancel(2), my code can pick out requests owned by the specified context
   from the single FreeBSD per-process AIO queue according to the per-context
   request queues maintained by my code.

2. The request queue maintained by my code stores contrast information between
   Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
   (struct aiocb). FreeBSD IO control block actually exists in userland memory
   space, required by FreeBSD native aio_XXXXXX(2).

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
   needs to use Linux-specific "struct aio_ring", which is a partial mirror
   of context in user space. I would rather take the address of context in
   kernel as the context ID, but the io_getevents() of libaio forces me to
   take the address of the "ring" in user space as the context ID.

   To my surprise, one comment line in the file "io_getevents.c" of
   libaio-0.3.105 reads:

             Ben will hate me for this

REFERENCE:

1. Linux kernel source code:   http://www.kernel.org/pub/linux/kernel/v2.6/
   (include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages:         http://www.kernel.org/pub/linux/docs/manpages/
   (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort:   http://lse.sourceforge.net/io/aio.html
   The design notes:           http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
       http://rpmfind.net/linux/rpm2html/search.php?query=libaio
   Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle:              http://oss.oracle.com/projects/libaio-oracle/
   POSIX AIO implementation based on Linux AIO system calls (depending on
   libaio).
---snip---

Submitted by:	Li, Xiao <intron@intron.ac>
2006-10-15 14:22:14 +00:00
Alexander Leidinger
0a62e03542 MFP4 (106538 + 106541):
Implement CLONE_VFORK. This fixes the clone05 LTP test.

Submitted by:	rdivacky
2006-10-15 13:39:40 +00:00
Alexander Leidinger
2482245b0c Revert my previous commit, I mismerged this to the wrong place.
Pointy hat to:	netchild
2006-10-15 13:30:45 +00:00
Alexander Leidinger
21aed094a9 MFP4 (106541): Fix the clone05 test in the LTP.
Submitted by:	rdivacky
2006-10-15 13:25:23 +00:00
Alexander Leidinger
4b3583a354 MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64.
Submitted by:	rdivacky	[1]
2006-10-15 13:22:14 +00:00
Alexander Leidinger
687c23be1d MFP4 (107868 - 107870):
Use a macro to test for a valid signal instead of doing it my hand everywhere.

Submitted by:	rdivacky
2006-10-15 12:51:43 +00:00
John Baldwin
520ffff83e Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources.  This allows interrupt controllers
with no interrupt pics (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
  included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
  on resume.  This gets suspend/resume working with APIC on UP systems.
  SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by:	anholt (i386)
Submitted by:	Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after:	1 week
2006-10-10 23:23:12 +00:00
John Baldwin
6e20fe33ba Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT.
PR:		kern/99870
Submitted by:	jkim
MFC after:	3 days
2006-10-10 19:26:35 +00:00
Simon L. B. Nielsen
4517aab293 - Remove SCHED_ULE from GENERIC to better avoid foot-shooting by
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
  stronger.

Suggested and reviewed by:	dougb
Discussed on:			developers
MFC after:			3 days
2006-10-05 20:31:58 +00:00
John Birrell
6825d60738 PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.
2006-10-04 21:37:10 +00:00
Poul-Henning Kamp
e4c9547050 Use calendaric calculation support from subr_clock.c instead of home-rolled.
Eventually, this RTC should probably use subr_rtc.c as well
2006-10-02 16:18:40 +00:00
Poul-Henning Kamp
b69f71eb29 Second part of a little cleanup in the calendar/timezone/RTC handling.
Split subr_clock.c in two parts (by repo-copy):
   subr_clock.c contains generic RTC and calendaric stuff. etc.
   subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c.  They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.
2006-10-02 15:42:02 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Poul-Henning Kamp
c29ba5fe6e Remove the no longer relevant or correct bootinfo sysctls. 2006-09-30 10:08:09 +00:00
Maxim Sobolev
2c473eaf67 Extend comment explaining why code is conditional at !defined(SCHED_ULE).
Suggested by:	ru
2006-09-27 22:09:35 +00:00
Maxim Sobolev
6e93c19e3d Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents
timer interrupt servicing for disabled HTT cores in ULE case. Should be
probably fixed in ULE code instead, but we have no real maintainer for
ULE to do it.

PR:		103697
2006-09-27 18:51:19 +00:00
Scott Long
31e2a87d4d The need to run a filter also implies that bouncing could be possible, so
just use the COULD_BOUNCE flag for both and retire the USE_FILTER flag.
This fixes the problem that rev 1.81 introduced with the if_bfe driver
(and possibly others).
2006-09-26 23:14:42 +00:00
Ruslan Ermilov
6c9fdda750 Added COMPAT_FREEBSD6 option. 2006-09-26 12:36:34 +00:00
Warner Losh
2dca50b6a1 Add a newline to the printf. 2006-09-24 19:24:26 +00:00
John Baldwin
d72a078647 Update the ipmi(4) driver:
- Split out the communication protocols into their own files and use
  a couple of function pointers in the softc that the commuication
  protocols setup in their own attach routine.
- Add support for the SSIF interface (talking to IPMI over SMBus).
- Add an ACPI attachment.
- Add a PCI attachment that attaches to devices with the IPMI interface
  subclass.
- Split the ISA attachment out into its own file: ipmi_isa.c.
- Change the code to probe the SMBIOS table for an IPMI entry to just use
  pmap_mapbios() to map the table in rather than trying to setup a fake
  resource on an isa device and then activating the resource to map in the
  table.
- Make bus attachments leaner by adding attach functions for each
  communication interface (ipmi_kcs_attach(), ipmi_smic_attach(), etc.)
  that setup per-interface data.
- Formalize the model used by the driver to handle requests by adding an
  explicit struct ipmi_request object that holds the state of a given
  request and reply for the entire lifetime of the request.  By bundling
  the request into an object, it is easier to add retry logic to the various
  communication backends (as well as eventually support BT mode which uses
  a slightly different message format than KCS, SMIC, and SSIF).
- Add a per-softc lock and remove D_NEEDGIANT as the driver is now MPSAFE.
- Add 32-bit compatibility ioctl shims so you can use a 32-bit ipmitool
  on FreeBSD/amd64.
- Add ipmi(4) to i386 and amd64 NOTES.

Submitted by:	ambrisko (large portions of 2 and 3)
Sponsored by:	IronPort Systems, Yahoo!
MFC after:	6 days
2006-09-22 22:11:29 +00:00
Robert Watson
827f0e85a6 Regenerate. 2006-09-21 16:20:38 +00:00
Robert Watson
e6f188152c Use AUE_CREAT instead of AUE_O_CREAT for linux_creat().
Obtained from:	TrustedBSD Project
2006-09-21 16:18:33 +00:00
Robert Watson
753a5e888c Regenerate. 2006-09-21 16:13:16 +00:00
Robert Watson
b5ca51459a Use AUE_GETDIRENTRIES instead of AUE_O_GETDENTS and AUE_NULL for a number
of directory reading system calls.

Respell a mis-spelled event name.

Clean up white space/line wraps in a couple of places.

Assign event numbers to some new system call entries that have turned
up in the list since audit support was added.

Obtained from:	TrustedBSD Project
2006-09-21 16:12:58 +00:00
Alexander Kabaev
d9cb97ff9d Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and  __builtin_va_start was present in all GCC version 3.1 and
later.
2006-09-21 01:37:02 +00:00
Alexander Leidinger
6dc4e81071 style(9)
While I'm here add a MFC reminder, I forgot it in the previous commit.

Noticed by:	ssouhlal
MFC after:	1 week
2006-09-20 19:27:11 +00:00
Alexander Leidinger
a312f6a30a Bring the i386 linux mmap code more into line with how linux (2.4.x)
behaves. This fixes a lot of test which failed before. For amd64 there
are still some problems, but without any testers which apply patches
and run some predefines tests we can't do more ATM.

Submitted by:	Marcin Cieslak <saper@SYSTEM.PL> (minor fixups by myself)
Tested with:	LTP
2006-09-20 17:24:20 +00:00
Wojciech A. Koszek
6a535c2e4a Fix 'interrupt interrupt' -> 'interrupt' in the comment.
Approved by:	cognet (mentor)
2006-09-20 12:23:33 +00:00
Scott Long
adab0fdc4f Remove duplicated code. Declare functions non-static that shouldn't be
inlined.
2006-09-13 09:35:59 +00:00
John-Mark Gurney
c14c65ed52 document that PAE kernels needs twice the value of non-PAE kernels
for KVA_PAGES, and that it it likely needed for >4GB memory boxes..

MFC after:	3 days
2006-09-13 01:23:08 +00:00
John Baldwin
884ff1813f Add a new ddb command 'show lapic' to dump details about the local APIC
registers for the current CPU.

MFC after:	3 days
2006-09-11 20:12:42 +00:00
John Baldwin
5c15c7e71d Actually hook up the IPI_INVLCACHE IDT vectors backing
pmap_invalidate_cache() in the SMP case so pmap_mapdev() in multiuser
doesn't panic with a trap 30.  I broke this many months ago when I
added pmap_invalidate_cache() as early parts of the PAT work.

Patience from:	jmg
Pointy hat:	jhb
2006-09-11 20:10:42 +00:00
John Baldwin
9914a8cc7d - Fix rman_manage_region() to be a lot more intelligent. It now checks
for overlaps, but more importantly, it collapses adjacent free regions.
  This is needed to cope with BIOSen that split up ports for system devices
  (like IPMI controllers) across multiple system resource entries.
- Now that rman_manage_region() is not so dumb, remove extra logic in the
  x86 nexus drivers to populate the IRQ rman that manually coalesced the
  regions.

MFC after:	1 week
2006-09-11 19:31:52 +00:00
Scott Long
88591e04af The run_filter() procedure is a means of working around DMA engine bugs in
old/broken hardware.  Unfortunately, it adds cache pressure and possible
mispredicted branches to the fast path of the bus_dmamap_load collection of
functions.  Since it's meant for slow path exception processing, de-inline
it and allow its conditions to be pre-computed at tag_create time and thus
short-circuited at runtime.

While here, cut down on the size of _bus_dmamap_load_buffer() by pushing the
bounce page logic into a non-inlined function.  Again, this helps with
cache pressure and mispredicted branches.

According to the TSC, this shaves off a few cycles on average.  Unfortunately,
the data varies quite a bit due to interrupts and preemption, so it's hard to
get a good measurement.  Real world measurements of network PPS are welcomed.
A merge to amd64 and other arches is pending more testing.
2006-09-11 06:48:53 +00:00
Alexander Leidinger
bb59e63f8f Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Suggested by:	jhb
2006-09-09 16:25:25 +00:00
Robert Watson
98bf5a707d Audit sysarch() operation argument.
MFC after:	3 days
2006-09-09 10:20:31 +00:00
John Baldwin
f6c48c1932 Use a single constant to define the sizes of the physmap[], phys_avail[],
and dump_avail[] arrays so they are in sync (previously it was possible
to store more entries in the physmap[] then we could store in phys_avail[],
which was pointless).  While I'm here, bump up the length of these tables
to hold 30 entries on amd64 and 16 on i386.  This allows machines with
fairly fragmented memory maps to boot ok (at least one machine would
not boot FreeBSD/i386 but would boot FreeBSD/amd64 because amd64 allowed
for more fragments).

MFC after:	3 days
2006-09-07 15:03:02 +00:00
Maxim Sobolev
e7d33dcbc5 Unbreak in the case when device apic is compiled into non-SMP kernel.
Reported by:	jhay
MFC after:	2 weeks
2006-09-06 22:05:34 +00:00
Ruslan Ermilov
4962065404 Refine previous revision to allow acpi_wakecode.h to be safely built
from both the acpi module build directory and a kernel build directory.
The latter didn't work when one attempted to build a kernel which had
"device acpi" with the "make kernel-toolchain buildkernel" command
because a cross-compiler couldn't find anything in the standard system
include path (it's empty in the kernel-toolchain case).

Fix this by passing a better root path to kernel headers (src/sys)
which works for both cases, kernel and module (-I@ only worked for
module).

Also, while here, pass -nostdinc (and a different spelling for icc) --
it's a feature that the kernel source tree is self-contained, and this
change enforces this.

Reported by:	glebius
2006-09-06 14:23:40 +00:00
Maxim Sobolev
23da540855 The FreeBSD by default "disables" hyper-threading cores, by not scheduling
any threads to them. However, it still counts those cores as "active but
permanently idle" when calculating system-wide CPUs statistics. It is
incorrect, since it skews statistics quite a bit and creates real problems
for certain types of applications (monitoring applications for example),
by making them believe that the system does have enough idle CPU resources,
while in fact it does not.

Correct the problem by not calling performance counting routines on "disabled"
cores. The cleaner solution would be to just disable APIC timer interrupts on
those cores completely, but ENOTIME here and it is not clear if the
additional complexity really worth minor performance gain.

Reviewed by:	ssouhlal
Sponsored by:	Sippy Software, Inc.
MFC after:	2 weeks
2006-09-05 17:15:24 +00:00
David Xu
66e1c26dba Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.
2006-08-28 02:28:15 +00:00
Alexander Leidinger
a6c5f81339 Fix video playing and network connections in realplayer (and most likely
other stuff) in the osrelease=2.6.16 case:
 - implement CLONE_PARENT semantic
 - fix TLS loading in clone CLONE_SETTLS
 - lock proc in the currently disabled part of CLONE_THREAD

I suggest to not unload the linux module after testing this, there are
some "<defunct>" processes hanging around after exiting (they aren't
with osrelease=2.4.2) and they may panic your kernel when unloading the
linux module. They are in state Z and some of them consume CPU according
to ps. But I don't trust the CPU part, the idle threads gets too much CPU
that this may be possible (accumulating idle, X and 2 defunct processes
results in 104.7%, this looks to much to be a rounding error).

Noticed by:	Intron <mag@intron.ac>
Submitted by:	rdivacky (in collaboration with Intron)
Tested by:	Intron, netchild
Reviewed by:	jhb (previous version)
2006-08-27 18:51:32 +00:00
Alexander Leidinger
084556f5d7 regen 2006-08-27 08:58:00 +00:00
Alexander Leidinger
835e506190 Add the linux statfs64 call. This allows Tivoli backup to proceed a little
but further on -current (still not successful, but a step into the right
direction).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Tested by:	Paul Mather <paul@gromit.dlib.vt.edu>
2006-08-27 08:56:54 +00:00
Alexander Leidinger
40f734dd0d Emulate what vfork does instead of using it in linux_vfork. This way
we can do the stuff we need to do with linux processes at fork and
don't panic the kernel at exit of the child.

Submitted by:	rdivacky
Tested with:	tst-vfork* (glibc regression tests)
Tested by:	netchild
2006-08-25 11:59:56 +00:00
Alexander Leidinger
29ddc19bbf Get rid of some nested includes.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb
2006-08-19 15:13:01 +00:00
Alexander Leidinger
94cb2ecf79 Move some stuff into headers where they belong.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-17 21:06:48 +00:00
Alexander Leidinger
0eef2f8a4e Style fixes to comments.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-16 18:54:51 +00:00
Pav Lucistnik
a08875fbb2 - Fix typo in #error pragma: compitable -> compatible
Submitted by:	neologism
2006-08-15 20:10:49 +00:00
John Baldwin
f8f1f7fb85 Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.
2006-08-15 17:37:01 +00:00
John Baldwin
df78f6d313 - Remove unused sysvec variables from various syscalls.conf.
- Send the systrace_args files for all the compat ABIs to /dev/null for
  now.  Right now makesyscalls.sh generates a file with a hardcoded
  function name, so it wouldn't work for any of the ABIs anyway.  Probably
  the function name should be configurable via a 'systracename' variable
  and the functions should be stored in a function pointer in the sysvec
  structure.
2006-08-15 17:25:55 +00:00
Warner Losh
72fa8a0268 No need for opt_global.h here 2006-08-15 15:48:58 +00:00
Alexander Leidinger
e9fe50af7e Remove the include of opt_global.h. It's included globally by a command
line switch. Other files which may make the same mistake (according to
fxr.watson.org) but aren't fixed in this commit (people with more clue
about those files should fix this):
 - i386/xbox/xbox.c
 - arm/arm/elf_trampoline.c
 - arm/arm/mem.c

Noticed by:	cognet
2006-08-15 15:27:13 +00:00
Alexander Leidinger
94930c140c Add include of opt_global.h, else the futex operations aren't locked on
SMP systems.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-15 13:45:39 +00:00
Alexander Leidinger
77b959aa51 add autogenerated systrace_args stuff for dtrace 2006-08-15 12:56:36 +00:00
Alexander Leidinger
9b44bfc556 Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
 - pid/tid mangling - complete
 - thread area - complete
 - futexes - complete with issues
 - clone() extension - complete with some possible minor issues
 - mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
   disabled when not build as part of the kernel with native FreeBSD mq*
   support (module support for this will come later)

Tested with:
 - linux-firefox - works, tested
 - linux-opera - works, tested
 - linux-realplay - doesnt work, issue with futexes
 - linux-skype - doesnt work, issue with futexes
 - linux-rt2-demo - works, tested
 - linux-acroread - doesnt work, unknown reason (coredump) and sometimes
   issue with futexes
 - various unix utilities in linux-base-gentoo3 and linux-base-fc4:
   everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
	sysctl compat.linux.osrelease=2.6.16
to switch back use
	sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by:			Google SoC 2006
Submitted by:			rdivacky
Some suggestions/help by:	jhb, kib, manu@NetBSD.org, netchild
2006-08-15 12:54:30 +00:00
Alexander Leidinger
c107650561 regen 2006-08-15 12:51:45 +00:00
Alexander Leidinger
b4359bd8e5 Add new syscalls in the linuxolator (only used when the sysctl
compat.linux.osrelease is changed to "2.6.16" or similar).

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-15 12:28:14 +00:00
Alexander Leidinger
fb5592c084 - Add some ASM stuff needed for futexes (linuxolator).
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
With help from:	kib
2006-08-15 12:14:36 +00:00
Alan Cox
c3b182d235 Eliminate an unnecessary initialization from trap_pfault() that also
happens to contain a style error.
2006-08-14 19:53:53 +00:00
John Baldwin
d25fdf539e Don't try to preserve PAT bits in pmap_enter(). We currently on pages that
aren't mapped via pmap_enter() (KVA).  We will eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by:	alc
2006-08-14 15:39:41 +00:00
John Baldwin
7e9f73f3ed First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR).  Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
  additional parameter (relative to pmap_mapdev()) specifying the cache
  mode for this mapping.  Note that on amd64 only WB mappings are done with
  the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
  mappings rather than WB.  Previously we relied on the BIOS setting up
  MTRR's to enforce memio regions being treated as UC.  This might make
  hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
  that used pmap_mapdev() to map non-device memory (such as ACPI tables)
  to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
  caching mode for a range of KVA.

Reviewed by:	alc
2006-08-11 19:22:57 +00:00
Alexander Leidinger
50e422f056 Add some more errno mappings (bsd -> linux) and a comment about the status..
Submitted by:	"Intron" <mag@intron.ac>
2006-08-10 22:05:25 +00:00
Warner Losh
ddebcb409b Eliminate one set of XBOX #ifdefs. The Xbox code just needs to set a
different TIMER_FREQ value than default.  Accomplish this via the
config file rather than via an #ifdef.
2006-08-09 23:47:38 +00:00
Warner Losh
f5190c1277 Minor style(9) nit. 2006-08-09 23:37:30 +00:00
Nate Lawson
ad3d78ead0 If a beep was enabled, turn it off 3 seconds after resume.
MFC after:	3 days
2006-08-08 01:30:54 +00:00
Alan Cox
14aaab5329 Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().
2006-08-06 06:29:16 +00:00
Michael Reifenberger
0b2aa43a65 Dont overwrite cpu_model in the case of Via's C3-CPU.
Noticed by:  Mike Tancsa
MFC after:	2 days
2006-08-04 13:49:16 +00:00
Yaroslav Tykhiy
776fc0e90e Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR:		misc/101245
Submitted by:	Darren Pilgrim <darren pilgrim bitfreak org>
Tested by:	md5(1)
MFC after:	1 week
2006-08-04 07:56:35 +00:00
Alan Cox
78985e424a Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
David E. O'Brien
a4755e0e13 Correct spelling of 3DNow!. 2006-08-01 01:23:39 +00:00
Marcel Moolenaar
302981e72a Remove sio(4) and related options from MI files to amd64, i386
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.
2006-07-29 18:38:54 +00:00
John Baldwin
cb76d9b05c Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.
2006-07-28 20:22:58 +00:00
John Baldwin
91ce2694d1 Regen for MPSAFE flag removal. 2006-07-28 19:08:37 +00:00
John Baldwin
af5bf12239 Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
John Baldwin
e0b4add8d8 Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.
2006-07-28 18:55:18 +00:00
John Baldwin
22ea1bc57a Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
  section.
- Add a new explicit check before userret() to make sure we don't return
  with any locks held.  The advantage here is that we can include the
  syscall number and name in syscall() whereas that info is not available
  in userret().
- Drop the mtx_assert()'s of sched_lock and Giant.  They are replaced by
  the more general checks just added.

MFC after:	2 weeks
2006-07-27 22:32:30 +00:00
John Baldwin
7abb84313a Argh, fix compile with XBOX enabled. Somehow I missed a LINT compile. :( 2006-07-27 22:19:02 +00:00
John Baldwin
57b16b0882 Don't allow MAXMEM or hw.physmem to extend the top of memory if our memory
map was obtained from the SMAP.  SMAP is trustworthy, and the memory
extending feature is a band-aid for older systems where FreeBSD's methods
of detecting memory were not always trustworthy.  This fixes the issue
where using hw.physmem could result in the ACPI tables getting trashed
breaking ACPI.

MFC after:	3 days
Tested on:	i386
2006-07-27 19:47:22 +00:00
Pyun YongHyeon
ae05dd24c0 Add stge(4) to the list of drivers supported by GENERIC kernel. 2006-07-25 01:06:32 +00:00
John Baldwin
3ce72960e8 Regen. 2006-07-21 20:41:33 +00:00
John Baldwin
b4c63329d5 - Pass the MPSAFE flag to namei() in linux_uselib() and handle conditional
Giant VFS locking in that function.
- Remove bogus code to handle the case where namei() returns success but a
  NULL vnode pointer.
- Note that this code duplicates exec_check_permissions() and annotate
  where it differs.
- Hold the vnode lock longer to protect the write to set VV_TEXT in
  v_vflag.
- Mark linux_uselib() MPSAFE.

Reviewed by:	rwatson
2006-07-21 20:22:13 +00:00
Alan Cox
3cad40e517 Add pmap_clear_write() to the interface between the virtual memory
system's machine-dependent and machine-independent layers.  Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write().  Why?  Both the use and implementation of
pmap_page_protect() in our virtual memory system has subtle errors,
specifically, the management of execute permission is broken on some
architectures.  The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified."  However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission.  Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission.  On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64.  This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.

Discussed with: grehan@ and marcel@
2006-07-20 17:48:41 +00:00
Alan Cox
7c9cc27f26 MFamd64
pmap_clear_ptes() is already convoluted.  This will worsen with the
 implementation of superpages.  Eliminate it and add pmap_clear_write().

There are no functional changes.  Checked by: md5
2006-07-18 03:17:12 +00:00
Alan Cox
e4cec28398 Now that free_pv_entry() accesses the pmap, call free_pv_entry() in
pmap_remove_all() before rather than after the pmap is unlocked.  At
present, the page queues lock provides sufficient sychronization.  In the
future, the page queues lock may not always be held when free_pv_entry() is
called.
2006-07-17 03:10:17 +00:00
Alan Cox
d259662b81 MFamd64
Make three simplifications to pmap_ts_referenced():
   Eliminate an initialized but otherwise unused variable.
   Eliminate an unnecessary test.
   Exit the loop in a shorter way.
2006-07-16 21:05:58 +00:00
Alan Cox
ddd6244a16 Eliminate the remaining uses of "register".
Convert the remaining K&R-style function declarations to ANSI-style.

Eliminate excessive white space from pmap_ts_referenced().
2006-07-16 19:43:49 +00:00
Alan Cox
d3dd65ab60 Make pc_freemask an array of uint32_t, rather than uint64_t. (I believe
that the use of the latter is simply an oversight in porting the new pv
entry code from amd64.)
2006-07-15 07:24:30 +00:00
John Baldwin
2e5ea45f19 Regen. 2006-07-14 15:42:47 +00:00
John Baldwin
5560239759 Somewhat surprisingly, ibcs2_ioctl() is MPSAFE as it is without needing any
further fixes.
2006-07-14 15:42:21 +00:00
John Baldwin
4028fd5b49 Regen. 2006-07-14 15:31:01 +00:00
John Baldwin
86759770f9 Mark ibcs2_mount() (just returns EINVAL) and ibcs2_umount() (just calls
unmount(2)) MPSAFE.
2006-07-14 15:30:50 +00:00
John Baldwin
e4edf2558e Regen. 2006-07-14 15:11:46 +00:00
John Baldwin
7186587b82 ibcs2_sigprocmask() is already marked MPSAFE in syscalls.xenix, so mark
it MPSAFE in syscalls.isc.
2006-07-14 15:11:20 +00:00
Jung-uk Kim
0758eaa227 Sync specialreg.h changes between amd64 and i386 with few fixes. 2006-07-13 16:09:40 +00:00
John Baldwin
19e9205a23 Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer).  So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt.  Also, now that it's easy to do so,
enable paging by default for all ddb commands.  Any command that wishes to
honor the quit flag can do so by checking db_pager_quit.  Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
  terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
2006-07-12 21:22:44 +00:00
Michael Reifenberger
df2f5de4e5 Initialise (if necessary) the VIA C3/C7 features.
Store the capabilities for further use by random(4), padlock(4), ...

Obtained from:	mostly OpenBSD
MFC after:	1 week
2006-07-12 19:46:08 +00:00
Michael Reifenberger
9b6560e483 fix typo in identcpu.c and add one define to specialreg.h.
MFC after:	1 week
2006-07-12 16:52:56 +00:00
Michael Reifenberger
e5f87cebb3 First step to identify and initialize the newer VIA C7 CPU
as found in a VIA EPIA EN-15000 board.

Obtained from:	large parts from OpenBSD
2006-07-12 14:52:32 +00:00
Jung-uk Kim
444576c0c4 Add two new CPUID bits for AMD CPUs, i. e., SVM and extended APIC register. 2006-07-12 06:04:12 +00:00
John Baldwin
90aff9de2d Regen. 2006-07-11 20:55:23 +00:00
John Baldwin
be5747d5b5 - Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
  and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
  linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
  svr4_sys_getdents64() MPSAFE.
2006-07-11 20:52:08 +00:00
John Baldwin
a46a6706c5 Retire the stackgap macros from ibcs2 as they are no longer used. Push
the includes of <sys/exec.h> and <sys/sysent.h> down into the only files
that now need them.
2006-07-10 17:59:26 +00:00
John Baldwin
33f016341e Regen. 2006-07-10 15:55:38 +00:00
John Baldwin
036fd5f3bc Mark ibcs2_msgsys(), ibcs2_semsys(), and ibcs2_shmsys() MPSAFE. 2006-07-10 15:55:17 +00:00
Thomas Wintergerst
5d0c7501b6 Extend i4b to support CAPI manager based ISDN controllers (CAPI manager is part of
c4b, CAPI for BSD). This is a preparation to add CAPI for BSD to the source tree.

Approved by:	hm (mentor)
MFC after:	2 weeks
2006-07-09 21:16:06 +00:00
Matt Jacob
626b06c711 Make the firmware assist driver resident in
preparation for isp using it.

Reviewed by:	sam, max
2006-07-09 16:41:22 +00:00
Matt Jacob
1b530437b6 If PAE is built w/o modules, make sure that isp(4)
has its firmware resident as well.
2006-07-09 16:38:58 +00:00
John Baldwin
16a9155a67 Regen. 2006-07-08 20:14:34 +00:00
John Baldwin
d9f4623307 - Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes
that the 'data' pointer is already setup to point to a valid KVM buffer
  or contains the copied-in data from userland as appropriate (ioctl(2)
  still does this).  kern_ioctl() takes care of looking up a file pointer,
  implementing FIONCLEX and FIOCLEX, and calling fi_ioctl().
- Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap
  and mark xenix_rdchk() MPSAFE.
2006-07-08 20:12:14 +00:00
John Baldwin
43e757a78d Use kern_connect() in spx_open() to avoid the need for the stackgap. I
also used kern_close() for simplicity though close(2) wasn't requiring
the use of the stackgap.
2006-07-08 20:05:04 +00:00
John Baldwin
c68b315699 - Split the IBCS2 ipc foosys() system calls up into subfunctions matching
the organization in svr4_ipc.c.
- Use kern_msgctl(), kern_semctl(), and kern_shmctl() instead of the
  stackgap.
2006-07-08 19:54:12 +00:00
John Baldwin
839cea4b0a Use ibsc2_key_t rather than key_t. 2006-07-08 19:52:49 +00:00
John Baldwin
ec982ae761 Regen. 2006-07-06 21:43:14 +00:00
John Baldwin
ad6d226d43 - Protect the list of linux ioctl handlers with an sx lock.
- Hold Giant while calling linux ioctl handlers for now as they aren't all
  known to be MPSAFE yet.
- Mark linux_ioctl() MPSAFE.
2006-07-06 21:42:36 +00:00
John Baldwin
42fd98d94b Regen. 2006-07-06 21:33:14 +00:00
John Baldwin
3cb83e714d Add kern_setgroups() and kern_getgroups() and use them to implement
ibcs2_[gs]etgroups() rather than using the stackgap.  This also makes
ibcs2_[gs]etgroups() MPSAFE.  Also, it cleans up one bit of weirdness in
the old setgroups() where it allocated an entire credential just so it had
a place to copy the group list into.  Now setgroups just allocates a
NGROUPS_MAX array on the stack that it copies into and then passes to
kern_setgroups().
2006-07-06 21:32:20 +00:00
John Baldwin
c79c04f176 Use the regular poll(2) function to implement poll(2) for the IBCS2 compat
ABI as FreeBSD's poll(2) is ABI compatible.  The ibcs2_poll() function
attempted to implement poll(2) using a wrapper around select(2).  Besides
being somewhat ugly, it also had at least one bug in that instead of
allocating complete fdset's on the stack via the stackgap it just allocated
pointers to fdsets.
2006-07-06 21:29:05 +00:00
David Xu
20197a8c49 Temporarily remove SCHED_CORE, it seems I have so many works can do now,
one example is POSIX priority mutex for libthr.
2006-07-05 02:32:55 +00:00
Alan Cox
da536e6348 Correct an error in the new pmap_collect(), thus only affecting HEAD.
Specifically, the pv entry was always being freed to the caller's pmap
instead of the pmap to which the pv entry belongs.
2006-07-02 18:22:47 +00:00
Rink Springer
44a8277bc6 Updated the XBOX kernel to use the new nfe(4) driver obtained from
OpenBSD. This driver seems to give a small performance increase, and
should lead to better maintainability in the future.

The nForce Ethernet-specific hack in sys/i386/xbox/xbox.c is still
required, judging from dev/nfe/if_nfe.c. The condition it hacks will
almost certainly only occur on XBOX-es anyway, so it is best left there.

Approved by:	imp (mentor)
2006-06-27 20:22:32 +00:00
John Baldwin
cec34dbf79 Regen. 2006-06-27 18:32:16 +00:00
John Baldwin
49d409a108 - Add a kern_semctl() helper function for __semctl(). It accepts a pointer
to a copied-in copy of the 'union semun' and a uioseg to indicate which
  memory space the 'buf' pointer of the union points to.  This is then used
  in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
2006-06-27 18:28:50 +00:00
John Baldwin
0cceebeeb2 Regen. 2006-06-27 14:47:08 +00:00
John Baldwin
597d608f86 - Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away.  mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
  (or rather, to handle concurrent unmount races) from going away.
  umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.
2006-06-27 14:46:31 +00:00
Alan Cox
8e0e1e2239 Correct a very old and very obscure bug: vmspace_fork() calls
pmap_copy() if the mapping is VM_INHERIT_SHARE.  Suppose the mapping
is also wired.  vmspace_fork() clears the wiring attributes in the vm
map entry but pmap_copy() copies the PG_W attribute in the PTE.  I
don't think this is catastrophic.  It blocks pmap_remove_pages() from
destroying the mapping and corrupts the pmap's wiring count.

This revision fixes the problem by changing pmap_copy() to clear the
PG_W attribute.

Reviewed by: tegge@
2006-06-27 04:28:23 +00:00
David E. O'Brien
bfc788c283 Add a pure open source nForce Ethernet driver, under BSDL.
This driver was ported from OpenBSD by Shigeaki Tagashira
<shigeaki@se.hiroshima-u.ac.jp> and posted at
http://www.se.hiroshima-u.ac.jp/~shigeaki/software/freebsd-nfe.html
It was additionally cleaned up by me.
It is still a work-in-progress and thus is purposefully not in GENERIC.
And it conflicts with nve(4), so only one should be loaded.
2006-06-26 23:41:07 +00:00
Sergey Babkin
d81175c738 Backed out the change by request from rwatson.
PR:		kern/14584
2006-06-26 22:03:22 +00:00
John Baldwin
b820787fb3 Regen. 2006-06-26 18:37:36 +00:00
John Baldwin
cf837b8943 linux_brk() is MPSAFE. 2006-06-26 18:36:16 +00:00
Alan Cox
031bf5ea1e Eliminate a comment that became stale after revision 1.547. 2006-06-25 22:15:02 +00:00
Sergey Babkin
7a799f1ef0 The common UID/GID space implementation. It has been discussed on -arch
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.

PR:		kern/14584
Submitted by:	babkin
2006-06-25 18:37:44 +00:00
Alan Cox
f05446648b Change get_pv_entry() such that the call to vm_page_alloc() specifies
VM_ALLOC_NORMAL instead of VM_ALLOC_SYSTEM when try is TRUE.  In other
words, when get_pv_entry() is permitted to fail, it no longer tries as
hard to allocate a page.

Change pmap_enter_quick_locked() to fail rather than wait if it is
unable to allocate a page table page.  This prevents a race between
pmap_enter_object() and the page daemon.  Specifically, an inactive
page that is a successor to the page that was given to
pmap_enter_quick_locked() might become a cache page while
pmap_enter_quick_locked() waits and later pmap_enter_object() maps
the cache page violating the invariant that cache pages are never
mapped.  Similarly, change
pmap_enter_quick_locked() to call pmap_try_insert_pv_entry() rather
than pmap_insert_entry().  Generally speaking,
pmap_enter_quick_locked() is used to create speculative mappings.  So,
it should not try hard to allocate memory if free memory is scarce.

Add an assertion that the object containing m_start is locked in
pmap_enter_object().  Remove a similar assertion from
pmap_enter_quick_locked() because that function no longer accesses the
containing object.

Remove a stale comment.

Reviewed by: ups@
2006-06-20 20:52:11 +00:00
Alexander Leidinger
aff681d258 regen after change to syscalls.master 2006-06-20 20:41:29 +00:00
Alexander Leidinger
502195ac72 Switch to using the DUMMY infrastructure instead of UNIMPL for the new
syscalls. This way there will be a log message printed to the console
(this time for real).

Note: UNIMPL should be used for syscalls we do not implement ever, e.g.
syscalls to load linux kernel modules.

Submitted by:	rdivacky
Sponsored by:	Goole SoC 2006
P4 IDs:		99600, 99602
2006-06-20 20:38:44 +00:00
Yaroslav Tykhiy
15a901e263 We no longer need to disable interrupts in MD trap machinery
when we're about to call kdb_trap() because the latter MI
function can disable interrupts by itself now.

Pointed out by:	bde
X-MFC remark:	depends on kern/subr_kdb.c#1.18
Sponsored by:	RiNet (Cronyx Plus LLC)
2006-06-20 12:44:21 +00:00
David Xu
d037e6d6d0 Style fix, use low-case. 2006-06-19 07:55:29 +00:00
David Xu
85b2d575de Clear bit 22 in MSR IA32_MISC_ENABLE, according to Intel document,
when the bit 22 is set to 1, CPUID with EAX=0 returns a maximum
value in EAX[7..0] of 3, when set to 0(default), CPUID with EAX=0
returns the number corresponding to the maximum standard function
supported. On my machine, BIOS sets the bit to 1 to make it to be
compatible with old OS, this causes dual-core Pentium-D (two
physical cores) to be identified as hyperthreading (two logical
cores) by function mp_topology().
2006-06-19 07:51:47 +00:00
Yaroslav Tykhiy
8276a67ad0 Fix style while I'm here. 2006-06-18 12:13:49 +00:00
Yaroslav Tykhiy
042bbfae5a The i386 "call" instruction works as follows: it pushes
the return address on the stack and only then "dereferences" %pc.
Therefore, in the case of a call to an invalid address, we arrive
to the trap handler with the invalid value in tf_eip.  This used
to prevent db_backtrace() from assigning the most recent and interesting
frame on the stack to the right spot in the right function, from
which the invalid call was attempted.

Try to detect and work around that by recovering the return address
from the stack.

The work-around requires the fault address be passed to db_backtrace().
Smuggle it as tf_err.

MFC after:	1 month
Sponsored by:	RiNet (Cronyx Plus LLC)
2006-06-18 12:07:00 +00:00
Matt Jacob
375e362989 Unbreak tinderbox- fix device_printf arg to accomodate different sizes
of vm_paddr_t in different contexts (e.g., PAE vs. non PAE).
2006-06-16 14:04:21 +00:00
Yaroslav Tykhiy
a436dbf123 Return -1 from db_numargs() if number of args couldn't be guessed.
Use this later to indicate in backtrace output that args shown are
uncertain.

Sponsored by: RiNet (Cronyx Plus LLC)
2006-06-16 11:49:37 +00:00
Yaroslav Tykhiy
70b906ae82 Guess the number of arguments to a function somewhat better.
Now GCC likes to stick a "mov %eax, %FOO" instruction before
"addl $BAR, %esp" if the function just called returns an int,
which is a very common case in the kernel.

Sponsored by: RiNet (Cronyx Plus LLC)
2006-06-16 11:14:54 +00:00
Alexander Leidinger
28a3ae7f88 Remove COMPAT_43 from GENERIC (and other kernel configs). For amd64 there's
an explicit comment that it's needed for the linuxolator. This is not the
case anymore. For all other architectures there was only a "KEEP THIS".
I'm (and other people too) running a COMPAT_43-less kernel since it's not
necessary anymore for the linuxolator. Roman is running such a kernel for a
for longer time. No problems so far. And I doubt other (newer than ia32
or alpha) architectures really depend on it.

This may result in a small performance increase for some workloads.

If the removal of COMPAT_43 results in a not working program, please
recompile it and all dependencies and try again before reporting a
problem.

The only place where COMPAT_43 is needed (as in: does not compile without
it) is in the (outdated/not usable since too old) svr4 code.

Note: this does not remove the COMPAT_43TTY option.

Nagging by:	rdivacky
2006-06-15 19:58:53 +00:00
Stephan Uphoff
2053c12705 Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps
2006-06-15 01:01:06 +00:00
Alexander Leidinger
4946fe7c4d regen after MFP4 (soc2006/rdivacky_linuxolator) of syscalls.master
P4-Changes:	similar to 98673 and 98675 but regenerated locally
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-06-13 18:48:30 +00:00
Alexander Leidinger
c8b579c182 MFP4 (soc2006/rdivacky_linuxolator)
Update of syscall.master:
	o	Adding of several new dummy syscalls (268-310)
	o	Synchronization of amd64 syscall.master with i386 one
	o	Auditing added to amd64 syscall.master
	o	Change auditing type for lstat syscall (bugfix). [1]

P4-Changes:	98672, 98674
Noticed by:	rwatson [1]
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-06-13 18:43:55 +00:00
David Xu
b41f1452d9 Add scheduler CORE, the work I have done half a year ago, recent,
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.

Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
2006-06-13 13:12:56 +00:00
Marius Strobl
acb8c14985 Make the ISAPNP code optional and only enable it on i386 and pc98 (used
for CBUS-PNP cards there) by default, as there are no amd64 and sparc64
machines with ISA slots and which therefore could make use of this code
known to exist. For sparc64 this additionally allows to get rid of the
compat shims for in{b,w,l}()/out{b,w,l}() etc and the associated hacks.

OK'ed by:	imp, peter
2006-06-12 21:07:13 +00:00
John Baldwin
e3d7caf487 Enable a few more things in x86 NOTES to get broader LINT coverage:
- Turn on iwi(4), ipw(4), and ndis(4) on amd64 and i386.
- Turn on ral(4) and ural(4) on i386, pc98, and amd64.
2006-06-12 20:38:17 +00:00
Alan Cox
b74a62d602 Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.

Reviewed by: tegge@
2006-06-12 20:05:27 +00:00
Warner Losh
78878cef94 Add the ability to subset the devices that UART pulls in. This allows
the arm to compile without all the extras that don't appear, at least
not in the flavors of ARM I deal with.  This helps us save about 100k.

If I've botched the available devices on a platform, please let me
know and I'll correct ASAP.
2006-06-12 04:21:50 +00:00
Nate Lawson
dd311cb41a * Ask for a page-aligned page instead of an arbitrary address. This should
not be necessary but might be helpful and at least reduce fragmentation.
* Add an assert to detect if the wakecode ever grows too big.  We include
  1 KB for stack, which should be more than enough also.
* Remove unnecessary initialization of static variables.
* Add comments and a bootverbose print giving the page phys address.
2006-06-10 08:20:17 +00:00
Nate Lawson
716d09af5e Minor tweaks to the resume code. Previous commit reverted alignment back
to 4.  There is no need to be more strict at assembly time since we copy
the code anyway to a private page.

* Clear the direction flag and eflags.  Probably not necessary but it won't
  hurt to be safe.
* Add prefixes to all instructions to prevent any assembler mistakes.
* Remove zeroing of eax - edi.  We use those registers immediately after
  to transfer values to protected mode so this was pointless.
* Update comments to reflect info found during code review.
2006-06-10 08:20:03 +00:00
Nate Lawson
b46f4324ff Move the reset beep tunable/sysctl to debug.acpi.resume_beep. This makes
more sense than under hw.acpi.  Also, document this in the man page.
2006-06-10 08:06:16 +00:00
Nate Lawson
64297e67ab Minor tweaks to the resume code that might help people debug.
* Add hw.acpi.resume_beep tunable and sysctl, default to 0.  Beeps the PC
speaker soon after waking to diagnose whether the wakeup code is even
getting run before other drivers possibly hang the system.  To stop the beep,
cause another beep (i.e. keyboard bell).  Submitted by takawata@, I changed
the frequency to be lower.

* Use 4096 instead of 4 byte alignment.  Might be useful although doesn't
seem to be necessary.

* Remove a useless assignment to acpi_reset_video.  It was overwritten by
the default sysctl value anyway.
2006-06-08 17:54:10 +00:00
Alan Cox
ce142d9ec0 Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object.  Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@
2006-06-05 20:35:27 +00:00
Ed Maste
f4eaa4b967 Fix cut-n-pasteo: use the i386 version #define for i386 dumps, not the amd64 one. 2006-06-05 18:21:29 +00:00
Alan Cox
62b5e735a6 MFamd64
Eliminate unnecessary, recursive acquisitions and releases of the page
 queues lock by free_pv_entry() and pmap_remove_pages().

 Reduce the scope of the page queues lock in pmap_remove_pages().
2006-06-05 06:08:21 +00:00
Mike Silbersack
f25d341cfb After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter.  In the theory that other drivers will fall into
this same category, we agreed that panicing or making the allocation
fail will cause more hardship than is necessary.  The printf should
be sufficient motivation to get the driver glitch fixed.
2006-06-01 04:49:29 +00:00
Matt Jacob
aa57a87a56 Turn the panic on not being able to meet alignment constraints
in bus_dmamem_alloc into the more reasonable EINVAL return.

Also, reclaim memory allocated but then not used if we had
an error return.
2006-05-31 00:37:56 +00:00
David Xu
f1c313bff2 Clear invalid bits only if CPU supports SSE, otherwise, some fields in
struct save87 will be cleared unexpectly.
2006-05-31 00:17:29 +00:00
David Xu
afedf1a7f1 Use the method described in IA-32 Intel Architecture Software Developer's
Manual chapter 11.6.6 to get valid mxcsr bits, use the mxcsr mask to clear
invalid bits passed by user code.

Reviewed by: bde
2006-05-30 23:44:21 +00:00
David Xu
5d84379dd6 Backout changes trying to inherit floating-point environment, although
POSIX (susv3) requires this, but it is unclear what should be inherited,
duplicating whole 387 stack for new thread seems to be unnecessary and
dangerous. Revert to previous code, force a new thread to be started with
clean FP state.
2006-05-29 02:58:37 +00:00
Mike Silbersack
0d65566db8 Add a quick hack to ensure that bus_dmamem_alloc properly aligns
small allocations with large alignment requirements.

Add a panic to detect cases where we've still failed to properly align.
2006-05-28 18:30:36 +00:00
David Xu
4f56cbcbd5 Clear high 16 bits of mxcsr register, according to Intel document, if
the high 16 bits is non-zero, fxrstor instruction will generate GP fault,
resulting kernel crash, this bug can be triggered by setcontext and
ptrace(PT_SETXMMREGS).
2006-05-28 06:51:57 +00:00
David Xu
1db0da9e2b PCB_NPXINITDONE is cleared by npx_fork_thread. 2006-05-28 04:47:56 +00:00
David Xu
40310f021d If parent thread never used FPU, the only work is to clear flag
PCB_NPXINITDONE for new thread and let trap code initialize it.
2006-05-28 04:40:45 +00:00
David Xu
38fd748725 When creating a new thread, inherit floating-point environment from
current thread, this is required by POSIX pthread_create document.
2006-05-28 02:03:13 +00:00
Warner Losh
d708737568 APM was calling the suspend process from a timeout. This meant that
other timeouts could not happen while suspending, including timeouts
for things like msleep.  This caused the system to hang on suspend
when the cbb was enabled, since its suspend path powered down the
socket which used a timeout to wait for it to be done.

APM now creates a thread when it is enabled, and deletes the thread
when it is disabled.  This thread takes the place of the timeout by
doing its polling every ~.9s.  When the thread is disabled, it will
wakeup early, otherwise it times out and polls the varius things the
old timeout polled (APM events, suspend delays, etc).

This makes my Sony VAIO 505TS suspend/resume correctly when APM is
enabled (ACPI is black listed on my 505TS).

This will likely fix other problems with the suspend path where
drivers would sleep with msleep and/or do other timeouts.  Maybe
there's some special case code that would use DELAY while suspending
and msleep otherwise that can be revisited and removed.

This was also tested by glebius@, who pointed out that in the patch I
sent him, I'd forgotten apm_saver.c

MFC After: 3 weeks
2006-05-25 23:06:38 +00:00
Maxim Sobolev
aa1807d5d6 Move clock_lock prototype into <machine/clock.h>, where it is more
appropriate.

Discussed with:	jhb
2006-05-19 18:53:50 +00:00
Marius Strobl
136eda1dc3 - Add C-bus and ISA front-ends for le(4) so it can actually replace
lnc(4) on PC98 and i386. The ISA front-end supports the same non-PNP
  network cards as lnc(4) did and additionally a couple of PNP ones.
  Like lnc(4), the C-bus front-end of le(4) only supports C-NET(98)S
  and is untested due to lack of such hardware, but given that's it's
  based on the respective lnc(4) and not too different from the ISA
  front-end it should be highly likely to work.
- Remove the descriptions of le(4), which where converted from lnc(4),
  from sys/i386/conf/NOTES and sys/pc98/conf/NOTES as there's a common
  one in sys/conf/NOTES.
2006-05-17 21:25:23 +00:00