
5146 Commits

Author SHA1 Message Date
Alexander Leidinger
989500bf1a Import it(4) and lm(4), supporting most popular Super I/O Hardware Monitors.
Submitted by:	Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by:	Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by:	syrinx
Tested by:	many
OKed by:	kensmith
Obtained from:	OpenBSD (parts)
2007-10-14 10:55:50 +00:00
Marius Strobl
55aaf894e8 Make the PCI code aware of PCI domains (aka PCI segments) so we can
support, without ambiguity amongst the devices as seen by the OS and as
represented by PCI location strings, machines that have multiple
independently numbered PCI domains and don't support reenumeration.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument,
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
duplicate devices in the same domain, respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
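
A minimal sketch of the lookup difference, using a hypothetical flat array
of device locations (struct pci_loc, find_dbsf() and find_bsf() are
illustrative names, not the pci(4) API). Matching on the full
domain:bus:slot:function tuple is what keeps identically numbered devices
in different domains apart:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device record; not the real pci_devinfo. */
struct pci_loc {
	uint32_t    domain;
	uint8_t     bus, slot, func;
	const char *name;
};

/* Match on the full domain:bus:slot:function tuple. */
static const struct pci_loc *
find_dbsf(const struct pci_loc *devs, size_t n, uint32_t domain,
    uint8_t bus, uint8_t slot, uint8_t func)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].domain == domain && devs[i].bus == bus &&
		    devs[i].slot == slot && devs[i].func == func)
			return (&devs[i]);
	}
	return (NULL);
}

/* The old bus:slot:function lookup, now limited to domain 0. */
static const struct pci_loc *
find_bsf(const struct pci_loc *devs, size_t n, uint8_t bus, uint8_t slot,
    uint8_t func)
{
	return (find_dbsf(devs, n, 0, bus, slot, func));
}
```

With devices at bus 0, slot 2, function 0 in both domain 1 and domain 0, a
domain-unaware search would return whichever came first; find_bsf() here
only ever matches the domain 0 device.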

Suggested by:	jhb
Reviewed by:	grehan, jhb, marcel
Approved by:	re (kensmith), jhb (PCI maintainer hat)
2007-09-30 11:05:18 +00:00
Christian Brueffer
4fabde5686 Use the correct expanded name for SCTP.
PR:		116496
Submitted by:	koitsu
Reviewed by:	rrs
Approved by:	re (kensmith)
2007-09-26 20:05:07 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Attilio Rao
c8790f5d09 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- The smp rendezvous lock isn't really a leaf spin mutex. Its bad placement
  in the table has been the source of a false-positive LOR report
  involving the dt_lock. Moreover, in older code sched_lock would have been
  acquired after it, so it was not a leaf lock even then.
- allpmaps is only used on the ia32 architecture, so it is moved into the
  appropriate stub.

Additionally:
- kse_zombie_lock is no longer present, so its definition is axed.
- zombie_lock doesn't need to be an exported symbol, so just declare it
  static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
Konstantin Belousov
96a2b63525 Fill in cr2 in the signal context from ksi->ksi_addr.
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.

Submitted by:	rdivacky
Suggested by:	jhb
Sponsored by:	Google Summer of Code 2007
PR:		kern/77710
Approved by:	re (kensmith)
2007-09-20 13:46:26 +00:00
David Malone
3ab8526963 The kernel version of Linux statfs64 is actually supposed to take
3 arguments, but we had forgotten the second argument. Also make the
Linux statfs64 struct depend on the architecture because it has an
extra 4 bytes padding on amd64 compared to i386.

The three argument fix is from David Taylor, the struct statfs64
stuff is my fault. With this patch I can install i386 Linux matlab
on an amd64 machine.

Submitted by: David Taylor <davidt_at_yadt.co.uk>
Approved by: re (kensmith)
2007-09-18 19:50:33 +00:00
Peter Wemm
8bff6a112b Fix an undefined symbol that as/ld neglected to flag as a problem. It
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem.   You can see
this with an 'nm' of the kernel.  There will be 'U MAXCPU' on SMP systems.

The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved.  This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.

This fixes it for now by exporting MAXCPU to the assembler.  A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.

Approved by:  re (kensmith)
2007-09-17 21:55:28 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Alan Cox
6bce07ae73 It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren.  They should
be.  This change fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc().  I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago.  This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)
2007-09-15 18:47:02 +00:00
Attilio Rao
4486adc51f Currently the LO_NOPROFILE flag (which is masked on upper level code by
per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is
not really honoured. In particular, lock_profile_obtain_lock_failure() and
lock_profile_obtain_lock_success() ignore this flag.
As a result, locks marked as not to be profiled are profiled anyway.
In the case of clock_lock, used by the i8254 timer, this leads to
unpredictable behaviour on both amd64 and ia32 (double-fault panics,
sudden reboots, etc.). The amd64 clock_lock is also not marked as
not profilable as it should be.
Fix these bugs by adding proper checks in the lock profiling code and at
clock_lock initialization time.
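
The fix amounts to an early return in both profiling hooks. A sketch with a
hypothetical lock object and flag value (only the name LO_NOPROFILE comes
from this message; the struct and hook names are illustrative):

```c
#include <stdbool.h>

/* Hypothetical flag value; only the name is taken from the commit. */
#define LO_NOPROFILE 0x10000000u

struct lock_object_sketch {
	const char   *lo_name;
	unsigned int  lo_flags;
	unsigned long contested;   /* stand-in profiling counters */
	unsigned long acquired;
};

static bool
profiling_enabled(const struct lock_object_sketch *lo)
{
	return ((lo->lo_flags & LO_NOPROFILE) == 0);
}

/* Both hooks bail out early for locks marked as not profilable. */
static void
profile_obtain_failure(struct lock_object_sketch *lo)
{
	if (!profiling_enabled(lo))
		return;
	lo->contested++;
}

static void
profile_obtain_success(struct lock_object_sketch *lo)
{
	if (!profiling_enabled(lo))
		return;
	lo->acquired++;
}
```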

i8254 bug pointed out by: kris
Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it>
Approved by: jeff (mentor)
Approved by: re
2007-09-14 01:12:39 +00:00
Attilio Rao
0b2e598c14 This is a follow-up clean-up commit for recent changes involving
the topology functions.
Working on the patch for topology problems on ia32/amd64 revealed some
problems regarding function ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with new modifications to the involved functions,
a correct ordering is now semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )

Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re
2007-09-11 22:54:09 +00:00
Konstantin Belousov
0e6ed4feab Regenerate.
Approved by:	re (kensmith)
2007-08-28 12:36:23 +00:00
Konstantin Belousov
b6e645c90f Implement fake linux sched_getaffinity() syscall to enable java to work
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.

Submitted by:	rdivacky
Reviewed by:	jkim
Sponsored by:	Google Summer of Code 2007
Approved by:	re (kensmith)
2007-08-28 12:26:35 +00:00
Joseph Koshy
ea49750231 Assign sizes to assembly language support functions.
Approved by:	re (kensmith)
2007-08-22 05:06:14 +00:00
Joseph Koshy
298889efcb Define an END() macro for use in i386 and amd64 assembly code, akin
to the one available on the ia64, sparc64, and sun4v architectures.

Approved by:	re (kensmith)
2007-08-22 04:26:07 +00:00
Alan Cox
8beae25391 In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping.  (The two exceptions are
mappings into the kernel's exec and pipe submaps.)  Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default.  This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by:	re (kensmith)
2007-08-21 04:59:34 +00:00
Dag-Erling Smørgrav
83d18f2283 Add a driver for the on-die digital thermal sensor found on Intel Core
and newer CPUs (including Core 2 and Core / Core 2 based Xeons).  The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature).  When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.
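
For reference, a sketch of how a raw reading could be decoded, assuming the
architectural IA32_THERM_STATUS MSR (0x19c) layout: bit 31 flags a valid
reading and bits 22:16 hold how many degrees the core is below TjMax. TjMax
is passed in as an assumption and the helper name is hypothetical. Because
the MSR is per core, the raw value has to be read on the CPU being queried,
which is why the handler binds to it:

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_THERM_STATUS (MSR 0x19c) fields used for the reading. */
#define THERM_STATUS_VALID      (1ULL << 31)                 /* reading valid */
#define THERM_STATUS_READOUT(v) ((int)(((v) >> 16) & 0x7f))  /* bits 22:16 */

/*
 * Hypothetical helper: convert a raw IA32_THERM_STATUS value to degrees
 * Celsius.  The sensor reports the distance below TjMax, so the
 * temperature is tjmax - readout.  Returns false if the valid bit is clear.
 */
static bool
therm_status_to_celsius(uint64_t msr, int tjmax, int *celsius)
{
	if ((msr & THERM_STATUS_VALID) == 0)
		return (false);
	*celsius = tjmax - THERM_STATUS_READOUT(msr);
	return (true);
}
```

For example, a valid raw value whose readout field is 35, decoded with an
assumed TjMax of 100, gives 65 degrees Celsius.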

Submitted by:	Rui Paulo <rpaulo@fnop.net>
Sponsored by:	Google Summer of Code 2007
Tested by:	des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-15 19:26:03 +00:00
Peter Wemm
b7778ae08f Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to
cpu_start_mp().  This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores.  Change mp_topology() to use that information
rather than trying to do it itself.

This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Opteron cpus are hyperthreading cores.  At the very least,
we now have a single piece of code to identify hyperthreading.

Obtained from:  jhb
Approved by:  re (kensmith)
2007-08-02 21:17:58 +00:00
John Baldwin
de016534a8 If the trap number stored in the trapframe is corrupted into a negative
value, then we would use a negative index into the trap_msg[] array
resulting in a nested page fault.  Make the 'type' variable holding the
trap number unsigned to avoid this.
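
A small illustration of why the unsigned type helps, using an abbreviated,
illustrative message table and a hypothetical trap_name() helper. A
corrupted value such as -1 becomes a very large unsigned number, so one
upper-bound check rejects it:

```c
#include <stddef.h>

/* Abbreviated, illustrative message table. */
static const char *const trap_msg_sketch[] = {
	"",
	"privileged instruction fault",
	"",
	"breakpoint instruction fault",
};
#define TRAP_MSG_COUNT (sizeof(trap_msg_sketch) / sizeof(trap_msg_sketch[0]))

/*
 * With an unsigned type there is no negative range to check: anything at
 * or beyond the end of the table, including a wrapped -1, is rejected.
 */
static const char *
trap_name(unsigned int type)
{
	if (type < TRAP_MSG_COUNT)
		return (trap_msg_sketch[type]);
	return ("unknown/reserved trap");
}
```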

MFC after:	2 weeks
Approved by:	re (rwatson)
2007-07-26 15:32:55 +00:00
David Malone
6d8617d42a If clock_ct_to_ts fails to convert the time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by:	re
MFC after:	3 weeks
2007-07-23 09:42:32 +00:00
Jeff Roberson
40380a6a6b - Optimize the amd64 cpu_switch() TD_LOCK blocking and releasing to
require fewer blocking loops.
 - Don't use atomic ops with 4BSD or on UP.
 - Only use the blocking loop if ULE is compiled in.
 - Use the correct memory barrier.

Discussed with:	attilio, jhb, ssouhlal
Tested by:	current@
Approved by:	re
2007-07-17 22:36:56 +00:00
John Baldwin
59d8f3ff08 Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
  and use that in preference to maxssiz during exec() rather than always
  using maxssiz for all processes.
- Apply the ABI's limit fixup to the previous stack size when adjusting
  RLIMIT_STACK to determine if the existing mapping for the stack needs to
  be grown or shrunk (as well as how much it should be grown or shrunk).
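
A sketch of the first item, using hypothetical stand-ins for struct
sysentvec and the global maxssiz. The pointer is assumed to be named
sv_maxssiz here and the sizes are placeholders:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the sysentvec and the global maxssiz. */
struct sysentvec_sketch {
	const char          *sv_name;
	const unsigned long *sv_maxssiz;   /* NULL: use the global default */
};

static unsigned long maxssiz_sketch = 512UL * 1024 * 1024;  /* placeholder */

/* Choose the stack size limit used at exec() time for this ABI. */
static unsigned long
exec_max_stack(const struct sysentvec_sketch *sv)
{
	if (sv->sv_maxssiz != NULL)
		return (*sv->sv_maxssiz);
	return (maxssiz_sketch);
}
```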

Approved by:	re (kensmith)
2007-07-12 18:01:31 +00:00
Peter Wemm
79d5bdcca5 Don't add the 'pad' argument to the mmap/truncate/etc syscalls.
Submitted by: kensmith
Approved by: re (kensmith)
2007-07-04 23:06:43 +00:00
Bjoern A. Zeeb
118043c6b1 Temporarily disconnect i4bing, i4bisppp and i4bipr from the build for
the 7.0 timeframe.

This is needed because I4B is not locked and NET_NEEDS_GIANT goes away.

The plan is to lock I4B and bring everything back for 7.1.

Approved by:	re (kensmith)
2007-07-04 00:18:39 +00:00
Nate Lawson
a1ec53930b Revert previous commit, retaining cpufreq.
Approved by:	re (implicitly)
2007-07-01 22:19:20 +00:00
Nate Lawson
a7b811a620 Add cpufreq(4) to GENERIC. It does not change the frequency by default,
so systems should be relatively unaffected.  Users can then simply enable
powerd(8) in rc.conf to take advantage of it.

Approved by:	re
2007-07-01 21:47:45 +00:00
Alan Cox
ba4b85e482 Pages that do belong to an object and page queue can now be freed without
holding the page queues lock.  Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.

Approved by:	re (kensmith)
2007-07-01 07:08:26 +00:00
Matt Jacob
7fc02735f4 Check for pte being NULL on return from pmap_pte_pde(): unlikely or
even impossible, but it's better to have a panic and a quiesced
gcc4.2.
2007-06-17 04:27:45 +00:00
Matt Jacob
27705ac087 Initialize lastaddr to zero to make gcc4.2 happy.
2007-06-17 04:21:58 +00:00
Peter Wemm
5915fb72fb Prototype (but functional) Linux-ish /dev/nvram interface to the extra
114 bytes of cmos ram in the PC clock chip.  The big difference between
this and the Linux version is that we do not recalculate the checksums
for bytes 16..31.

We use this at work when cloning identical machines - we can copy the
bios settings as well.  Reading /dev/nvram gives 114 bytes of data but
you can seek/read/write whichever bytes you like.

Yes, this is a "foot, gun, fire!" type of device.
2007-06-15 22:58:14 +00:00
Xin LI
a2346f7c3c Enable SCTP by default for GENERIC kernels in order to give it
more exposure.  The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still needs some
work/testing on 64-bit platforms.

Approved by:	re (kensmith)
Discussed with:	rrs
2007-06-14 17:14:27 +00:00
Pyun YongHyeon
b5f0caf909 Add nfe(4) to the list of drivers supported by GENERIC kernel.
While I'm here, comment out nve(4), as nfe(4) will take over.

Approved by:	re
2007-06-12 02:24:30 +00:00
Matt Jacob
f2114f3bcd Check against maxsegsz being zero in bus_dma_tag_create and return EINVAL
if it is.

Reviewed by:	scott long
2007-06-11 17:57:24 +00:00
Andrew Thompson
ed3247cea7 Add wlan_scan_ap and wlan_scan_sta to platforms that include wlan.
2007-06-11 08:26:40 +00:00
Marcel Moolenaar
2b39bb4f4f Use default options for default partitioning schemes, rather than
making the relevant files standard. This avoids duplication and
makes it easier to override/disable unwanted schemes. Since ARM
doesn't have a DEFAULTS configuration file, leave the source
files for the BSD and MBR partitioning schemes in files.arm for
now.
2007-06-11 00:38:06 +00:00
Attilio Rao
393a081d42 Optimize vmmeter locking.
In particular:
- Add an explanatory table for the locking of struct vmmeter members
- Apply the new rules to some of those members
- Remove some unhelpful comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
Marcel Moolenaar
01bd17cc99 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
Robert Watson
68d4cc614a Enable AUDIT by default in the GENERIC kernel, allowing security event
auditing to be turned on without a kernel recompile, just an rc.conf
option.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2007-06-08 20:29:07 +00:00
David Xu
42ce445fed Backout experimental adaptive-spin umtx code.
2007-06-06 07:35:08 +00:00
John Baldwin
ce0b0c05aa Move a warning under bootverbose as no machines that trigger it have ended
up being broken.
2007-06-05 18:57:48 +00:00
Jeff Roberson
5d68dad329 - Add a new argument to cpu_switch. This is a pointer to a mutex that
oldthread should point at before we return.
 - When cpu_switch() is called the td_lock pointer in the old thread may
   point at the blocked lock.  This prevents other processors from
   switching into this thread while we're still switching out.  Wait
   until we're done deactivating the vmspace before we release the
   thread by assigning to td_lock.
 - Before we can activate the new vmspace we must make sure that the new
   thread is not assigned to the blocked lock.  It may be in the process
   of switching out on another cpu.  Spin until the new thread is
   available.
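
The handoff can be sketched with portable C11 atomics. The types and
function names are hypothetical: a release store publishes the old
thread's real lock only after its switch-out work is finished, and the
incoming side spins with acquire loads until the thread is no longer
parked on the blocked lock:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mtx and struct thread. */
struct mtx_sketch { const char *name; };
struct thread_sketch { _Atomic(struct mtx_sketch *) td_lock; };

static struct mtx_sketch blocked_lock = { "blocked lock" };

/* Old thread: publish its real lock only once switch-out is complete. */
static void
release_old_thread(struct thread_sketch *oldtd, struct mtx_sketch *mtx)
{
	/* vmspace deactivation would be finished before this store. */
	atomic_store_explicit(&oldtd->td_lock, mtx, memory_order_release);
}

/* New thread: spin while another CPU is still switching it out. */
static void
wait_for_new_thread(struct thread_sketch *newtd)
{
	while (atomic_load_explicit(&newtd->td_lock, memory_order_acquire) ==
	    &blocked_lock)
		;	/* a real kernel would issue a CPU pause hint here */
}
```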
2007-06-05 00:16:43 +00:00
Jeff Roberson
ebb6b0c0ec - Expose td_lock to assembly so it may be used in cpu_switch().
2007-06-05 00:13:49 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Jeff Roberson
1b1618fb12 - Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:57:32 +00:00
Jeff Roberson
74aaec43e8 Commit 11/14 of sched_lock decomposition.
- There is no globally visible scheduler lock any longer.  For now the
   watchdog can only check Giant.  This model of checking particular locks
   is flawed and should be revisited.  Other metrics should be considered.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:56:33 +00:00
Jeff Roberson
e4b5aee3a8 Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
   each architecture.  This also allows the scheduler to use any locking it
   may want to.
 - Use the thread_lock() rather than sched_lock when preempting.
 - The scheduler lock is not required to synchronize release_aps.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:56:08 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This point needs some discussion/work in the coming days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
David Malone
041b706b2f Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64-bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
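
A simplified model of the value sysctl_handle_int() reports: it reads an
int through arg1 when one is given and otherwise reports arg2 itself, so
arg2 is never a size. The second function shows what happens to a 64-bit
value forced through an int-sized type. Both function names are
hypothetical, and the truncation example assumes a 32-bit unsigned int:

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the exported value: *arg1 if supplied, else arg2 itself. */
static int
handle_int_value(const int *arg1, int arg2)
{
	return (arg1 != NULL ? *arg1 : arg2);
}

/* A 64-bit value squeezed into an unsigned int keeps only its low bits. */
static unsigned int
truncate_to_uint(uint64_t v)
{
	return ((unsigned int)v);
}
```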
2007-06-04 18:25:08 +00:00
Alan Cox
5b4a3e940f Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] and dump_avail[] using one of these
definitions.

Approved by:	re
2007-06-03 23:18:29 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probably a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Paolo Pisati
3401f2c1df In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb
2007-05-31 19:25:35 +00:00
Dag-Erling Smørgrav
753bcb5c34 Add CPUID2_PDCM
Requested by:	jkim
MFC after:	3 days
2007-05-31 11:26:45 +00:00
Dag-Erling Smørgrav
783a05dfd3 MFi386: PDCM, remove pointless message
MFC after:	3 days
2007-05-30 14:23:26 +00:00
Pyun YongHyeon
590f73f72e Honor maxsegsz of less than a page size in a DMA tag. Previously it
always returned PAGE_SIZE, regardless of the restrictions of the DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
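
A sketch of the per-segment size computation after the fix, with a
hypothetical helper and a 4 KB page size assumed for illustration. The
segment never crosses a page boundary and never exceeds the tag's
maxsegsz:

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL   /* assumed page size */

/*
 * Hypothetical helper: size of the next DMA segment starting at vaddr
 * with buflen bytes left.  maxsegsz is assumed to be non-zero.
 */
static size_t
next_segment_size(unsigned long vaddr, size_t buflen, size_t maxsegsz)
{
	/* Bytes remaining up to the end of the current page. */
	size_t sgsize = PAGE_SIZE_SKETCH - (vaddr & (PAGE_SIZE_SKETCH - 1));

	if (sgsize > maxsegsz)     /* the step the old code skipped */
		sgsize = maxsegsz;
	if (sgsize > buflen)
		sgsize = buflen;
	return (sgsize);
}
```

For a page-aligned buffer and a maxsegsz of 1024, this yields 1024-byte
segments instead of whole pages.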

Reviewed by:	scottl
2007-05-29 06:30:26 +00:00
Hidetoshi Shimokawa
35fafac2ac Enable fwip and dcons in GENERIC. They seem fairly stable.
Note on dcons:
To enable dcons in kernel, put the following lines in /boot/loader.conf.
You may also want to enable dcons in /etc/ttys.

boot_multicons="YES"
#Force dcons to be the high-level console if a firewire bus is present.
#hw.firewire.dcons_crom.force_console=1

FireWire/dcons support in loader will come shortly.
(i386/amd64 only)
2007-05-28 14:38:43 +00:00
Robert Watson
2129d3ea2b Remove "XXX Giant" comments before calls to kdb_trap() -- the kernel
debugger is quite capable of handling Giant-free execution at this
point.  Several other similar comments remain in trap.c on both i386
and amd64 awaiting analysis.
2007-05-27 19:16:45 +00:00
Konstantin Belousov
1c182de9a9 Move futex support code from <arch>/support.s into linux compat directory.
Implement all futex atomic operations in assembler so as not to depend on
fuword(), which cannot distinguish between -1 and a failure return.
Correctly return 0 from atomic operations on success.
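
A sketch of the calling convention this moves to, with a hypothetical
helper that reports the old value through a pointer and the error code
separately, so a stored -1 cannot be mistaken for a fault. A NULL address
stands in for an unmapped one. Unlike the real routines, this sketch is
not atomic:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical exchange helper: 0 on success with the previous value in
 * *oldval, EFAULT on a bad address.  Not atomic; the real routines use
 * locked assembler instructions.
 */
static int
futex_xchgl_sketch(int newval, int *uaddr, int *oldval)
{
	if (uaddr == NULL)
		return (EFAULT);
	*oldval = *uaddr;
	*uaddr = newval;
	return (0);
}
```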

In collaboration with:	rdivacky
Tested by:	Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by:	Google SoC 2007
2007-05-23 08:33:06 +00:00
Alexander Kabaev
23a29e45cd Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binaries in the linuxulator.

This allows running binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was previously unable to run.
2007-05-22 02:22:58 +00:00
Jeff Roberson
80b200da28 - rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.
Suggested by:	julian@
Contributed by:	attilio@
2007-05-20 22:33:42 +00:00
Alexander Kabaev
d586dea015 Remove extern struct pcpu __pcpu[]; from the header file and
move it to the only file where it appears to be used.
2007-05-19 05:03:59 +00:00
Alexander Kabaev
fa298d5ea8 Include machine/pcb.h to turn the extern struct pcb stoppcbs[]; construct
into valid C.
2007-05-19 05:01:43 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
John Baldwin
19059a13ed Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels).  Previously, each 32-bit process overwrote
its resource limits at exec() time.  The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process.  To fix this, don't
overwrite the resource limits during exec().  Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit.  We then use this when querying and
setting resource limits.  Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children.  However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.
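
A minimal sketch of a per-limit fixup for a 32-bit ABI, assuming a single
32-bit ceiling for every limit (the real per-ABI values may differ).
fixlimit32() is a hypothetical name. It is applied when a limit is queried
or set, not at exec() time:

```c
#include <stdint.h>
#include <sys/resource.h>

/* Assumed ceiling for every limit; real ABIs may use per-limit values. */
#define LIMIT32_MAX ((rlim_t)UINT32_MAX)

/*
 * Sanitize one resource limit for a 32-bit process.  The stored 64-bit
 * limit is left alone, so a later 64-bit child still sees the full value.
 */
static void
fixlimit32(struct rlimit *rl)
{
	if (rl->rlim_cur > LIMIT32_MAX)
		rl->rlim_cur = LIMIT32_MAX;
	if (rl->rlim_max > LIMIT32_MAX)
		rl->rlim_max = LIMIT32_MAX;
}
```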

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after:	1 week
2007-05-14 22:40:04 +00:00
Alexander Kabaev
ec69a8a6d2 Do not dereference linux_to_bsd_signal[-1] if userland has
passed zero as exit signal.

GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.
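
A sketch of the guard, with a truncated, illustrative mapping table in
which index sig - 1 holds the translation of Linux signal sig. A zero
exit signal means "no signal" and must never reach the table:

```c
/* Truncated, illustrative table: index sig - 1 maps Linux signal sig. */
#define LINUX_NSIG_SKETCH 4
static const int linux_to_bsd_signal_sketch[LINUX_NSIG_SKETCH] = {
	1, 2, 3, 4   /* SIGHUP, SIGINT, SIGQUIT, SIGILL */
};

/*
 * Hypothetical helper: translate a Linux exit signal, returning 0 for
 * "no signal" or an out-of-range value instead of reading [-1].
 */
static int
translate_exit_signal(int linux_sig)
{
	if (linux_sig <= 0 || linux_sig > LINUX_NSIG_SKETCH)
		return (0);
	return (linux_to_bsd_signal_sketch[linux_sig - 1]);
}
```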
2007-05-11 01:25:51 +00:00
Kevin Lo
00465d5ab3 Add wlan_amrr. ural(4) uses amrr as transmit rate control.
2007-05-10 01:39:50 +00:00
Scott Long
f73e86c383 It turns out that the hptiop driver isn't portable after all. Confine it to
amd64 and i386 for now.
2007-05-09 15:55:45 +00:00
Scott Long
4439f8b4b6 Introduce a driver for the Highpoint RocketRAID 3xxx series of controllers.
The driver relies on CAM.

Many thanks to Highpoint for providing this driver.
2007-05-09 07:07:26 +00:00
John Baldwin
2e025791ce Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
  systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
  indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
  starting any of the APs up rather than doing it while starting up the
  APs.  This step is now where we honor MAXCPU.
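
A sketch of the new ordering, with hypothetical limits and arrays indexed
by APIC ID rather than by CPU number, so sparse IDs such as 38 fit. All
logical ids are assigned up front, and MAXCPU is enforced at this step:

```c
#include <stdbool.h>

#define MAX_APIC_ID_SKETCH 0xfe   /* hypothetical limits for the sketch */
#define MAXCPU_SKETCH      4

/* Sized by APIC ID, not by CPU count. */
static bool apic_present[MAX_APIC_ID_SKETCH + 1];
static int  apic_to_cpu[MAX_APIC_ID_SKETCH + 1];

/*
 * Assign logical CPU ids to all present local APICs before any AP is
 * started.  The BSP gets cpu 0; APICs beyond the CPU limit get -1.
 * Returns the number of logical CPUs assigned.
 */
static int
assign_cpu_ids(int boot_apic_id)
{
	int ncpus = 1;

	for (int id = 0; id <= MAX_APIC_ID_SKETCH; id++)
		apic_to_cpu[id] = -1;
	apic_to_cpu[boot_apic_id] = 0;
	for (int id = 0; id <= MAX_APIC_ID_SKETCH; id++) {
		if (!apic_present[id] || id == boot_apic_id)
			continue;
		if (ncpus < MAXCPU_SKETCH)
			apic_to_cpu[id] = ncpus++;
	}
	return (ncpus);
}
```

With APIC IDs 0, 38, 40, 41 and 42 present and a limit of four CPUs, ID 38
becomes cpu 1 and ID 42 is left unused.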

MFC after:	1 week
2007-05-08 22:01:04 +00:00
John Baldwin
fb610ca1f9 Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
  spin lock to protect intrcnt_index.  Originally I had this as a spin lock
  so interrupt code could use it to lookup sources.  However, we don't
  actually do that because it would add a lot of overhead to interrupts,
  and if we ever do support removing interrupt sources, we can use other
  means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
  determine if a source is enabled or not.  This allows us to notice when
  a source is no longer in use.  When that happens, we now invoke a new
  PIC method (pic_disable_intr()) to inform the PIC driver that the
  source is no longer in use.  The I/O APIC driver frees the APIC IDT
  vector when this happens.  The MSI driver no longer needs to have a
  hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
  of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
  complement apic_enable_vector() and use it in the I/O APIC and MSI code
  when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
  IRQ to its irq_rman.  The MSI code uses this when it creates new
  interrupt sources to let the nexus know about newly valid IRQs.
  Previously the msi_alloc() and msix_alloc() passed some extra stuff
  back to the nexus methods which then added the IRQs.  This approach is
  a bit cleaner.
- Change the MSI sx lock to a mutex.  If we need to create new sources,
  drop the lock, create the required number of sources, then get the lock
  and try the allocation again.
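
The is_handlers change can be sketched as a plain reference count. The
struct and functions are hypothetical; the comments mark where the real
code would enable the source and where it would invoke
pic_disable_intr():

```c
#include <stdbool.h>

/* Hypothetical interrupt source with a handler count. */
struct intsrc_sketch {
	int  is_handlers;    /* replaces the old is_enabled boolean */
	bool pic_enabled;    /* state as seen by the PIC driver */
};

static void
add_handler(struct intsrc_sketch *isrc)
{
	if (isrc->is_handlers++ == 0)
		isrc->pic_enabled = true;    /* first user: enable the source */
}

static void
remove_handler(struct intsrc_sketch *isrc)
{
	if (--isrc->is_handlers == 0)
		isrc->pic_enabled = false;   /* last user gone: pic_disable_intr() */
}
```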
2007-05-08 21:29:14 +00:00
Paolo Pisati
bafe5a3118 Bring in the remaining bits to make interrupt filtering work:
o push much of the i386 and amd64 MD interrupt handling code
  (intr_machdep.c::intr_execute_handlers()) into MI code
  (kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
  (intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
  and make them part of 'struct intr_event', passing them as arguments to
  kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
  with filter and ithread functions.

Approved by: re (implicit?)
2007-05-06 17:02:50 +00:00
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.
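
The two lookup strategies can be sketched as follows, with hypothetical
types and a 4 KB page size. The dense form indexes one array that also
spans the holes in the physical address space; the sparse form keeps an
array per populated segment and searches the segments, spending time to
save space:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12   /* assumed 4 KB pages */

struct vm_page_sketch { uint64_t phys_addr; };

/* Hypothetical segment: a populated range with its own page array. */
struct phys_seg_sketch {
	uint64_t start, end;                 /* [start, end) */
	struct vm_page_sketch *first_page;
};

/* Dense: one array from first_pa upward; O(1) index arithmetic. */
static struct vm_page_sketch *
phys_to_page_dense(struct vm_page_sketch *array, uint64_t first_pa,
    uint64_t pa)
{
	return (&array[(pa - first_pa) >> PAGE_SHIFT_SKETCH]);
}

/* Sparse: only populated segments have entries; search for the owner. */
static struct vm_page_sketch *
phys_to_page_sparse(struct phys_seg_sketch *segs, size_t nsegs, uint64_t pa)
{
	for (size_t i = 0; i < nsegs; i++) {
		if (pa >= segs[i].start && pa < segs[i].end)
			return (&segs[i].first_page[(pa - segs[i].start) >>
			    PAGE_SHIFT_SKETCH]);
	}
	return (NULL);
}
```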

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
John Baldwin
e706f7f0c7 Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Reduce the amount of work that has to be done for each architecture by
  pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indices to IRQs so that we can allow a driver to map
  multiple MSI-X messages into a single IRQ when handling a message
  shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
  to calculate the address and data values for a given MSI/MSI-X IRQ.
  The x86 nexus drivers map this into a call to a new 'msi_map()' function
  in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
  parameter from PCIB_ALLOC_MSIX().  MD code no longer has any knowledge
  of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
  Specifically, it now stores an array of IRQs (called "message vectors" in
  the code) that have associated address and data values, and a small
  virtual version of the MSI-X table that specifies the message vector
  that a given MSI-X table entry uses.  Sparse mappings are permitted in
  the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
  registers directly via custom bus_setup_intr() and bus_teardown_intr()
  methods.  pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
  address and data values for a given message as needed.  The MD code
  no longer has to call back down into the PCI bus code to set these
  values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
  code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
  new values of the address and data fields for a given IRQ.  The x86
  MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
  a new value of the 'address' field.
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate
  MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
  since the only remaining diff between the two is a substring in a
  bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
  entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed.  Instead of accepting
  indices for the allocated vectors, it accepts a mini-virtual table
  (with a new length parameter).  This table is an array of u_ints, where
  each value specifies which allocated message vector to use for the
  corresponding MSI-X message.  A vector of 0 forces a message to not
  have an associated IRQ.  The device may choose to only use some of the
  IRQs assigned, in which case the unused IRQs must be at the "end" and
  will be released back to the system.  This allows a driver to use the
  same remap table for different shortage values.  For example, if a driver
  wants 4 messages, it can use the same remap table (which only uses the
  first two messages) for the cases when it only gets 2 or 3 messages and
  in the latter case the PCI bus will release the 3rd IRQ back to the
  system.
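A hedged sketch of the remap-table rules described in the last item, not the actual pci_remap_msix() implementation: entry i names the 1-based allocated vector used by MSI-X table entry i, 0 means "no IRQ", and the vectors actually referenced must be 1..k so that only vectors at the end are left over. The function name and the 64-vector bound are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: returns how many allocated vectors would be
 * released back to the system, or -1 if the table is invalid.
 */
static int
msix_remap_release_count(const unsigned int *vectors, size_t count,
    unsigned int allocated)
{
	uint64_t used = 0;          /* bit (v - 1) set if vector v is referenced */
	unsigned int highest = 0;

	if (allocated == 0 || allocated > 64)
		return (-1);
	for (size_t i = 0; i < count; i++) {
		unsigned int v = vectors[i];

		if (v == 0)
			continue;           /* this MSI-X entry gets no IRQ */
		if (v > allocated)
			return (-1);        /* refers to a vector we do not have */
		used |= UINT64_C(1) << (v - 1);
		if (v > highest)
			highest = v;
	}
	/* Vectors 1..highest must all be used: unused ones only at the end. */
	for (unsigned int v = 1; v <= highest; v++)
		if ((used & (UINT64_C(1) << (v - 1))) == 0)
			return (-1);
	return ((int)(allocated - highest));
}
```

For the example above, a four-entry table { 1, 2, 1, 2 } works whether the driver got 2 or 3 messages; with 3, the third vector is the one released.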

MFC after:	1 month
2007-05-02 17:50:36 +00:00
Ariff Abdullah
1d80d190af Disable C1 Enhanced mode on AMD K8 Family Revision F and above to keep
local APIC timer alive.

Reviewed by:	jhb
PR:		i386/104678
MFC after:	3 days
2007-04-25 19:58:42 +00:00
John Baldwin
a5b6b9a68e Fix the triple fault used as a last resort during a reboot to actually
fault.  The previous method zeroed out the page tables, invalidated the
TLB, and then entered a spin loop.  The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction.  This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set.  Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.

The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault.  First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault.  Second, we trigger an int 3 breakpoint to force an interrupt and
kick off a triple fault.

MFC after:	3 days
2007-04-24 21:17:45 +00:00
John Baldwin
4cc968cb95 MFi386: Attempt to reset the machine using the Reset Control register and
Fast A20 and Init register if the keyboard reset doesn't work before
resorting to a triple fault.
2007-04-24 20:06:36 +00:00
Stephan Uphoff
31b4f4a916 Modify TLB invalidation handling.
Reviewed by:	alc@, peter@
MFC after:	1 week
2007-04-21 14:17:30 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem_size will
be at least 256 MB (for example) without forcing a particular value via vm.kmem_size.
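A minimal sketch of the selection rule as described, assuming an explicit vm.kmem_size overrides everything and vm.kmem_size_min only ever raises the automatically computed size. The function name is hypothetical, a zero argument stands for "tunable not set", and the real kmeminit() applies further clamps not shown here.

```c
#include <stdint.h>

/* Hypothetical: pick the kmem size from the auto-sized value and tunables. */
static uint64_t
choose_kmem_size(uint64_t auto_size, uint64_t kmem_size_min,
    uint64_t kmem_size)
{
	if (kmem_size != 0)
		return (kmem_size);          /* explicit vm.kmem_size wins */
	if (kmem_size_min != 0 && auto_size < kmem_size_min)
		return (kmem_size_min);      /* raise to the requested floor */
	return (auto_size);              /* already large enough */
}
```

For the ZFS case, /boot/loader.conf might contain vm.kmem_size_min="268435456" (256 MB), leaving larger machines at their computed size.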

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Jung-uk Kim
f1753e0585 Fix style(9) and comments.
Submitted by:	Scot Hetzel (swhetzel at gmail dot com)
2007-04-18 20:12:05 +00:00
Jung-uk Kim
d477452eb3 style(9) says sizeof's are not to be followed by a space. Fix them. 2007-04-18 18:11:32 +00:00
Jung-uk Kim
86a0e5dbb6 Implement settimeofday() for Linuxulator/amd64.
Submitted by:	Scot Hetzel (swhetzel at gmail dot com)
2007-04-18 18:08:12 +00:00
John Baldwin
88a5255bc4 Honor the BUS_DMA_NOCACHE flag to bus_dmamem_alloc() on amd64 and i386 by
mapping the pages as UC (uncacheable) using pmap_change_attr().

MFC after:	1 week
Requested by:	ariff
Reviewed by:	scottl
2007-04-17 21:05:34 +00:00
Alan Cox
0b76504872 Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@
2007-04-13 16:07:29 +00:00
Pawel Jakub Dawidek
fef2a25971 Remove trailing '.' for consistency! 2007-04-10 21:40:13 +00:00
Pawel Jakub Dawidek
57bcf75fd2 Add UFS_GJOURNAL options to the GENERIC kernel.
Approved by:	re (kensmith)
2007-04-10 16:49:41 +00:00
Jung-uk Kim
357afa7113 MFP4: Turn emul_lock into a mutex.
Submitted by:	rdivacky
2007-04-02 18:38:13 +00:00
Jung-uk Kim
46bd727a1e Correct BB-profiling and adjust comments.
Pointed out by:	bde
Reviewed by:	bde
2007-03-31 01:47:37 +00:00
Jung-uk Kim
6a4abad780 Fix off-by-4 error in address validation for i386, reduce PCB reloading, and
fix more style(9) nits.

Pointed out by:	bde
Discussed with:	kib
Reviewed by:	bde
2007-03-30 23:19:08 +00:00
Jung-uk Kim
80f87d5e55 Fix more style(9) nits[1] and remove unnecessary use of '#if !defined(_KERNEL)'.
Pointed out by:	bde[1]
2007-03-30 19:33:53 +00:00
Jung-uk Kim
6403d3a160 Use the same wisdom of sys/i386/i386/support.s 1.97 to remove obfuscation.
Pointed out by:	bde
2007-03-30 18:27:57 +00:00
Jung-uk Kim
b5def2b6b5 MFP4: Fix style(9) nits and grammar in comments. 2007-03-30 17:27:13 +00:00
Jung-uk Kim
5e397f16cd MFP4: 114193, 114194
Don't "return" in linux_clone() after we forked the new process in case
of problems.  Move the copyout of p2->p_pid outside the emul_lock coverage.

Submitted by:	Roman Divacky
2007-03-30 17:16:51 +00:00
Jung-uk Kim
a328699b34 MFP4: Linux futex support for amd64.
Initial patch was submitted by kib and additional work was done
by Divacky Roman.

Tested by:	emulation
2007-03-30 01:07:28 +00:00
Jung-uk Kim
3a33908404 Regen for set_thread_area. 2007-03-30 00:08:21 +00:00
Jung-uk Kim
9c5b213e51 MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.
Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by:	emulation
2007-03-30 00:06:21 +00:00
Julian Elischer
6734f35eac Implement the openat() linux syscall
Submitted by:	Roman Divacky (rdivacky@)
MFC after:	2 weeks
2007-03-29 02:11:46 +00:00
Kris Kennaway
67eae018cb Remove unnecessary giant acquisition around panic in #ifdef DIAGNOSTIC
code.

# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.

Reviewed by:	jhb
2007-03-26 21:45:44 +00:00
Nate Lawson
0d4ac62a35 Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change.  cpufreq_post_change provides the results of the
change (success or failure).  cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed.  Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value.  When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq.  This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by:	bde, phk
MFC after:	1 month
2007-03-26 18:03:29 +00:00
Jung-uk Kim
2be4e4713a Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
John Baldwin
d66ff27773 Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first.  nyan@ has already fixed all of the PC-98
drivers.
2007-03-21 15:36:38 +00:00
John Baldwin
b8783b00f8 Add a new apic0 pseudo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system.  Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.
2007-03-20 21:53:31 +00:00
John Baldwin
95a07592ee Add a new ram0 pseudo-device that claims memory resources for physical
addresses corresponding to system RAM.  On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions.  On i386 ram0 uses the
dump_avail[] array.  Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.
2007-03-20 21:08:39 +00:00
Jung-uk Kim
2498f259d4 - Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capitalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.
2007-03-20 20:22:45 +00:00
John Baldwin
ce533e82a2 Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related pseudo-devices are added at an order of 5.  acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.
2007-03-20 20:21:44 +00:00
John Baldwin
86f07bb052 MFi386 1.173: Display two new Intel feature bits. 2007-03-20 18:48:04 +00:00
Jung-uk Kim
ab5916a526 Add another CPUID for AMD CPUs and fix style(9) while I am here. 2007-03-12 20:27:21 +00:00
Alan Cox
c640357f04 Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
Alan Cox
a2a90145e8 Completely eliminate "avail_start". It serves no useful purpose. 2007-03-10 20:26:43 +00:00
John Baldwin
07bb813626 Defer calling lapic_init() until we've completed the 'MPTable: <...>'
printf.  Otherwise, printfs inside of lapic_init() (such as during a
verbose boot) can uglify the output.
2007-03-09 15:49:57 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
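A minimal sketch of the timestamp comparison, with hypothetical names: each attribute load records <pid, tid, per-thread syscall count>, and nfs_open() may skip its GETATTR when the recorded stamp equals the current one, since that means the load happened earlier in the same syscall by the same thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stamp identifying "this syscall in this thread". */
struct attr_stamp {
	int32_t  pid;
	int32_t  tid;
	uint64_t syscall_count;
};

/* Hypothetical per-thread state; the counter is bumped on every syscall. */
struct thread_state {
	int32_t  pid;
	int32_t  tid;
	uint64_t syscall_count;
};

static struct attr_stamp
current_stamp(const struct thread_state *td)
{
	struct attr_stamp s = { td->pid, td->tid, td->syscall_count };

	return (s);
}

/* True if the cached attributes were loaded during the current syscall. */
static bool
attrs_fresh_for_this_syscall(const struct attr_stamp *loaded,
    const struct thread_state *td)
{
	struct attr_stamp now = current_stamp(td);

	return (loaded->pid == now.pid && loaded->tid == now.tid &&
	    loaded->syscall_count == now.syscall_count);
}
```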
2007-03-09 04:02:38 +00:00
Scott Long
94e6bc303f Don't increment total_bounced when doing no-op dmamap_sync ops. 2007-03-06 18:28:43 +00:00
John Baldwin
4c5bec1161 Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.
2007-03-06 17:16:47 +00:00
Alan Cox
8da3fc95a7 Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor.  This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks
2007-03-05 21:40:10 +00:00
John Baldwin
aa7a005ee0 Use vm_paddr_t rather than uintptr_t when passing the physical address of
APICs to lapic_init() and ioapic_create().
2007-03-05 20:35:17 +00:00
John Baldwin
db181dc741 Add a simple device driver to "eat" any I/O APICs that show up as PCI
devices.

MFC after:	1 week
2007-03-05 16:22:49 +00:00
Jung-uk Kim
a4e3bad794 MFP4: 115220, 115222
- Fix style(9) and reduce diff between amd64 and i386.
- Prefix Linuxulator macros with LINUX_ to prevent future collision.
2007-03-02 00:08:47 +00:00
Jung-uk Kim
6a5964d385 MFP4: 115094
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.
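A sketch of the behaviour, not the actual linux32_machdep.c code: for anonymous mappings the descriptor argument is simply ignored, so a garbage fd cannot cause EBADF. The constant mirrors the Linux x86 MAP_ANONYMOUS value and the function name is hypothetical.

```c
/* Linux x86 value of MAP_ANONYMOUS, defined locally for the sketch. */
#define LINUX_MAP_ANONYMOUS 0x20

/* Hypothetical: choose the fd to hand to the native mmap. */
static int
linux_mmap_fd(int linux_flags, int linux_fd)
{
	if (linux_flags & LINUX_MAP_ANONYMOUS)
		return (-1);    /* anonymous: ignore whatever fd was passed */
	return (linux_fd);
}
```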

Reported by:	Scot Hetzel (swhetzel at gmail dot com)
		netchild
2007-02-27 02:08:01 +00:00
Alexander Leidinger
802e08a360 Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by:	"Scot Hetzel" <swhetzel@gmail.com>
2007-02-24 16:49:25 +00:00
John Baldwin
860d8e2312 Use ih_filter instead of ih_handler in a couple of places. This fixes
most INTR_FAST handlers on i386.

Reviewed by:	piso
2007-02-23 20:03:24 +00:00
Paolo Pisati
ef544f6312 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
Konstantin Belousov
e277569ee2 MFi386 rev. 1.544 of i386/i386/pmap.c:
Rounding addr upwards to next 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.
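An illustrative sketch of the overflow and one way to guard it, assuming a 64-bit address space and 2 MB PDEs; it mirrors the shape of a roundup-then-clamp fix but is not claimed to be the committed code. Near the top of the address space the rounded value wraps to 0, so any "grow until kernel_vm_end >= addr" loop exits at once.

```c
#include <stdint.h>

#define NBPDR (UINT64_C(1) << 21)   /* bytes mapped by one PDE: 2 MB */
#ifndef roundup2
/* Same idea as FreeBSD's roundup2(); y must be a power of 2. */
#define roundup2(x, y) (((x) + ((y) - 1)) & ~((y) - 1))
#endif

/* Hypothetical: compute the growth target without wrapping to 0. */
static uint64_t
growkernel_target(uint64_t addr, uint64_t max_offset)
{
	addr = roundup2(addr, NBPDR);
	/* addr == 0 after wrapping makes addr - 1 huge, so it is clamped too. */
	if (addr - 1 >= max_offset)
		addr = max_offset;
	return (addr);
}
```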

Reported and tested by:	kris
Suggested by:	alc
MFC after:	1 week
2007-02-19 10:55:16 +00:00
Alan Cox
ae0663a383 Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.
2007-02-18 06:33:02 +00:00
John Baldwin
9be403be00 Add bootverbose printfs to indicate which IDT vectors are assigned to MSI
interrupts.
2007-02-15 22:22:57 +00:00
Jung-uk Kim
4d93342633 Fix accidental removal of an empty line from the previous commit. 2007-02-15 01:20:43 +00:00
Jung-uk Kim
da351821f7 Regen. 2007-02-15 01:15:31 +00:00
Jung-uk Kim
1e5ed8c1c2 MFP4: 113033
Port iopl(2) from i386.  This fixes LTP iopl01 and iopl02 on amd64.
2007-02-15 01:13:36 +00:00
Jung-uk Kim
10931a467a MFP4: 113025, 113146, 113177, 113203, 113500, 113546, 113570
- PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
Linux/ia64's i386 emulation layer does this and it complies with Linux
header files.  This fixes mmap05 LTP test case on amd64.
- Do not adjust stack size when failure has occurred.
- Synchronize i386 mmap/mprotect with amd64.
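A minimal sketch of the protection rule in the first item; the constants are the Linux values, defined locally, and the function name is hypothetical. Any requested access implies read and exec, while PROT_NONE is left alone.

```c
/* Linux protection bits, defined locally for the sketch. */
#define LINUX_PROT_READ  0x1
#define LINUX_PROT_WRITE 0x2
#define LINUX_PROT_EXEC  0x4

/* Hypothetical: widen a 32-bit Linux protection request. */
static int
linux32_fixup_prot(int prot)
{
	if (prot & (LINUX_PROT_READ | LINUX_PROT_WRITE | LINUX_PROT_EXEC))
		prot |= LINUX_PROT_READ | LINUX_PROT_EXEC;
	return (prot);
}
```

So a PROT_WRITE-only request becomes read/write/exec (7), and read-only or exec-only requests both become read/exec (5).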
2007-02-15 00:54:40 +00:00
Brooks Davis
983f970981 Include GEOM_LABEL in GENERIC. It's very useful and not well publicized
enough.

Approved by:	pjd
2007-02-09 19:03:18 +00:00
John Baldwin
71ddf30bd2 Don't send interrupts to CPUs disabled via lapic hints.
Reported by:	Ludger Bolmerg <lbolmerg ! web.de>
MFC after:	3 days
Pointy hat to:	jhb
2007-02-08 16:49:59 +00:00
Marcel Moolenaar
1d3aed33e8 Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE and GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
2007-02-07 18:55:31 +00:00
Bruce Evans
1300fd67f3 Fixed some style bugs. Routine except:
- don't use __GNUCLIKE___OFFSETOF, since __offsetof() is a standard
  FreeBSD implementation detail which has nothing to do with GNUC.
2007-02-06 18:04:02 +00:00
Bruce Evans
3764a82377 Simplified PCPU_GET() and PCPU_SET(). We must copy through a temporary
variable to avoid invalid constraints in dead code.  Use an array of
u_char's (inside a struct) instead of a char/short/int/long variable so
that the variable and its accesses can be spelled in the same way in all
cases and code doesn't need to be cloned just to hold the spelling
differences.

Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET().
Cast to (void *) as in rev.1.37 of the i386 version where the errors
were fixed for the i386 PCPU_GET() only.  It would be more correct to
copy to and from the temp. variable using memcpy(), but then an
ifdef tangle would be required to ensure using the builtin memcpy().
We depend on fairly aggressive optimization to put the temp. variable
only in a register despite it being copied using
*(type *)(void *)&anothertype and could depend on this when using
memcpy() too.  This seems to work right even for -O0, but the -O0 case
has not been completely tested.

This change gives identical object code for all object files in LINT
on amd64 (except for one file with a __TIME__ stamp).  For LINT on
i386 it gives unimportant differences in instruction order and padding
in a few object files.  This was only tested for -O.

This change (actually a previous version of it) gives the following
reductions in the number of object files in LINT that fail to compile
with -O2 but without the -fno-strict-aliasing kludge:
- amd64: 29 (down from 211)
- i386: 36 (down from 47)

gcc-3.4.6 actually allows the invalid constraints that result from not
using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes
on them and I don't want to depend on compiler bugs.
2007-02-06 16:21:09 +00:00
John Baldwin
c632517124 Change GDB_BUFSZ to be large enough to hold a register dump where each
register takes 16 characters (64-bit register in hex).  In practice this
is a slight bit of overkill as 7 of the 56 registers are only 32-bit, but
having the buffer too small results in remote kgdb trashing kernel memory
when it connects.
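The sizing arithmetic as a compile-time sketch; the macro and enum names are assumptions based on the numbers in this message, not the actual gdb_machdep.h contents. Each 64-bit register needs 16 hex characters in the register dump.

```c
#define GDB_NREGS   56   /* registers in the amd64 dump, per the message */
#define GDB_REG_HEX 16   /* hex characters per 64-bit register */

/* Size used: every register treated as 64-bit (56 * 16 = 896). */
enum { GDB_BUFSZ_SKETCH = GDB_NREGS * GDB_REG_HEX };

/* Exact need if the 7 32-bit registers take 8 characters (49 * 16 + 7 * 8 = 840). */
enum { GDB_DUMP_EXACT = (GDB_NREGS - 7) * GDB_REG_HEX + 7 * 8 };
```

The 56-byte difference is the "slight bit of overkill" traded for never overrunning the buffer.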

PR:		amd64/108673
Submitted by:	Ravi Murty, Nikhil Rao @ Intel
MFC after:	3 days
2007-02-05 21:48:32 +00:00
Konstantin Belousov
d0b2365eec Introduce some more SO_ option equivalents from Linux to FreeBSD.
The msg variable in linux_recvmsg() was not initialized.
Copy it from userspace.

Submitted by: rdivacky
2007-02-01 13:36:19 +00:00
Konstantin Belousov
a9ccaccfc3 Fix LOR that occurs because proctree_lock was acquired while holding
emuldata lock by moving the code upwards outside the emul_lock coverage.

Submitted by: rdivacky
2007-02-01 13:27:52 +00:00
Konstantin Belousov
84fbdf86b3 MFi386: Use LINUX_SIG_VALID macro.
Submitted by: rdivacky
2007-02-01 13:24:40 +00:00
Joseph Koshy
20c71e39c3 Use a known good stack at the time of servicing an NMI --- reuse
the space allocated for the double fault handler since this space
is otherwise unused till the time a double fault occurs.

This change should have been committed alongside r1.127 of
"exception.S", but I somehow missed doing so.

Problem reported by:	jeff
Pointy hat to:		jkoshy
2007-01-27 18:13:24 +00:00
Jeff Roberson
f0393f063a - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
   different on all three schedulers where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
Jeff Roberson
3c93ca7d2f - Allow the schedulers to IPI_PREEMPT idlethread. This puts the decision
for this behavior on the initiator side.
2007-01-23 08:38:39 +00:00
Bruce Evans
71799af2d5 Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize.  There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work.  Split it up a bit more and do the first part as
late as possible to document the necessary order.  The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split.  Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug.  The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
2007-01-23 08:01:20 +00:00
John Baldwin
5fe82bca57 Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
  MSI or MSI-X messages.  MSI requires allocating powerof2() messages for
  example where MSI-X does not.  To address this, split out the MSI-X
  support from pci_msi_count() and pci_alloc_msi() into new driver-visible
  functions pci_msix_count() and pci_alloc_msix().  As a result,
  pci_msi_count() now just returns a count of the max supported MSI
  messages for the device, and pci_alloc_msi() only tries to allocate MSI
  messages.  To get a count of the max supported MSI-X messages, use
  pci_msix_count().  To allocate MSI-X messages, use pci_alloc_msix().
  pci_release_msi() still handles both MSI and MSI-X messages, however.
  As a result of this change, drivers using the existing API will only
  use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
  values (and thus does not require all of the messages to have their
  MD vectors allocated as a group), some devices allow for "sparse" use
  of MSI-X message slots.  For example, if a device supports 8 messages
  but the OS is only able to allocate 2 messages, the device may make the
  best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
  than the default of using the first N slots (or indices), 1 and 2.  To
  support this, add a new pci_remap_msix() function that a driver may call
  after a successful pci_alloc_msix() (but before allocating any of the
  SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
  assigned to different message indices.  For example, from the earlier
  example, after pci_alloc_msix() returned a value of 2, the driver would
  call pci_remap_msix() passing in an array of integers { 1, 4 } as the
  new message indices to use.  The rid's for the SYS_RES_IRQ resources
  will always match the message indices.  Thus, after the call to
  pci_remap_msix() the driver would be able to access the first message
  in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
  SYS_RES_IRQ rid 4.  Note that the message slots/indices are 1-based
  rather than 0-based so that they will always correspond to the rid
  values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
  To support this API, a new PCIB_REMAP_MSIX() method was added to the
  pcib interface to change the message index for a single IRQ.

Tested by:	scottl
2007-01-22 21:48:44 +00:00
Alexander Leidinger
d071f5048c MFp4 (113077, 113083, 113103, 113124, 113097):
Don't expose em->shared to the outside world before it's properly
	initialized. Might not affect anything but it's at least a better
	coding style.

	Don't expose em via p->p_emuldata until it's properly initialized.
	This also enables us to get rid of some locking and simplify the
	code because we are working on a local copy.

	In linux_fork and linux_vfork create the process in stopped state
	to be sure that the new process runs with fully initialized emuldata
	structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
	race that could result in a process that is never woken up [2].

Reported by:	Scot Hetzel	[1]
Suggested by:	jhb		[2]
Reviewed by:	jhb (at least some important parts)
Submitted by:	rdivacky
Tested by:	Scot Hetzel (on amd64)

Change 2 comments (in the new code) to comply with style(9).

Suggested by:	jhb
2007-01-20 14:58:59 +00:00
Craig Rodrigues
5a09873361 Revert previous change.
Requested by:	kan
2007-01-18 05:46:32 +00:00
Craig Rodrigues
e76c6d8cd3 Forward declare __pcpu as a pointer type instead of an array type to
eliminate GCC 4.1 error: "array type has incomplete element type".
2007-01-18 02:00:04 +00:00
Alexander Leidinger
973ac082f8 MFp4 (112893):
Make linux_vfork() actually work. This enables make to work again with 2.6.
It also fixes the LTP vfork tests.

Submitted by:	rdivacky
2007-01-14 16:20:37 +00:00
Warner Losh
fed32d7544 Remove 3rd clause, renumber, ok per email 2007-01-12 07:26:21 +00:00
John Baldwin
cad688f011 Remove magic from rman_activate_resource() that uses the direct map at
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.

MFC after:	2 weeks
2007-01-11 19:40:19 +00:00
Jeff Roberson
b31e373bf7 - Use the correct test in the ipi bitmask handler for IPI_PREEMPT so that
we actually issue preemptions.
 - Remove the #ifdef IPI_PREEMPTION so it is always compiled in.  Leave
   the option which optionally enables support in sched_4bsd.  sched_ule.c
   will soon use this functionality as a run time rather than compile time
   option.
 - Compare against the idlethread rather than the priority.  There are some
   idle prio tasks that we can preempt.

Discussed with:	ups
Tested on:	i386, amd64
2007-01-11 00:17:02 +00:00
Jung-uk Kim
5efc6c44ff Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors. 2007-01-09 19:23:22 +00:00
Alexander Leidinger
1c65504ca8 MFp4 (112498):
Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent confusion.

Submitted by:	rdivacky
2007-01-07 19:00:38 +00:00
Alexander Leidinger
4f383e20a9 MFi386 rev 1.56:
Bring the linux mmap code more into line with how linux (2.4.x) behaves.

Tested by:	Scot Hetzel <swhetzel@gmail.com> on amd64 without PROT_EXEC

In addition to the i386 changes, always use PROT_EXEC in the mapping as the
previous version of the amd64 code did. We need to examine this further to
decide what the right thing to do is. For now this fixes several problems in
the LTP test runs and should behave regarding PROT_EXEC like before.
2007-01-06 15:58:34 +00:00
Alexander Leidinger
99e9dcf022 regen after addition of linux_utimes and linux_rt_sigtimedwait 2006-12-31 13:20:31 +00:00
Alexander Leidinger
c9447c7551 MFp4 (111746, 108671, 108945, 112352):
- add linux utimes syscall [1]
- add linux rt_sigtimedwait syscall [2]

Submitted by:	"Scot Hetzel" <swhetzel@gmail.com> [1]
Submitted by:	Bruce Becker <hostmaster@whois.gts.net> [2]
PR:		93199 [2]
2006-12-31 13:16:00 +00:00
Bruce Evans
f28e1c8f99 Fixed some style bugs (mainly assorted errors in comments, and inconsistent
spelling of `result').
2006-12-29 15:29:49 +00:00
Bruce Evans
6c296ffa81 Fixed some style bugs (whitespace only). 2006-12-29 14:28:23 +00:00
Bruce Evans
7e4277e591 Try harder to garbage-collect the "LOCORE" (really asm) version of
MPLOCKED.  The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors.  The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronisms in
sys/i386/i386/support.s.
2006-12-29 13:36:26 +00:00
Robert Watson
e9e1341c06 Regenerate. 2006-12-29 01:17:09 +00:00
Robert Watson
a46b391df7 Assign or clean up audit identifiers for a number of additional Linux
system calls on the amd64 architecture.

Some minor white space tweaks for consistency with other syscalls.master
files.

Obtained from:	TrustedBSD Project
2006-12-29 01:17:02 +00:00
Bruce Evans
276c702d8d Removed gratuitous cosmetic differences with the i386 version. This
mainly involves removing all __CC_SUPPORTS___INLINE__ ifdefs.  These
ifdefs are even less needed for amd64 than for i386, but the i386
atomic.h never had them.  The ifdefs here were just an optimization
of obsolescent compatibility cruft (__inline) for a null set of
compilers.  I think null sets of compilers should only be supported
in cases where this is more than an optimization, doesn't require
extensive ifdefs, and only involves not-so-obsolescent compatibility
cruft (plain inline here).
2006-12-28 08:15:14 +00:00
Bruce Evans
26ab2d1d23 Avoid an instruction in atomic_cmpset_{int,long}() in most cases.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%.  This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).

Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register.  We converted it to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax.  Let
gcc promote %al in the few cases where this is needed.

Nearby style fixes:
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
  %al anywhere, so don't hard-code it in the constraints either.

Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
  for the main output operand, although this is not required.  The input
  and output operands that use the "a" constraint are now decoupled, and
  this makes things clearer except for the reason that the output register
  is hard-coded.  It is now just a hack to tell gcc that the input "a" has
  been clobbered without increasing the number of operands.
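A minimal sketch of the constraint layout described above, assuming GCC or Clang on i386/amd64. The name `cmpset_int` is hypothetical and this is not the <machine/atomic.h> source. The result is a `u_char`-sized byte bound to the "a" constraint, so `sete` writes %al directly and the compiler widens it only where it actually needs an int; the separate input "a" operand carries the comparand that cmpxchg clobbers. A fallback using the GCC `__sync_bool_compare_and_swap` builtin keeps the example buildable on other targets.

```c
#include <assert.h>

/*
 * If *dst == expect, atomically store src and return nonzero;
 * otherwise return 0.  Sketch only, modeled on the commit's description.
 */
static inline int
cmpset_int(volatile unsigned int *dst, unsigned int expect, unsigned int src)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned char res;	/* lands in %al; no widening unless needed */

	__asm__ __volatile__(
	    "lock; cmpxchgl %3, %1\n\t"
	    "sete %0"
	    : "=a" (res),	/* 0: result byte, %al only via the constraint */
	      "+m" (*dst)	/* 1: memory operand, updated in place */
	    : "a" (expect),	/* 2: comparand; cmpxchg may overwrite %eax */
	      "r" (src)		/* 3: replacement value */
	    : "memory", "cc");
	return (res);
#else
	return (__sync_bool_compare_and_swap(dst, expect, src));
#endif
}
```

A mutex acquire would typically call this as `cmpset_int(&lock, UNOWNED, curthread_id)` (hypothetical names) and only branch on the condition code, which is why the extra `movzbl` to a full register was usually wasted.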
2006-12-27 20:26:00 +00:00
David Xu
4b0f4e9d9e Fix a panic when rebooting an SMP machine with option STOP_NMI.
In that configuration the NMI handler is used to stop the other processors
and calls trap(), but trap() now takes a pointer to the trap frame rather
than the frame passed by value; this was changed by kmacy@.
2006-12-23 03:30:50 +00:00
Jung-uk Kim
77424f4177 MFP4: 109655
- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.
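The timespec range check can be sketched as below, assuming the Linux rule that an interval is valid only when tv_sec >= 0 and 0 <= tv_nsec < 1000000000, with EINVAL returned otherwise. `struct l_timespec_sketch` and `validate_l_timespec()` are hypothetical names, not the code in linux_time.c, and the field types are simplified to `long`.

```c
#include <assert.h>
#include <errno.h>

#define NSEC_PER_SEC	1000000000L

/* Hypothetical stand-in for the Linux ABI timespec. */
struct l_timespec_sketch {
	long	tv_sec;
	long	tv_nsec;
};

/* Return 0 if ts is a valid sleep interval, EINVAL otherwise. */
static int
validate_l_timespec(const struct l_timespec_sketch *ts)
{
	if (ts->tv_sec < 0)
		return (EINVAL);
	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
		return (EINVAL);
	return (0);
}
```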
2006-12-20 20:17:35 +00:00
David Xu
4e32b7b3cc Add an lwpid field to the per-cpu structure; it holds the id of the
thread currently running on each cpu. This allows us to add in-kernel adaptive
spinning for user-level mutexes. Spinning in user space is possible, but
without the correct thread running state exported from the kernel it can
hardly be implemented efficiently without wasting cpu cycles, and exporting
that state is unlikely to happen soon, since it requires designing and
stabilizing new interfaces. This implementation is transparent to user space
and can be disabled dynamically. With this change, the performance of a mutex
ping-pong program improves massively on SMP machines, and the mysql
super-smack select benchmark improves by about 7% on an Intel dual dual-core2
Xeon machine. This indicates that on systems with many cpus and low
system-call overhead (athlon64, opteron, and core-2 are known to be fast),
adaptive spinning does help performance.

Added sysctls:
    kern.threads.umtx_dflt_spins
        if the sysctl value is non-zero, a zero umutex.m_spincount will
        cause the sysctl value to be used as the spin cycle count.
    kern.threads.umtx_max_spins
        the sysctl sets the upper limit of the spin cycle count.
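How the two sysctls combine with a mutex's own spin count can be sketched as follows. This is an assumption drawn only from the descriptions above: `effective_spins()` is a hypothetical helper, not the kern_umtx.c code, and applying the `umtx_max_spins` ceiling to per-mutex counts as well as to the default is assumed.

```c
#include <assert.h>

/*
 * Pick the number of spin iterations for a contested user mutex.
 * m_spincount: per-mutex value; dflt: kern.threads.umtx_dflt_spins;
 * max: kern.threads.umtx_max_spins.
 */
static unsigned int
effective_spins(unsigned int m_spincount, unsigned int dflt, unsigned int max)
{
	unsigned int spins = m_spincount;

	if (spins == 0 && dflt != 0)
		spins = dflt;		/* fall back to the system default */
	if (spins > max)
		spins = max;		/* never exceed the global ceiling */
	return (spins);
}
```

Setting `umtx_dflt_spins` to 0 leaves mutexes that do not request spinning on the plain sleep path, which is one way the feature can be disabled dynamically.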

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
2006-12-20 04:40:39 +00:00
Kip Macy
1726d94f4e Evidently neither GENERIC nor kan's config had isa in it :-0. As
Doug Barton says, "embrace the LINT".
2006-12-17 21:51:44 +00:00
Kip Macy
e5f8d4099d Newer versions of gcc don't support treating structures passed by value
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.
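A small illustration of the difference, using a hypothetical `struct frame_sketch` rather than the real `struct trapframe`. When the structure is passed by value, stores made by the callee go to its private copy, so the optimizer may legitimately delete them as dead; passing a pointer makes the caller's frame the object being modified.

```c
#include <assert.h>

struct frame_sketch {
	long	rax;
	long	rip;
};

/* By value: the callee owns a copy, so this store is dead to the caller. */
static void
trap_by_value(struct frame_sketch frame)
{
	frame.rax = -1;
}

/* By reference: the callee updates the caller's frame (e.g. a return value). */
static void
trap_by_pointer(struct frame_sketch *frame)
{
	frame->rax = -1;
}
```

The old code only worked because the frame pushed by the assembly entry stubs happened to be the same memory the callee treated as its by-value argument, which the C language never guaranteed.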

Reviewed by: kan
Tested by: kan
2006-12-17 06:48:40 +00:00
Pyun YongHyeon
1f90cf9895 Add msk(4) to the list of drivers supported by GENERIC kernel. 2006-12-13 03:41:47 +00:00
John Baldwin
8964299ac8 Give Host-PCI bridge drivers their own pcib_alloc_msi() and
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
2006-12-12 19:27:01 +00:00
John Baldwin
fde45e231a Sort function prototypes. 2006-12-12 19:24:45 +00:00
John Baldwin
c304531851 Add a function to return the MD interrupt source cookie associated with
an interrupt event.  Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.
2006-12-12 19:20:19 +00:00
Maxim Sobolev
efa43a53bd Allow machdep.cpu_idle_hlt to be set from the loader. This should allow
working around the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.

MFC after:	3 days
2006-12-06 18:27:17 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference to the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
Ruslan Ermilov
3cbc967ef7 Use a different bitmask for superpages' base address so that it
doesn't conflict with the PG_PDE_PAT bit.  (We still don't mask
off all the reserved bits but that's okay for now.)

Reviewed by:	alc
2006-12-05 11:31:33 +00:00
Alexander Leidinger
786e4fc47d MFP4 (110939):
MFi386: return EOPNOTSUPP for unknown module events.

Submitted by:	rdivacky
2006-12-03 21:06:07 +00:00
Alexander Leidinger
43d9d89b3f Sync with i386 (remove the LINUX stuff) now that the module is usable. 2006-12-03 21:02:09 +00:00
Bruce Evans
b73057227b Optimized RTC accesses by avoiding null writes to the index register
and by only delaying when an RTC register is written to.  The delay
after writing to the data register is now not just a workaround.

This reduces the number of ISA accesses in the usual case from 4 to
1.  The usual case is 2 rtcin()'s for each RTC interrupt.  The index
register is almost always RTC_INTR for this.  The 3 extra ISA accesses
were 1 for writing the index and 2 for delays.  Some delays are needed
in theory, but in practice they now just slow down slow accesses some
more, since almost everyone including us does them wrong and so modern systems
enforce sufficient delays in hardware.  I used to have the delays ifdefed
out, but with the index register optimization the delays are rarely
executed so the old magic ones can be kept or even implemented non-
magically without significant cost.
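The index-register optimization can be sketched as below. Port I/O is simulated with counters so the example runs in userland; `sim_outb()`, `sim_inb()`, `rtc_read()` and the cached `rtc_reg` variable are hypothetical names, and the delays are omitted. Reading the same register twice in a row costs one ISA access instead of two.

```c
#include <assert.h>

#define IO_RTC_INDEX	0x70	/* CMOS/RTC index port */
#define IO_RTC_DATA	0x71	/* CMOS/RTC data port */
#define RTC_INTR	0x0c	/* status register C: interrupt flags */

static unsigned int isa_accesses;	/* simulated bus-access counter */
static unsigned char cmos[128];		/* simulated RTC registers */
static unsigned char index_latch;	/* simulated index register */

static void
sim_outb(unsigned short port, unsigned char val)
{
	isa_accesses++;
	if (port == IO_RTC_INDEX)
		index_latch = val & 0x7f;	/* bit 7 is the NMI mask */
	else
		cmos[index_latch] = val;
}

static unsigned char
sim_inb(unsigned short port)
{
	isa_accesses++;
	(void)port;		/* only the data port is read in this sketch */
	return (cmos[index_latch]);
}

static int rtc_reg = -1;	/* last index written; -1 means unknown */

/* Read an RTC register, skipping the index write when it is unchanged. */
static unsigned char
rtc_read(unsigned char reg)
{
	if (rtc_reg != reg) {
		sim_outb(IO_RTC_INDEX, reg);
		rtc_reg = reg;
	}
	return (sim_inb(IO_RTC_DATA));
}
```

Since the interrupt handler almost always reads RTC_INTR, the cache hits on nearly every interrupt and each rtcin() collapses to a single data-port read.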

Optimizing RTC interrupt handling is more interesting than it used to
be because RTC interrupts are currently needed to fix the more efficient
apic timer interrupts on some systems.  apic_timer_hz is normally 2000
so the RTC interrupt rate needs to be 2048 to keep the apic timer
firing on such systems.  Without these changes, each RTC interrupt
normally took 10 ISA accesses (2 PIC accesses and 2 sets of 4 RTC
accesses).  Each ISA access takes 1-1.5µs, so 10 of them at 2048 Hz
takes 2-3% of a CPU.  Now 4 of them take 0.8-1.2% of a CPU.
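As a check on these CPU-time figures: each interrupt costs 10 ISA accesses before the change and 4 after (2 PIC accesses plus 1 per rtcin()), at 2048 interrupts per second and 1 to 1.5 µs per access:

```latex
\begin{aligned}
10 \times 2048\,\mathrm{s}^{-1} \times (1 \text{ to } 1.5)\,\mu\mathrm{s}
  &\approx 20.5 \text{ to } 30.7\,\mathrm{ms/s} \approx 2 \text{ to } 3\,\%,\\
4 \times 2048\,\mathrm{s}^{-1} \times (1 \text{ to } 1.5)\,\mu\mathrm{s}
  &\approx 8.2 \text{ to } 12.3\,\mathrm{ms/s} \approx 0.8 \text{ to } 1.2\,\%.
\end{aligned}
```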
2006-12-03 03:49:28 +00:00
John Birrell
e0b651251d Turn console printf buffering into a kernel option that is on
by default only for sun4v, where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.
2006-11-30 04:17:05 +00:00
Ruslan Ermilov
34028cf7d1 Differentiate between data and instruction fetch in the fatal
page fault trap handler.

Reviewed by:	alc
2006-11-28 20:04:00 +00:00
Ruslan Ermilov
ca830b9a74 Use a define instead of a "magic" value. 2006-11-23 21:37:04 +00:00
Ruslan Ermilov
7b0381568e Finish the PG_NX support at the pmap level.
Reviewed by:	alc
2006-11-23 21:36:02 +00:00
Ruslan Ermilov
f27eb21694 It's been possible to build linprocfs as a module for some time now.
Submitted by:	rdivacky
2006-11-22 10:34:12 +00:00
Alan Cox
da44960498 The global variable avail_end is redundant and only used once. Eliminate
it.  Make avail_start static to the pmap on amd64.  (It no longer exists
on other architectures.)
2006-11-19 20:54:58 +00:00
John Baldwin
81efc3d94c Add support for 8 byte hardware watches in long mode. Kernel hardware
watches support 8 byte watches.  For userland, we disallow 8 byte watches
for 32-bit tasks.
2006-11-17 20:27:01 +00:00
John Baldwin
7693afca4e - Add macro constants for the various fields in %dr7 and use them in place
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
  than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.
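The %dr7 fields that such macro constants name can be sketched as below. The bit positions follow the Intel and AMD architecture manuals (local and global enable bits at 2i and 2i+1, R/W at 16+4i, LEN at 18+4i). The `DR7_*` macro names, `dr7_len_bits()` and `dr7_set_watch()` are hypothetical, not FreeBSD's spellings. LEN encoding binary 10 selects an 8-byte watch, which is only defined in long mode; this is why 8-byte watches are refused for 32-bit tasks.

```c
#include <assert.h>
#include <stdint.h>

#define DR7_L(i)		(1UL << ((i) * 2))	/* local enable */
#define DR7_G(i)		(1UL << ((i) * 2 + 1))	/* global enable */
#define DR7_RW_SHIFT(i)		(16 + (i) * 4)
#define DR7_LEN_SHIFT(i)	(18 + (i) * 4)
#define DR7_RW_WRITE		0x1UL	/* break on data writes */
#define DR7_RW_RDWR		0x3UL	/* break on data reads or writes */

/* Encode a watch length in bytes into the 2-bit LEN field, or -1. */
static int
dr7_len_bits(int len, int long_mode)
{
	switch (len) {
	case 1: return (0x0);
	case 2: return (0x1);
	case 4: return (0x3);
	case 8: return (long_mode ? 0x2 : -1);	/* 8-byte watches need long mode */
	default: return (-1);
	}
}

/* Enable watch slot i (0-3) in *dr7; returns -1 if the length is invalid. */
static int
dr7_set_watch(uint64_t *dr7, int i, unsigned long rw, int len, int long_mode)
{
	int lenbits = dr7_len_bits(len, long_mode);

	assert(i >= 0 && i < 4);
	if (lenbits < 0)
		return (-1);
	*dr7 &= ~((uint64_t)0xf << DR7_RW_SHIFT(i));	/* clear R/W and LEN */
	*dr7 |= (uint64_t)rw << DR7_RW_SHIFT(i);
	*dr7 |= (uint64_t)lenbits << DR7_LEN_SHIFT(i);
	*dr7 |= DR7_G(i);
	return (0);
}
```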
2006-11-17 19:20:32 +00:00
John Baldwin
5527d3ed75 Trim some noise from bootverbose:
- Drop the printf in intr_machdep.c when we assign an interrupt source to
  a CPU.  Each source already has a more detailed printf.
- Don't output a line for each ioapic pin showing its initial state, this
  has outlived its usefulness.
- When an APIC enumerator sets the bus, polarity, or trigger mode of an
  ioapic pin, just return success without printing anything if the new
  value matches the current one.

MFC after:	2 weeks
2006-11-17 16:41:03 +00:00
John Baldwin
5d346a567c A few more style fixes. 2006-11-17 16:37:35 +00:00
John Baldwin
71f4007710 Various whitespace and style fixes. 2006-11-15 19:53:48 +00:00
John Baldwin
15f266289d Fix a typo that broke MSI (MSI-X worked fine) in the later revisions of
the MSI patches.
2006-11-15 18:40:00 +00:00
John Baldwin
4184900911 MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
  to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
  This function is used to allocate IDT vectors for a group of MSI
  messages.
- Add MSI and MSI-X PICs.  The PIC code here provides methods to manage
  edge-triggered MSI messages as x86 interrupt sources.  In addition to
  the PIC methods, msi.c also includes methods to allocate and release
  MSI and MSI-X messages.  For x86, we allow for up to 128 different
  MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
  16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
  drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
  ask the MSI PIC code to allocate resources and IDT vectors.
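The allocation rule for a group of MSI messages (N contiguous IDT vectors starting on an M-aligned boundary, M >= N) can be sketched with a simple in-use table. The table, the first usable vector and the name `alloc_vectors()` are assumptions for illustration; this is not apic_alloc_vectors() itself. M is taken as N rounded up to a power of two because a device sending multiple MSI messages encodes the message number in the low bits of the message data, so the block must be aligned to its power-of-two size.

```c
#include <assert.h>
#include <stdbool.h>

#define NVECTORS	256	/* IDT size */
#define FIRST_FREE	48	/* hypothetical first vector available to devices */

static bool vec_used[NVECTORS];

/*
 * Allocate count contiguous vectors aligned on a power-of-two boundary
 * >= count.  Returns the first vector, or 0 on failure.
 */
static int
alloc_vectors(int count)
{
	int align = 1;

	while (align < count)
		align <<= 1;
	/* Start at the first aligned vector at or above FIRST_FREE. */
	for (int base = (FIRST_FREE + align - 1) & ~(align - 1);
	    base + count <= NVECTORS; base += align) {
		int i;

		for (i = 0; i < count; i++)
			if (vec_used[base + i])
				break;
		if (i < count)
			continue;	/* some vector in this window is taken */
		for (i = 0; i < count; i++)
			vec_used[base + i] = true;
		return (base);
	}
	return (0);
}
```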

MFC after:	2 months
2006-11-13 22:23:34 +00:00
John Baldwin
818b0b4bdf Various fixes:
- Remove an extra entry from the array for 0x0f prefixed instruction groups.
  This fixes decoding of instructions where the second opcode >= 0x80.
- Add support for the 64-bit immediate mov instructions.
- When short_addr is enabled, parse the modr/m byte for a 32-bit
  address rather than a 16-bit one.
- Support %rip relative addressing.
- Don't print a displacement of 0 if there is a base or index register.
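The %rip-relative case comes from the ModR/M encoding: in 64-bit mode, mod = 00 with r/m = 101 means a 32-bit displacement relative to the address of the next instruction, where 32-bit code would read it as an absolute disp32. A minimal sketch of the field split follows; `struct modrm`, `decode_modrm()` and `is_rip_relative()` are hypothetical names, and REX and SIB handling are left out.

```c
#include <assert.h>
#include <stdbool.h>

struct modrm {
	unsigned mod;	/* bits 7-6: addressing mode */
	unsigned reg;	/* bits 5-3: register or opcode extension */
	unsigned rm;	/* bits 2-0: register or memory operand */
};

static struct modrm
decode_modrm(unsigned char byte)
{
	struct modrm m = {
		.mod = (byte >> 6) & 0x3,
		.reg = (byte >> 3) & 0x7,
		.rm  = byte & 0x7,
	};
	return (m);
}

/* True if the operand is disp32(%rip); only possible in 64-bit mode. */
static bool
is_rip_relative(struct modrm m, bool long_mode)
{
	return (long_mode && m.mod == 0 && m.rm == 5);
}
```

For example, the bytes `8b 05` followed by a disp32 decode as `mov disp32(%rip),%eax` in long mode.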

MFC after:	3 days
2006-11-13 21:14:54 +00:00
Ruslan Ermilov
d77f5882e7 Fix NKPT comments to match reality. Note that the current value
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (a 16M kernel would require
8 additional page tables).
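The figure of 8 page tables follows from the amd64 page-table geometry: one page-table page holds 512 eight-byte entries, each mapping a 4 KiB page, so each page table maps 2 MiB:

```latex
\frac{16\,\mathrm{MiB}}{512 \times 4\,\mathrm{KiB}}
  = \frac{16\,\mathrm{MiB}}{2\,\mathrm{MiB}}
  = 8 \text{ page tables.}
```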
2006-11-13 20:33:54 +00:00
Ruslan Ermilov
26af9ac7d0 Fix a comment. 2006-11-13 06:26:57 +00:00
Alan Cox
44b8bd66f9 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
Ruslan Ermilov
9f70620442 Regen.
Forgotten by:	trhodes
2006-11-11 21:49:08 +00:00
Ruslan Ermilov
7eae4829bf Spelling. 2006-11-07 21:57:18 +00:00
Ruslan Ermilov
81490cbe6f Line up memory amount reporting that got broken when s/real/usable/. 2006-11-07 21:55:39 +00:00
John Baldwin
6ddd7e6a5a Add a new 'union l_sigval' to use in place of 'union sigval' in the
linux siginfo structure.  l_sigval uses an l_uintptr_t for sival_ptr so
that sival_ptr is the right size for linux32 on amd64.  Since no code
currently uses 'lsi_ptr' this is just a cosmetic nit rather than a bug
fix.
2006-11-07 18:53:49 +00:00
John Baldwin
3900a3be21 Remove duplicate IDTVEC macro definition, it's already defined in
<machine/intr_machdep.h>.
2006-11-07 18:46:33 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
John Birrell
8391a99bf7 Remove the KDTRACE option again because of the complaints about having
it as a default.

For the record, the KDTRACE option caused _no_ additional source files
to be compiled in; certainly no CDDL source files. All it did was to
allow existing BSD licensed kernel files to include one or more CDDL
header files.

By removing this from DEFAULTS, the onus is on a kernel builder to add
the option to the kernel config, possibly by including GENERIC and
customising from there. It means that DTrace won't be a feature
available in FreeBSD by default, which is not the way I intended it to be.

Without this option, you can't load the dtrace module (which contains
the dtrace device and the DTrace framework). This is equivalent to
requiring an option in a kernel config before you can load the linux
emulation module, for example.

I think it is a mistake to have DTrace ported to FreeBSD, but not
to have it available to everyone, all the time. The only exception
to this is the companies which distribute systems with FreeBSD embedded.
Those companies will customise their systems anyway. The KDTRACE
option was intended for them, and only them.
2006-11-04 23:50:12 +00:00
John Birrell
1f80cd9398 Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing them, because they are called from the DTrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.
2006-11-04 04:58:10 +00:00
John Birrell
3d068827c2 Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size is 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up to the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.
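The buffering policy (accumulate until the end of the line or until the 128-byte buffer fills, then flush to every console) can be sketched as below. This is a simplified single-threaded model: `struct cons_buf`, `cons_puts()` and the `flush_fn` callback standing in for the console drivers are hypothetical, the per-CPU storage and the lock are omitted, and including the newline in the flushed chunk is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONS_BUFSIZE	128	/* next power of 2 above an 80-column line */

struct cons_buf {
	char	data[CONS_BUFSIZE];
	size_t	len;
	void	(*flush_fn)(const char *, size_t);	/* writes to all consoles */
};

static void
cons_flush(struct cons_buf *cb)
{
	if (cb->len > 0)
		cb->flush_fn(cb->data, cb->len);
	cb->len = 0;
}

/* Append a string, flushing at each newline or when the buffer fills. */
static void
cons_puts(struct cons_buf *cb, const char *s)
{
	for (; *s != '\0'; s++) {
		cb->data[cb->len++] = *s;
		if (*s == '\n' || cb->len == CONS_BUFSIZE)
			cons_flush(cb);
	}
}

/* Example sink that records flushes, standing in for console drivers. */
static char sink[512];
static size_t sink_len;
static int sink_flushes;

static void
record_flush(const char *p, size_t n)
{
	if (sink_len + n > sizeof(sink))
		return;		/* keep the demo sink bounded */
	memcpy(sink + sink_len, p, n);
	sink_len += n;
	sink_flushes++;
}
```

Because a whole line reaches the console drivers in one call, output from different CPUs can only interleave at line (or 128-byte) boundaries.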

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by:	scottl, jhb
MFC after:	2 weeks
2006-11-01 04:54:51 +00:00
Konstantin Belousov
d4d2a400e4 Fix a typo that resulted in truncated linux32 signal trampoline code
being copied to usermode. As a result, signal handlers usually segfaulted on return.

Reviewed by:	jhb
MFC after:	3 days
2006-10-31 17:53:02 +00:00
Alexander Leidinger
96ed72ac81 regen after linux_io_* backout 2006-10-29 14:12:44 +00:00
Alexander Leidinger
3680a41902 Backout the linux aio stuff. Several problems were identified and the
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.

All this will be done in P4.

Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff reenters the tree.

Requested by:	rwatson
Discussed with:	rwatson
2006-10-29 14:02:39 +00:00
Bruce Evans
6a70163fcc Removed some SMP ifdefs so that using the TSC as a cputime clock is
not completely decided at config time.  Just don't default to using
the TSC if there are multiple active CPUs.  Also, don't default to
using the TSC if it is broken.  SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.

This only helps much for SMP kernels running on 1 CPU.  The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range.  Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.

Use the same condition for declaring perfmon stuff as for using it.
2006-10-29 09:48:44 +00:00
Bruce Evans
91b4d1bfc2 In the userland .mcount():
- Don't use a frame pointer.  Our callers need a frame pointer, but we
  could only use one to support things that aren't supported.  (These
  things are:
  - profiling of profiling
  - debugging of profiling.  The core ENTRY() macro doesn't support
    forcing a frame pointer for debugging, so don't do more here.)
- Ensure that we are in the text section and have normal alignment.
- Use the normal syntax for `.type'.
2006-10-28 13:12:06 +00:00
Alexander Leidinger
c1ea90bfd3 regen (prctl addition) 2006-10-28 11:24:38 +00:00
Bruce Evans
43f0ea0a27 i386/include/profile.h:
Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in
rev.1.36.  Apparently, this case has never been reached even by lint.

Submitted by:	stefanf

{amd64,i386}/include/profile.h:
In case the above case is actually reached, break it properly by
providing null support that will fail at link time instead of a stub
that gives wrong (null) profiling at runtime.
2006-10-28 11:03:03 +00:00
Alexander Leidinger
955d762aca MFP4:
Implement prctl().

Submitted by:	rdivacky
Tested with:	LTP
2006-10-28 10:59:59 +00:00
Bruce Evans
853b92dacf In MCOUNT_OVERHEAD(label), actually use the `label' parameter. We were
still using the global label named "profil", and this worked accidentally
because all callers use the same name.
2006-10-28 07:59:11 +00:00
Bruce Evans
3a110062fd Cleaned up includes. <machine/profile.h> was unused. <machine/timerreg.h>
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.

Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.
2006-10-28 06:38:51 +00:00
Bruce Evans
94450a83e8 Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities.  It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME().  CNAME() is easy enough to use in pure
  asm code, but using it in inline asm requires messy quoting.  The
  core pure asm code has been hacked on more and all uses of CNAME() in
  it have already gone away.  Just assume the elf convention here too.
- Removed now-unneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.
2006-10-28 06:04:29 +00:00
Bruce Evans
447647908c Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.
2006-10-27 14:17:50 +00:00
John Birrell
3750d1ecad Remove the KSE option now that it's in DEFAULTS on these arches/machines.
The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.

Submitted by:	scottl@
2006-10-26 22:11:35 +00:00
John Birrell
013d6d8cb4 Add 'options KSE' to the kernel config DEFAULTS on all arches/machines
except sun4v.

This change makes the transition from a default to an option more
transparent and is an attempt to head off all the complaints that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.

Submitted by:	scottl@
2006-10-26 22:05:25 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
Ruslan Ermilov
837f167eb2 Move "device splash" back to MI NOTES and "files", it's MI. 2006-10-23 13:23:14 +00:00
Alan Cox
43200cd3ed Eliminate unnecessary PG_BUSY tests. 2006-10-22 04:18:01 +00:00
Ruslan Ermilov
7971a9bc04 MFi386: 1.13: Fix booting with ps2 keyboards. 2006-10-21 12:52:46 +00:00
Dag-Erling Smørgrav
c43ac89acc Move more MD devices and options out of MI NOTES. 2006-10-20 09:52:27 +00:00
Bruce Evans
045f738b58 Don't show debug registers in "show registers". Special registers should
be displayed specially, and debug registers are among the least
interesting special registers (far behind %cr3).  The debug registers
are still accessible as variables and displayed in another bogus place
("show watches").
2006-10-20 09:44:21 +00:00
Dag-Erling Smørgrav
c276283866 The VGA_DEBUG option only exists on {amd64,i386,ia64}.
Also remove 'device io' from amd64 NOTES; DEFAULTS takes care of it.
2006-10-20 08:56:26 +00:00
Warner Losh
e54ad0a189 Remove references to pccard.conf 2006-10-19 05:17:55 +00:00
David Xu
5f641fc0fb o Add keyword volatile for user mutex owner field.
o Fix a type consistency problem by using type long for the old
  umtx and the wait channel.
o Rename casuptr to casuword.
2006-10-17 02:24:47 +00:00
John Baldwin
b85360078a Add one more include to fix the case of !DDB and !atpic. 2006-10-16 21:40:46 +00:00
Hiroki Sato
b84baf83b9 Add a newline to the printf().
Spotted by:	Peter Carah <pete@altadena.net>
MFC after:	3 days
2006-10-15 16:52:59 +00:00
Alexander Leidinger
95f2da66d3 regen (linux AIO stuff) 2006-10-15 14:24:10 +00:00
Alexander Leidinger
6a1162d4cd MFP4 (with some minor changes):
Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled into the kernel or as a module) at
the time the functions are used. If the AIO code is not available, the calls
fail with ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
   "context"), but FreeBSD creates only one single AIO queue per process.
   My code maintains a request queue (STAILQ of queue(3)) per "context",
   and throws all AIO requests of all contexts owned by a process into
   the single FreeBSD per-process AIO queue.

   When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
   io_cancel(2), my code can pick out requests owned by the specified context
   from the single FreeBSD per-process AIO queue according to the per-context
   request queues maintained by my code.

2. The request queue maintained by my code stores the correspondence between
   Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
   (struct aiocb). The FreeBSD IO control blocks actually live in userland
   memory, as required by the native FreeBSD aio_XXXXXX(2) calls.

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
   needs to use Linux-specific "struct aio_ring", which is a partial mirror
   of context in user space. I would rather take the address of context in
   kernel as the context ID, but the io_getevents() of libaio forces me to
   take the address of the "ring" in user space as the context ID.

   To my surprise, one comment line in the file "io_getevents.c" of
   libaio-0.3.105 reads:

             Ben will hate me for this

REFERENCE:

1. Linux kernel source code:   http://www.kernel.org/pub/linux/kernel/v2.6/
   (include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages:         http://www.kernel.org/pub/linux/docs/manpages/
   (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort:   http://lse.sourceforge.net/io/aio.html
   The design notes:           http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
       http://rpmfind.net/linux/rpm2html/search.php?query=libaio
   Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle:              http://oss.oracle.com/projects/libaio-oracle/
   POSIX AIO implementation based on Linux AIO system calls (depending on
   libaio).
---snip---

Submitted by:	Li, Xiao <intron@intron.ac>
2006-10-15 14:22:14 +00:00
Alexander Leidinger
0a62e03542 MFP4 (106538 + 106541):
Implement CLONE_VFORK. This fixes the clone05 LTP test.

Submitted by:	rdivacky
2006-10-15 13:39:40 +00:00
Alexander Leidinger
2482245b0c Revert my previous commit, I mismerged this to the wrong place.
Pointy hat to:	netchild
2006-10-15 13:30:45 +00:00
Alexander Leidinger
21aed094a9 MFP4 (106541): Fix the clone05 test in the LTP.
Submitted by:	rdivacky
2006-10-15 13:25:23 +00:00
Alexander Leidinger
4b3583a354 MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64.
Submitted by:	rdivacky	[1]
2006-10-15 13:22:14 +00:00
John Baldwin
d3998dcf2e Move the 2 additional #includes down into the #ifndef DEV_ATPIC section. 2006-10-13 17:31:57 +00:00
John Birrell
e70cbcb5ba Attempt to fix the GENERIC kernel build which has been failing on
tinderbox for a couple of days.
2006-10-13 04:53:22 +00:00
John Baldwin
5d54487ef2 Fix nodevice atpic compile.
Pointy hat to:	jhb
2006-10-12 12:48:21 +00:00
John Baldwin
520ffff83e Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources.  This allows interrupt controllers
with no active interrupt sources (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
  included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
  on resume.  This gets suspend/resume working with APIC on UP systems.
  SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by:	anholt (i386)
Submitted by:	Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after:	1 week
2006-10-10 23:23:12 +00:00
John Baldwin
6e20fe33ba Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT.
PR:		kern/99870
Submitted by:	jkim
MFC after:	3 days
2006-10-10 19:26:35 +00:00
Simon L. B. Nielsen
4517aab293 - Remove SCHED_ULE from GENERIC to better avoid foot-shooting by
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
  stronger.

Suggested and reviewed by:	dougb
Discussed on:			developers
MFC after:			3 days
2006-10-05 20:31:58 +00:00
David Xu
c6511aea86 Move some declarations of 32-bit signal structures into the file
freebsd32-signal.h, and implement the sigtimedwait and sigwaitinfo system calls.
2006-10-05 01:56:11 +00:00
John Birrell
6825d60738 Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF type defines from Sun's "Linker and Libraries"
manual.
2006-10-04 21:37:10 +00:00
Poul-Henning Kamp
e5037a18a9 Use utc_offset() where applicable, and hide the internals of it
as static variables.
2006-10-02 18:23:37 +00:00
Poul-Henning Kamp
b69f71eb29 Second part of a little cleanup in the calendar/timezone/RTC handling.
Split subr_clock.c in two parts (by repo-copy):
   subr_clock.c contains generic RTC and calendar code, etc.
   subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c.  They are
not machine dependent, and we have generic code that relies on them
being present, so they are not even optional.
2006-10-02 15:42:02 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.
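For reference, the packed-BCD conversion that RTC registers require is shown below in its plain arithmetic form. The names `bcd_to_bin()` and `bin_to_bcd()` are hypothetical; libkern's bcd2bin()/bin2bcd() may be implemented differently (for example with lookup tables), so this only illustrates the encoding.

```c
#include <assert.h>

/* Convert a packed BCD byte (0x00-0x99) to binary, e.g. 0x59 -> 59. */
static unsigned int
bcd_to_bin(unsigned char bcd)
{
	return ((bcd >> 4) * 10 + (bcd & 0x0f));
}

/* Convert a binary value 0-99 to packed BCD, e.g. 59 -> 0x59. */
static unsigned char
bin_to_bcd(unsigned int bin)
{
	assert(bin < 100);
	return ((unsigned char)(((bin / 10) << 4) | (bin % 10)));
}
```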
2006-10-02 12:59:59 +00:00
Maxim Sobolev
2c473eaf67 Extend comment explaining why code is conditional at !defined(SCHED_ULE).
Suggested by:	ru
2006-09-27 22:09:35 +00:00
Maxim Sobolev
6e93c19e3d Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents
timer interrupt servicing for disabled HTT cores in the ULE case. This should
probably be fixed in the ULE code instead, but we have no real maintainer for
ULE to do it.

PR:		103697
2006-09-27 18:51:19 +00:00
Ruslan Ermilov
6c9fdda750 Added COMPAT_FREEBSD6 option. 2006-09-26 12:36:34 +00:00
David Xu
07a8ebcc75 Stop reloading %fs and %gs: doing so causes the base addresses from
the GDT to be loaded into FS.base and GS.base, and those are of course not
the values set by sysarch() with I386_SET_FSBASE and I386_SET_GSBASE. This
fixes a crash in 32-bit libthr when, after a signal handler returns, normal
code accesses the thread pointer, for example: movl %gs:8, %eax.
2006-09-23 13:42:09 +00:00
John Baldwin
d72a078647 Update the ipmi(4) driver:
- Split out the communication protocols into their own files and use
  a couple of function pointers in the softc that the communication
  protocols set up in their own attach routine.
- Add support for the SSIF interface (talking to IPMI over SMBus).
- Add an ACPI attachment.
- Add a PCI attachment that attaches to devices with the IPMI interface
  subclass.
- Split the ISA attachment out into its own file: ipmi_isa.c.
- Change the code to probe the SMBIOS table for an IPMI entry to just use
  pmap_mapbios() to map the table in rather than trying to setup a fake
  resource on an isa device and then activating the resource to map in the
  table.
- Make bus attachments leaner by adding attach functions for each
  communication interface (ipmi_kcs_attach(), ipmi_smic_attach(), etc.)
  that set up per-interface data.
- Formalize the model used by the driver to handle requests by adding an
  explicit struct ipmi_request object that holds the state of a given
  request and reply for the entire lifetime of the request.  By bundling
  the request into an object, it is easier to add retry logic to the various
  communication backends (as well as eventually support BT mode which uses
  a slightly different message format than KCS, SMIC, and SSIF).
- Add a per-softc lock and remove D_NEEDGIANT as the driver is now MPSAFE.
- Add 32-bit compatibility ioctl shims so you can use a 32-bit ipmitool
  on FreeBSD/amd64.
- Add ipmi(4) to i386 and amd64 NOTES.

Submitted by:	ambrisko (large portions of 2 and 3)
Sponsored by:	IronPort Systems, Yahoo!
MFC after:	6 days
2006-09-22 22:11:29 +00:00
Alexander Kabaev
d9cb97ff9d Use __builtin_va_start instead of __builtin_stdarg_start. GCC 4 obsoletes
the latter, and __builtin_va_start has been present in all GCC versions since
3.1.
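Callers normally reach this builtin through the standard va_start() macro rather than naming it directly. A minimal variadic function written against <stdarg.h> for illustration (`sum_ints` is hypothetical):

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments; with GCC, va_start() expands to a builtin. */
static int
sum_ints(int count, ...)
{
	va_list ap;
	int total = 0;

	va_start(ap, count);
	for (int i = 0; i < count; i++)
		total += va_arg(ap, int);
	va_end(ap);
	return (total);
}
```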
2006-09-21 01:37:02 +00:00
Wojciech A. Koszek
dec10b39fd Correct 'interrupt interrupt' -> 'interrupt' in the comment.
Requested by:	jhb
Approved by:	cognet (mentor)
2006-09-20 20:52:11 +00:00