Commit Graph

12508 Commits

Author SHA1 Message Date
Konstantin Belousov
d4e9009cc8 Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal
buffer map size, auto-tuned on the 4GB machine.  Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB).  The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.

Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.

Reported and tested by:	David Wolfskill
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:52:18 +00:00
Konstantin Belousov
ee75e7de7b Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
Attilio Rao
774d251d99 Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
  resident/cached pages collections as the per-vm_page_t splay iterators
  are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations.  In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone.  This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves.  However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan.  It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes broken into mealpieces can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by:	EMC / Isilon storage division
In collaboration with:	alc, jeff
Tested by:	flo, pho, jhb, davide
Tested by:	ian (arm)
Tested by:	andreast (powerpc)
2013-03-18 00:25:02 +00:00
Konstantin Belousov
e8a4a618cf Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination.  Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations.  For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used.  The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after:	2 weeks
2013-03-14 20:18:12 +00:00
Alan Cox
9f585991ba The kernel pmap is statically allocated, so there is really no need to
explicitly initialize its pm_root field to zero.

Sponsored by:	EMC / Isilon Storage Division
2013-03-10 21:07:44 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Bryan Venteicher
0cfbcf8c7b Remove the virtio dependency entry for the VirtIO device drivers. This
will prevent the kernel from linking if the device driver are included
without the virtio module. Remove pci and scbus for the same reason.

Also explain the relationship and necessity of the virtio and virtio_pci
modules. Currently in FreeBSD, we only support VirtIO PCI, but it could
be replaced with a different interface (like MMIO) and the device
(network, block, etc) will still function.

Requested by:	luigi
Approved by:	grehan (mentor)
MFC after:	3 days
2013-03-06 07:17:53 +00:00
Kenneth D. Merry
3a45b4781a Re-enable CTL in GENERIC on i386 and amd64, but turn on the CTL disable
tunable by default.

This will allow GENERIC configurations to boot on small memory boxes, but
not require end users who want to use CTL to recompile their kernel.  They
can simply set kern.cam.ctl.disable=0 in loader.conf.

The eventual solution to the memory usage problem is to change the way
CTL allocates memory to be more configurable, but this should fix things
for small memory situations in the mean time.

UPDATING:		Explain the change in the CTL configuration, and
			how users can enable CTL if they would like to use
			it.

sys/conf/options:	Add a new option, CTL_DISABLE, that prevents CTL
			from initializing.

ctl.c:			If CTL_DISABLE is turned on, don't initialize.

i386/conf/GENERIC,
amd64/conf/GENERIC:	Re-enable device ctl, and add the CTL_DISABLE
			option.
2013-03-04 21:18:45 +00:00
Attilio Rao
03e78eac37 Fix-up r247622 by also renaming pv_list iterator into the xen
pmap verbatim copy.

Sponsored by:	EMC / Isilon storage division
Reported by:	tinderbox
2013-03-03 01:02:57 +00:00
Attilio Rao
b38d37f7b5 Merge from vmc-playground branch:
Rename the pv_entry_t iterator from pv_list to pv_next.
Besides being more correct technically (as the name seems to suggest
this is a list while it is an iterator), it will also be needed by
vm_radix work to avoid a nameclash on macro expansions.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc, jeff
Tested by:	flo, pho, jhb, davide
2013-03-02 14:19:08 +00:00
Adrian Chadd
fe138cc2af Disable the ctl driver in GENERIC.
It unfortunately steals a fair chunk of RAM at startup even if it's not
actively used, which prevents FreeBSD VMs of 128MB from successfully
booting and running.
2013-03-02 08:12:41 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Alexander Motin
fdc5dd2d2f MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
Davide Italiano
acccf7d8b4 MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.
2013-02-28 10:46:54 +00:00
Attilio Rao
dc1558d1cd Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-02-27 18:12:13 +00:00
Warner Losh
ebf4319503 Locking for todr got pushed down into inittodr and the client
libraries it calls (although some might not be doing it right). We are
serialized right now by giant as well. This means the splsoftclock are
now an anachronism that has no benefit, even marking where locking
needs to happen. Remove them.
2013-02-21 07:16:40 +00:00
Konstantin Belousov
31a53cd036 Convert machine/elf.h, machine/frame.h, machine/sigframe.h,
machine/signal.h and machine/ucontext.h into common x86 includes,
copying from amd64 and merging with i386.

Kernel-only compat definitions are kept in the i386/include/sigframe.h
and i386/include/signal.h, to reduce amd64 kernel namespace pollution.
The amd64 compat uses its own definitions so far.

The _MACHINE_ELF_WANT_32BIT definition is to allow the
sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions
on the amd64 compile host.  The same hack could be usefully abused by
other code too.
2013-02-20 17:39:52 +00:00
Jung-uk Kim
00a54dfb1c Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is
no functional change.
2013-02-15 22:43:08 +00:00
Andriy Gapon
1a89ca4cf5 cpususpend_handler: mark AP as resumed only after fully setting up lapic
Reviewed by:	jhb
Tested by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>,
		KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after:	12 days
2013-02-02 12:04:32 +00:00
Andriy Gapon
548b201607 x86 suspend/resume: suspend pics and pseudo-pics in reverse order
- change 'pics' from STAILQ to TAILQ
- ensure that Local APIC is always first in 'pics'

Reviewed by:	jhb
Tested by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>,
		KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after:	12 days
2013-02-02 12:02:42 +00:00
Eitan Adler
4752ed3d7f Remove support for plip from the GENERIC kernel as no systems in the
last 10 years require this support.

Discussed with:	db
Discussed with:	kib
Reviewed by:	imp
Reviewed by:	jhb
Reviewed by:	-hackers
Approved by:	cperciva (mentor)
2013-02-01 20:17:11 +00:00
Andre Oppermann
8291b48244 Remove unused VM_MAX_AUTOTUNE_NMBCLUSTERS define. 2013-02-01 14:16:37 +00:00
John Baldwin
d825ce0a5d Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by:	Chagin Dmitry  dmitry | gmail
MFC after:	2 weeks
2013-01-29 18:41:30 +00:00
John Baldwin
fb709557a3 Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are).  Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after:	2 weeks
2013-01-23 21:44:48 +00:00
John Baldwin
b5821c6f0e Fix build with SMP disabled.`
Reported by:	bf
2013-01-19 01:18:22 +00:00
John Baldwin
f876ffeae3 Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7).  The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after:	2 weeks
2013-01-17 21:32:25 +00:00
Bryan Venteicher
ae366ffcbd Add VirtIO to the i386 and amd64 GENERIC kernels
This also removes the kludge from r239009 that covered only
the network driver.

Reviewed by:	grehan
Approved by:	grehan (mentor)
MFC after:	1 week
2013-01-13 07:14:16 +00:00
Konstantin Belousov
0dcbedfa61 Enable the UFS quotas for big-iron GENERIC kernels.
Discussed with:	      mckusick
MFC after:	      2 weeks
2013-01-03 19:03:41 +00:00
Dag-Erling Smørgrav
36fca20f10 As discussed on -current last October, remove the firewire drivers from
GENERIC.
2013-01-03 14:30:24 +00:00
Marius Strobl
9355088a94 Fix !INVARIANTS && !SMP build.
MFC after:	3 days
2013-01-03 01:09:50 +00:00
Jim Harris
f2fcc434ee Revert r243960 based on feedback regarding keeping x86 headers unified
(mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@).

Alternate implementation will be made in a separate commit.
2012-12-13 21:27:20 +00:00
Jim Harris
71a30c4436 Add amd64 implementations for 8-byte bus_space routines.
Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
Discussed with:	jhb, rwatson
Reviewed by:	jimharris
MFC after:	1 week
2012-12-06 22:33:31 +00:00
Konstantin Belousov
349438a243 Print the frame addresses for the backtraces on i386 and amd64. It
allows both to inspect the frame sizes and to manually peek into the
frames from ddb, if needed.

Reviewed by:	dim
MFC after:	2 weeks
2012-12-03 22:16:51 +00:00
Jung-uk Kim
7609e73ca0 Remove duplicate code. Reduce diff between amd64 and i386. 2012-12-01 00:56:19 +00:00
Jung-uk Kim
8c2b353ead Use volatile keywords properly. 2012-11-30 20:15:01 +00:00
Jung-uk Kim
231ac244f8 Tidy up inline assembly. No functional change. 2012-11-30 00:59:37 +00:00
Dimitry Andric
bdea742cf7 Fix a minor warning in sys/i386/xen/clock.c.
MFC after:	3 days
2012-11-12 20:50:11 +00:00
Alfred Perlstein
79f62ed690 Allow maxusers to scale on machines with large address space.
Some hooks are added to clamp down maxusers and nmbclusters for
small address space systems.

VM_MAX_AUTOTUNE_MAXUSERS - the max maxusers that will be autotuned based on
physical memory.
VM_MAX_AUTOTUNE_NMBCLUSTERS - max nmbclusters based on physical memory.

These are set to the old values on i386 to preserve the clamping that was
being done to all arches.

Another macro VM_AUTOTUNE_NMBCLUSTERS is provided to allow an override
for the calculation on a MD basis.  Currently no arch defines this.

Reviewed by: peter
MFC after: 2 weeks
2012-11-10 02:08:40 +00:00
Attilio Rao
cfedf924d3 Rework the known rwlock to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct rwlock_padalign.

Reviewed by:	alc, jimharris
2012-11-03 23:03:14 +00:00
Konstantin Belousov
140dedb81c The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
Konstantin Belousov
9065aa6497 Add missed sched_pin().
Submitted by:	Svatopluk Kraus <onwahe@gmail.com>
Reviewed by:	alc
MFC after:	3 days
2012-10-24 18:21:22 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Eitan Adler
a8de37b024 This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)
2012-10-22 13:06:09 +00:00
Eitan Adler
1611e2c0d1 The 'testing memory' patch gets printed too many times
Approved by: cperciva (implicit)
2012-10-22 11:57:26 +00:00
Eitan Adler
76b7512247 Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by:	des
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:41:14 +00:00
Eitan Adler
267cc84937 Explain the upcoming delay by printing a message when the kernel
is about to begin testing memory.

Reviewed by:	dteske, adri
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:16:39 +00:00
Konstantin Belousov
737ced3ecf MFamd64: add machdep.uprintf_signal.
MFC after:	1 week
2012-10-14 17:09:50 +00:00
Andriy Gapon
851dbc07af pciereg_cfg*: use assembly to access the mem-mapped cfg space
AMD BKDG for CPU families 10h and later requires that the memory
mapped config is always read into or written from al/ax/eax register.

Discussed with:	kib, alc
Reviewed by:	kib (earlier version)
MFC after:	25 days
2012-10-14 10:13:50 +00:00
Alan Cox
c1a16a1fb2 Replace all uses of the vm page queues lock by a new R/W lock.
Unfortunately, this lock cannot be defined as static under Xen because it
is (ab)used to serialize queued page table changes.

Tested by:	sbruno
2012-10-12 23:26:00 +00:00
Alan Cox
0bec9f73db MFi386 r241356
Add several asserts.

MFC after:	3 days
2012-10-10 17:15:34 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Attilio Rao
3a4730256a Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
Attilio Rao
af2bdacafb Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by:	marius
2012-10-09 12:22:43 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Konstantin Belousov
4e4458327a Add several asserts to i386 pmap, which mostly state that pv entry shall
have corresponding pte.

Reviewed by:	alc
Tested by:	pho
MFC after:	3 days
2012-10-08 18:33:08 +00:00
Alan Cox
e1de0706a0 In a few places, like the implementation of ptrace(), a thread may call
upon pmap_enter() to create a mapping within a different address space,
i.e., not the thread's own address space.  On i386, this entails the
creation of a temporary mapping to the affected page table page (PTP).  In
general, pmap_enter() will read from this PTP, allocate a PV entry, and
write to this PTP.  The trouble comes when the system is short of memory.
In order to allocate a new PV entry, an older PV entry has to be
reclaimed.  Reclaiming a PV entry involves destroying a mapping, which
requires access to the affected PTP.  Thus, the PTP mapped at the
beginning of pmap_enter() is no longer mapped at the end of pmap_enter(),
which leads to pmap_enter() modifying the wrong PTP.  To address this
problem, pmap_pv_reclaim() is changed to use an alternate method of
mapping PTPs.

Update a related comment.

Reported by:	pho
Diagnosed by:	kib
MFC after:	5 days
2012-10-08 16:57:05 +00:00
Kenneth D. Merry
25aae1bed3 Add the mps(4) driver to the i386 GENERIC config file. LSI has tested it
on i386 and verified that it works.

Submitted by:	Harald Schmalzbauer, John Baldwin, Kashyap Desai
MFC after:	3 days
2012-10-01 21:42:32 +00:00
Kevin Lo
954c5baed9 Add missing header needed by free(9).
Spotted by:	David Wolfskill <david at catwhisker dot org>
2012-09-30 15:42:20 +00:00
Kevin Lo
b5db12bfb5 Free result of device_get_children(9). 2012-09-30 09:21:10 +00:00
John Baldwin
960b5a7080 - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
  defined by standards ($PIR table, SMAP entries, etc.) available to
  userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
  and smbios(4) in <machine/pc/bios.h> and make them available to
  userland.

MFC after:	2 weeks
2012-09-28 11:59:32 +00:00
Alan Cox
e4b8a2fc5a Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
Dimitry Andric
f99157cced After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after:	3 days
2012-09-21 10:31:19 +00:00
Jim Harris
eb85d44f06 Integrate nvme(4) and nvd(4) into the amd64 and i386 builds.
Sponsored by:	Intel
2012-09-17 19:26:33 +00:00
Eitan Adler
582212fa04 s/teh/the/g
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:59:55 +00:00
Konstantin Belousov
c5e3d0ab11 Rename the IVY_RNG option to RDRAND_RNG.
Based on submission by:	Arthur Mesh <arthurmesh@gmail.com>
MFC after:	2 weeks
2012-09-13 10:12:16 +00:00
Alan Cox
7336315b0a Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures.  Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after:	10 days
2012-09-10 16:11:29 +00:00
Attilio Rao
324e57150d userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
Konstantin Belousov
ef9461ba0e Add support for new Intel on-CPU Bull Mountain random number
generator, found on IvyBridge and supposedly later CPUs, accessible
with RDRAND instruction.

From the Intel whitepapers and articles about Bull Mountain, it seems
that we do not need to perform post-processing of RDRAND results, like
AES-encryption of the data with random IV and keys, which was done for
Padlock. Intel claims that sanitization is performed in hardware.

Make both Padlock and Bull Mountain random generators support code
covered by kernel config options, for the benefit of people who prefer
minimal kernels. Also add the tunables to disable hardware generator
even if detected.

Reviewed by:	markm, secteam (simon)
Tested by:	bapt, Michael Moll <kvedulv@kvedulv.de>
MFC after:	3 weeks
2012-09-05 13:18:51 +00:00
Alan Cox
d8f9ed32c5 Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them.  Both the function names and the comment had grown
stale.  Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mapping within a
page table page.  Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.
2012-09-05 06:02:54 +00:00
Xin LI
0807ad7422 Add hpt27xx to GENERIC kernel for amd64 and i386 systems.
MFC after:	2 weeks
2012-09-04 21:02:57 +00:00
John Baldwin
778eefa40d Fix duplicate entries for mwl(4):
- Move mwlfw from {amd64,i386}/conf/NOTES to sys/conf/NOTES (mwl(4) is
  already present in sys/conf/NOTES).
- Remove duplicate mwl(4) entries from {amd64,i386}/conf/NOTES.
- While here, add a description to the sfxge line in amd64/conf/NOTES.
2012-09-04 19:19:36 +00:00
Dimitry Andric
7b4806d69f Remove the argument-less .align directive in sys/i386/bios/smapi_bios.S.
Specifying no argument is undocumented in the gas manual, and clang's
integrated assembler refuses to parse it.  Also, removing it causes no
change at all in the resulting object file.

MFC after:	1 week
2012-08-29 18:22:52 +00:00
John Baldwin
4c1044491b Fix misspelled "Infiniband".
Submitted by:	gcooper
MFC after:	3 days
2012-08-28 11:34:09 +00:00
Dag-Erling Smørgrav
ae7f84a9a4 Parly revert r239255: reinstate a default maxswzone on i386, where KVA is
scarce, but set it slightly higher so we can handle 8 GB of swap.
2012-08-27 13:22:27 +00:00
Glen Barber
67944c4572 Grammar fix: s/NIC's/NICs/
MFC after:	3 days
2012-08-26 01:21:02 +00:00
Dag-Erling Smørgrav
e2082935f0 As discussed on -current, remove the hardcoded default maxswzone.
MFC after:	3 weeks
2012-08-14 17:01:21 +00:00
John Baldwin
6e4ac34b07 Remove the deassert INIT IPI from the IPI startup sequence for APs.
It is not listed in the boot sequence in the MP specification (1.4),
and it is explicitly ignored on modern CPUs.  It was only ever required
when bootstrapping systems with external APICs (that is, SMP machines
with 486s), which FreeBSD has never supported (and never will).

While here, tidy some comments and remove some banal ones.
2012-08-13 18:52:51 +00:00
John Baldwin
a4284ef768 Add a 10 millisecond delay after sending the initial INIT IPI. This
matches the algorithm in the MP specification (1.4).  Previously we
were sending out the deassert INIT IPI immediately after the initial
INIT IPI was sent.
2012-08-13 16:33:22 +00:00
Colin Percival
347c7fd7bf Build modules along with the XENHVM kernels.
No objections from:	freebsd-xen mailing list
MFC after:	1 week
2012-08-13 07:36:57 +00:00
Alan Cox
bab30462fb Eliminate an unnecessary acquisition and release of the page queues lock
from pmap_pte().  PT_SET_MA() is not a queued mapping update, but instead
an immediate mapping update, so the page queues lock is not required here.

Reviewed by:	cperciva
2012-08-10 05:47:04 +00:00
Konstantin Belousov
0220d04fe3 Add lfence().
MFC after:	1 week
2012-08-01 17:24:53 +00:00
John Baldwin
bb07b39fa1 Regen. 2012-07-30 20:45:17 +00:00
John Baldwin
e6e0554a98 The linux_lstat() system call accepts a pointer to a 'struct l_stat', not a
'struct ostat'.
2012-07-30 20:44:45 +00:00
Konstantin Belousov
a42fa0af44 Change (unused) prototype for stmxcsr() to match reality.
Noted by:	jhb
MFC after:	1 week
2012-07-30 19:26:02 +00:00
Konstantin Belousov
e93d0cbef1 MFamd64 r238623:
Introduce curpcb magic variable.

Requested and reviewed by:	bde
MFC after:	3 weeks
2012-07-26 09:11:37 +00:00
Konstantin Belousov
1e39a4bcee MFCamd64 r238598:
Provide siginfo.si_code for floating point errors when error occurs
using the SSE math processor.

MFC after:    3 weeks
2012-07-21 21:52:48 +00:00
Konstantin Belousov
dad46c5594 MFamd64 r238668:
Stop clearing x87 exceptions in the #MF handler.

Requested by:	bde
MFC after:	1 week
2012-07-21 21:49:05 +00:00
Konstantin Belousov
4596f12de7 MFamd64 r238597:
Add stmxcsr.

MFC after:	3 weeks
2012-07-21 21:39:23 +00:00
Konstantin Belousov
1a3f27687e MFamd64 r238669:
Force clean FPU state in PCB user FPU save area for PT_I386_{GET,SET}XMMREGS.

Reported by:	bde
MFC after:	1 week
2012-07-21 21:39:02 +00:00
John Baldwin
d706ec297a Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h>
on x86 and use that to implement stop_emulating() in the fpu/npx code.
Reimplement start_emulating() in the non-XEN case by using load_cr0() and
rcr0() instead of the 'lmsw' and 'smsw' instructions.  Intel explicitly
discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in
the description of these instructions in Volume 2 of the ADM.

Reviewed by:	kib
MFC after:	1 month
2012-07-09 20:55:39 +00:00
John Baldwin
5355f65974 Partially revert r217515 so that the mem_range_softc variable is always
present on x86 kernels.  This fixes the build of kernels that include
'device acpi' but do not include 'device mem'.

MFC after:	1 month
2012-07-09 20:42:08 +00:00
Christian Brueffer
87e5ba0b54 Fix XEN build, broken in r237924.
Reported by:	gcooper
Pointy hat:	brueffer
2012-07-02 14:03:19 +00:00
Christian Brueffer
593a9dc9b8 Replace an unreachable panic() in vm86_getptr (been there for 13 years) with
a KASSERT() behind the functions's only consumer.

Suggested by:	kib
Reviewed by:	kib
CID:		4494
Found with:	Coverity Prevent(tm)
MFC after:	2 weeks
2012-07-01 12:59:00 +00:00
Alan Cox
e30df26e7b Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
Konstantin Belousov
a30facd9c7 Commit changes missed from r237435. Properly calculate the signal
trampoline addresses after the shared page is enabled.  Handle FreeBSD
ABIs without shared page support too.

Reported and tested by:	David Wolfskill <david catwhisker org>
	 (previous version)
Pointy hat to: kib
MFC after:   1 month
2012-06-22 16:05:56 +00:00
Konstantin Belousov
d69ae4126b Enable shared page on i386, now it has a use for vdso_timehands.
MFC after:	1 month
2012-06-22 07:16:29 +00:00
Konstantin Belousov
aea810386d Implement mechanism to export some kernel timekeeping data to
usermode, using shared page.  The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
Konstantin Belousov
232aa31fb9 Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after:  1 week
2012-06-22 06:38:31 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Alan Cox
6031c68de4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00
Adrian Chadd
83567110bd Oops - use the actual 11n enable option. 2012-06-15 15:32:16 +00:00
Adrian Chadd
3342d83059 Ok, ok. 802.11n can be on by default in GENERIC in -HEAD.
God help me.
2012-06-15 02:16:29 +00:00
Jung-uk Kim
acd7df97cc - Fix resumectx() prototypes to reflect reality.
- For i386, simply jump to resumectx() with PCB in %ecx.
- Fix a style(9) nit while I am here.
2012-06-13 21:03:01 +00:00
Mitsuru IWASAKI
77c80e2e5b Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().
2012-06-12 00:14:54 +00:00
Mitsuru IWASAKI
8a6c6fadc7 Some fixes for r236772.
- Remove cpuset stopped_cpus which is no longer used.
- Add a short comment for cpuset suspended_cpus clearing.
- Fix the un-ordered x86/acpica/acpi_wakeup.c in conf/files.amd64 and i386.

Pointed-out by:	attilio@
2012-06-10 02:38:51 +00:00
Mitsuru IWASAKI
fb864578af Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of
suspend/resume procedures are minimized among them.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
  among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
  amd64 code.

Reviewed by:	attilio@, acpi@
2012-06-09 00:37:26 +00:00
Alan Cox
23c0d041ba Various small changes to PV entry management:
Constify pc_freemask[].

pmap_pv_reclaim()
  Eliminate "freemask" because it was a pessimization.  Add a comment about
  the resident count adjustment.

free_pv_entry() [i386 only]
  Merge an optimization from amd64 (r233954).

get_pv_entry()
  Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
  (The right strategy needs more thought.  Moreover, there were unintended
  differences between the amd64 and i386 implementation.)

pmap_remove_pages()
  Eliminate unnecessary ()'s.
2012-06-04 03:51:08 +00:00
Andriy Gapon
7adc598a15 free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG
Those calls are useful with hardware watchdog drivers too.

MFC after:	3 weeks
2012-06-03 08:01:12 +00:00
Alan Cox
0d6f49d84a Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.
2012-06-02 22:14:10 +00:00
Konstantin Belousov
fa9f322df9 Use plain store for atomic_store_rel on x86, instead of implicitly
locked xchg instruction.  IA32 memory model guarantees that store has
release semantic, since stores cannot pass loads or stores.

Reviewed by:	  bde, jhb
Tested by:	  pho
MFC after:	  2 weeks
2012-06-02 18:10:16 +00:00
Jung-uk Kim
9ad569771a Consistently use ACPI_SUCCESS() and ACPI_FAILURE() macros wherever possible. 2012-06-01 21:33:33 +00:00
Jung-uk Kim
db08ae007d Tidy up code clutter in SMP case a bit. No functional change. 2012-06-01 19:19:04 +00:00
Jung-uk Kim
108705d043 Call AcpiSetFirmwareWakingVector() with interrupt disabled for consistency. 2012-06-01 18:18:48 +00:00
Jung-uk Kim
d3638dc4de Improve style(9) in the previous commit. 2012-06-01 17:07:52 +00:00
Mitsuru IWASAKI
f0a101b7e2 Call AcpiLeaveSleepStatePrep() in interrupt disabled context
(described in ACPICA source code).

- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
  and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
  it twice in interrupt disabled/enabled context (ia64 version is
  just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
  order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
  (i386 version) for preparation of x86/acpica/acpi_wakeup.c
  (MFC candidate).

Reviewed by:	jkim@
MFC after:	2 days
2012-06-01 15:26:32 +00:00
Alan Cox
d85fbe8a91 Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().
2012-06-01 04:26:50 +00:00
Alan Cox
a2efa4249e Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.
2012-05-30 04:16:54 +00:00
Alan Cox
0490d34982 MFi386 pmap r233433
Disable detailed PV entry accounting by default.  (A config option for
  enabling it was already introduced in r233433.)
2012-05-29 16:11:15 +00:00
Alan Cox
6516bffdef Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

Tested by:	sbruno
MFC after:	5 weeks
2012-05-29 15:41:20 +00:00
Kevin Lo
544c5e5b53 Make sure that each va_start has one and only one matching va_end,
especially in error cases.
2012-05-29 01:48:06 +00:00
Alan Cox
4edfd622b5 Update a comment in get_pv_entry() to reflect the changes to the
synchronization of pv_vafree in r236158.
2012-05-28 17:35:23 +00:00
Alan Cox
8b0f4e0a0d Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c.  This new r/w lock is used primarily to synchronize access
to the PV lists.  However, it will be used in a somewhat unconventional
way.  As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

X-MFC after:	r236045
2012-05-27 16:24:00 +00:00
Alan Cox
33853281b4 Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

MFC after:	5 weeks
2012-05-26 06:10:25 +00:00
Bjoern A. Zeeb
920b965865 MFp4 bz_ipv6_fast:
in_cksum.h required ip.h to be included for struct ip.  To be
  able to use some general checksum functions like in_addword()
  in a non-IPv4 context, limit the (also exported to user space)
  IPv4 specific functions to the times, when the ip.h header is
  present and IPVERSION is defined (to 4).

  We should consider more general checksum (updating) functions
  to also allow easier incremental checksum updates in the L3/4
  stack and firewalls, as well as ponder further requirements by
  certain NIC drivers needing slightly different pseudo values
  in offloading cases.  Thinking in terms of a better "library".

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 22:00:48 +00:00
Alan Cox
4e65634580 MF amd64 r233097, r233122
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.

Style fix to pmap_protect().
2012-05-24 15:25:35 +00:00
Konstantin Belousov
ccc00630ae Enable drm2 modules build.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2012-05-23 21:07:01 +00:00
Mitsuru IWASAKI
fe756f2a59 Remove cpususpend IDT vector for XEN.
This broke XEN kernel building.
2012-05-20 08:17:20 +00:00
Mitsuru IWASAKI
29d8e665ba Revert the previous commit on wakecode address verbose printing.
This broke PAE kernel building.
2012-05-19 02:31:38 +00:00
Mitsuru IWASAKI
e3fd0bc1b2 Add SMP/i386 suspend/resume support.
Most part is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after:	3 days
2012-05-18 18:55:58 +00:00
John Baldwin
424e69759c Centralize declaration of the debug.acpi sysctl node. 2012-05-17 17:58:53 +00:00
Andriy Gapon
99a312d048 i386 bootinfo: re-arrange EFI fields for natural alignment and packing
Suggested by:	bde
MFC after:	2 weeks
2012-05-13 09:25:39 +00:00
Alexander Motin
c078c18853 Add options GEOM_RAID into i386 and amd64 GENERIC kernels.
ataraid(4) previously was present there and having GEOM RAID is convinient.
Unlike other classes GEOM RAID can be set up from BIOS before install and
users are expecting it to be detected automatically.
2012-05-10 12:37:32 +00:00
Brooks Davis
b3a397a8de The DDB_CTF has little or nothing to do with the debugger so move it
next KDTRACE_HOOKS.
2012-05-09 01:37:48 +00:00
Alexander Leidinger
19e252baeb - >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
  they serve mostly as examples of what you can do with the static probe;s
  with moderate load the scripts may be overwhelmed, excessive lock-tracing
  may influence program behavior (see the last design decission)

Design decissions:
 - use "linuxulator" as the provider for the native bitsize; add the
   bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
 - Add probes only for locks which are acquired in one function and released
   in another function. Locks which are aquired and released in the same
   function should be easy to pair in the code, inter-function
   locking is more easy to verify in DTrace.
 - Probes for locks should be fired after locking and before releasing to
   prevent races (to provide data/function stability in DTrace, see the
   man-page of "dtrace -v ..." and the corresponding DTrace docs).
2012-05-05 19:42:38 +00:00
Attilio Rao
b8be27bf29 Revert part of r234723 by re-enabling the SMP protection for
intr_bind() on x86.
This has been requested by jhb and I strongly disagree with this,
but as long as he is the x86 and interrupt subsystem maintainer I will
follow his directives.

The disagreement cames from what we should really consider as a
public KPI. IMHO, if we really need a selection between the kernel
functions, we may need an explicit protection like _KERNEL_KPI, which
defines which subset of the kernel function might really be considered
as part of the KPI (for thirdy part modules) and which not.
As long as we don't have this mechanism I just consider any possible
function as usable by thirdy part code, thus intr_bind() included.

MFC after:	1 week
2012-05-03 21:44:01 +00:00
Dimitry Andric
460378bf13 Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after:	2 weeks
2012-04-29 11:04:31 +00:00
Attilio Rao
70dbd1604c Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by:	gibbs
Reviewed by:	jhb, marius
MFC after:	1 week
2012-04-26 20:24:25 +00:00
Brooks Davis
e9acaa9ae4 Enable DTrace hooks in GENERIC.
Reviewed by:	gnn
Approved by:	core (jhb, imp)
Requested by:	a cast of thousands
MFC after:	3 days
2012-04-20 21:37:42 +00:00
Jung-uk Kim
17b27db088 Regen for r234359. 2012-04-16 23:17:29 +00:00
Jung-uk Kim
f69f4d8630 Correct an argument type of iopl syscall for Linuxulator. This also fixes
a warning from Clang, i. e., "args->level < 0 is always false".
2012-04-16 23:16:18 +00:00
Jung-uk Kim
13fa650c75 Regen for r234357. 2012-04-16 22:59:51 +00:00
Jung-uk Kim
db8eb180d9 Correct arguments of stat64, fstat64 and lstat64 syscalls for Linuxulator. 2012-04-16 22:58:28 +00:00
Jung-uk Kim
28cc85fd09 Regen for r234352. 2012-04-16 21:24:23 +00:00
Jung-uk Kim
d69a426fce - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
Jung-uk Kim
67490d785a - When interrupt is not requested for VM86 call, make a fake exit point and
push the address onto stack as we do for INTn emulation.  This avoids stack
underflow when we encounter RETF instruction in VM86 mode.  Lack of this
exit point actually caused page fault in VM86 mode with VESA module when we
resume from suspend state[1].
- Remove unnecessary CLI and STI instructions from BIOS interrupt emulation.
INTn and IRET must be able to emulate the flag correctly.

Reported by:	gavin [1]
Tested by:	gavin (early revision)
MFC after:	3 days
2012-04-16 19:31:44 +00:00
Andriy Gapon
56c2dc796b add actual interrupt counters to back ipi_invlcache_counts
Otherwise one could run into a panic with COUNT_IPIS when cache
invalidation actually happened.

Reviewed by:	jhb
MFC after:	1 week
2012-04-13 07:18:19 +00:00
Andriy Gapon
f84633cdcc bump INTRCNT_COUNT values to reflect actual numbers of IPI counters
Maybe the numbers should be conditionalized on COUNT_IPIS

Reviewed by:	jhb
MFC after:	1 week
2012-04-13 07:15:40 +00:00
John Baldwin
ed5a2b61fd Add OFED and the associated options and drivers to x86 LINT builds:
- Mark 'sdp' as requiring 'inet'.
- Always include "opt_inet.h" and "opt_inet6.h" and modify the IB
  driver Makefiles to honor WITH/WITHOUT_INET/INET6/_SUPPORT options
  to determine what should be enabled during a module build.
- Fix the mlxen(4) driver and the core IB code to compile without
  if INET is disabled (including when both INET and INET6 are disabled).

Reviewed by:	bz
MFC after:	2 weeks
2012-04-12 14:01:06 +00:00
Marius Strobl
0fec3e2d81 Fix !SMP build after r234074.
Reviewed by:	attilio, jhb
2012-04-10 16:08:46 +00:00
Attilio Rao
79257559ee BSP is not added to the mask of valid target CPUs for interrupts
in set_apic_interrupt_ids(). Besides, set_apic_interrupts_ids() is not
called in the !SMP case too.
Fix this by:
- Adding the BSP as an interrupt target directly in cpu_startup().
- Remove an obsolete optimization where the BSP are skipped in
  set_apic_interrupt_ids().

Reported by:	jh
Reviewed by:	jhb
MFC after:	3 days
X-MFC:		r233961
Pointy hat to:	me
2012-04-09 22:41:19 +00:00