freebsd-dev

Author	SHA1	Message	Date
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
John Baldwin	c305730dc0	Remove bogus usage of INTR_FAST. "Fast" interrupts are now indicated by registering a filter handler rather than a threaded handler. Also remove a bogus use of INTR_MPSAFE for a filter.	2011-01-06 21:08:06 +00:00
Colin Percival	b5e61aab00	Spell CRITICAL_ASSERT correctly. Submitted by: jhb MFC with: r216944	2011-01-04 16:29:07 +00:00
Colin Percival	aa829cef6a	Add hamfisted locking to the Xen/PV pmap code: Only allow one thread to be in {pmap_pinit, pmap_copy, pmap_release} at a time. This reduces the rate of panics when running 'make index' from ~0.6/hour to ~0.02/hour (p < 10^-30). At a later date this locking will be removed, and for this reason, it is wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in place with the intention of shipping somewhat-stable Xen bits in FreeBSD 8.2-RELEASE. PR: kern/153672 MFC after: 3 days	2011-01-04 15:55:15 +00:00
Colin Percival	d01b2dad73	Adjust the critical section protecting _xen_flush_queue to cover the entire range where the page mapping request queue needs to be atomically examined and modified. Oddly, while this doesn't seem to affect the overall rate of panics (running 'make index' on EC2 t1.micro instances, there are 0.6 +/- 0.1 panics per hour, both before and after this change), it eliminates vm_fault from panic backtraces, leaving only backtraces going through vmspace_fork.	2011-01-04 00:16:38 +00:00
Colin Percival	d262f2dcfc	Make i386_set_ldt work on i386/XEN, step 1/5. Lock the vm page queue mutex around calls to pte_store. As with many other uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs to be replaced at some future date by a spin lock dedicated to protecting the queue of pending xen page mapping hypervisor calls. (But for now, bogus locking is better than a panic.) MFC after: 3 days	2010-12-31 17:39:31 +00:00
Colin Percival	4a416f8375	Remove a "not strictly correct" (and panic-inducing) workaround for a bug which doesn't seem to exist. PR: kern/141328 MFC after: 3 days	2010-12-28 14:36:32 +00:00
Colin Percival	8ea0b3bb2f	Lock the vm page queue mutex in pmap_pte_release around the call to PMAP_SET_VA; this fixes a mutex-not-held panic when a process which called mlock(2) exits, and parallels a change made in pmap_pte 10 months ago (svn r204160). Note: The locking in this code is utterly broken. We should not be using the VM page queue mutex to protect the queue of pending Xen page mapping hypervisor calls. Even if it made sense to do so, this commit and r204160 introduce LORs between the vm page queue mutex and PMAP2mutex. (However, a possible deadlock is better than a guaranteed panic, and this change will hopefully make life easier for whoever fixes the Xen pmap locking in the future.) PR: kern/140313 MFC after: 3 days	2010-12-26 13:05:43 +00:00
Colin Percival	20d1a304b3	Reduce the Xen timecounter from 1GHz to 2^-9 GHz, thereby increasing the timecounter period from 2^32 ns (~4.3s) to 2^41 ns (~36m39s). Some time sharing systems can skip clock interrupts for a few seconds when under load (e.g., if we've recently used more than our fair share of CPU and someone else wants a burst of CPU) and we were losing time in quanta of 2^32 ns due to timecounter wrapping. Increasing the timecounter period up to 2^41 ns is definitely overkill, but we still have microsecond timecounter precision, and anyone using paravirtualized hardware when they need submicrosecond timing is crazy.	2010-12-11 22:33:33 +00:00
Colin Percival	0f30ed5bc6	Make the machdep.independent_wallclock sysctl do what it says on the box.	2010-12-11 20:12:42 +00:00
Colin Percival	5c0ab2fa8b	Revert r215819 and fix the bug properly. In pmap_qremove, paging table updates were being queued by pmap_kremove, but the queue wasn't being flushed; as a result, the updates didn't happen until after the call to pmap_invalidate_range, and old entries could stick around in the TLB. Adding a PT_UPDATES_FLUSH() call immediately before pmap_invalidate_range ensures that after the invalidation the TLB will be repopulated with the correct new entries. Thanks to: kib, avg, alc	2010-11-25 22:06:07 +00:00
Colin Percival	98702b3990	Work around paging bug. Somehow we seem to be ending up with entries in the TLB which don't correspond to ptes with PG_V set; prior to this commit I'm sometimes getting the wrong data when pages are loaded into the buffer cache (they're being loaded, but the missing TLB invalidation is causing the wrong data to be visible).	2010-11-25 15:41:34 +00:00
Colin Percival	1a3b2b87de	Rename HYPERVISOR_multicall (which performs the multicall hypercall) to _HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which invokes _HYPERVISOR_multicall and checks that the individual hypercalls all succeeded.	2010-11-25 15:05:21 +00:00
Colin Percival	0bd7a92067	Remove vestigal debugging code which, in fork-heavy workloads, can cause a 30x slowdown.	2010-11-25 04:45:31 +00:00
Colin Percival	c3f128981e	In xen_get_timecount, return the full ns-precision time rather than rounding to 1/HZ precision. I have no idea why the rounding was introduced in the first place, but it makes FreeBSD unhappy.	2010-11-22 09:04:29 +00:00
Colin Percival	61381fcf2d	Unifdef XEN. This file is only compiled with the XEN kernel option set, and the !XEN bits get in the way of understanding the code.	2010-11-20 21:36:12 +00:00
Colin Percival	8cdabbaf32	Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses. Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...). In a few places we take advantage of the fact that xpmap_ptom can commute with setting PG_* flags. This commit should have no net effect save to improve the readability of this code.	2010-11-20 20:04:29 +00:00
Colin Percival	ad520892d7	Make pmap_release consistent with pmap_pinit with respect to unpinning pages. The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm not quite sure why...) and this commit adds a corresponding #if 0 in pmap_release to avoid unpinning those pages. Some versions of Xen seem to silently ignore requests to unpin pages which were never pinned in the first place, but some return an error (causing FreeBSD to panic) prior to this commit.	2010-11-19 15:12:19 +00:00
Colin Percival	f4c884f95a	Make pmap_release match pmap_pinit by invoking pmap_qremove(pmap->pm_pdpt) to match pmap_pinit's pmap_qenter(pmap->pm_pdpt) call in the case of PAE.	2010-11-18 21:29:43 +00:00
Colin Percival	f86f965ef8	Don't KASSERT in pmap_release that xpmap_ptom(VM_PAGE_TO_PHYS(m)) == (pmap->pm_pdpt[i] & PG_FRAME) for i = NPGPTD, since pmap->pm_pdpt[i] is only initialized for 0 <= i < NPGPTD. This fixes an inevitable panic with XEN && PAE && INVARIANTS when pmap_release is called (e.g., when /sbin/init is launched).	2010-11-18 21:02:40 +00:00
Attilio Rao	fcb250f392	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
John Baldwin	c5b0b5fc6b	Sync the APIC startup sequence with amd64: - Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1. - Probe CPUs at SI_SUB_TUNABLES - 1. This allows i386 to set a truly accurate mp_maxid value rather than always setting it to MAXCPU - 1.	2010-11-08 20:35:09 +00:00
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
Justin T. Gibbs	ff662b5c98	Improve the Xen para-virtualized device infrastructure of FreeBSD: o Add support for backend devices (e.g. blkback) o Implement extensions to the Xen para-virtualized block API to allow for larger and more outstanding I/Os. o Import a completely rewritten block back driver with support for fronting I/O to both raw devices and files. o General cleanup and documentation of the XenBus and XenStore support code. o Robustness and performance updates for the block front driver. o Fixes to the netfront driver. Sponsored by: Spectra Logic Corporation sys/xen/xenbus/init.txt: Deleted: This file explains the Linux method for XenBus device enumeration and thus does not apply to FreeBSD's NewBus approach. sys/xen/xenbus/xenbus_probe_backend.c: Deleted: Linux version of backend XenBus service routines. It was never ported to FreeBSD. See xenbusb.c, xenbusb_if.m, xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus support. sys/xen/xenbus/xenbusvar.h: sys/xen/xenbus/xenbus_xs.c: sys/xen/xenbus/xenbus_comms.c: sys/xen/xenbus/xenbus_comms.h: sys/xen/xenstore/xenstorevar.h: sys/xen/xenstore/xenstore.c: Split XenStore into its own tree. XenBus is a software layer built on top of XenStore. The old arrangement and the naming of some structures and functions blurred these lines making it difficult to discern what services are provided by which layer and at what times these services are available (e.g. during system startup and shutdown). sys/xen/xenbus/xenbus_client.c: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_probe.c: sys/xen/xenbus/xenbusb.c: sys/xen/xenbus/xenbusb.h: Split up XenBus code into methods available for use by client drivers (xenbus.c) and code used by the XenBus "bus code" to enumerate, attach, detach, and service bus drivers. sys/xen/reboot.c: sys/dev/xen/control/control.c: Add a XenBus front driver for handling shutdown, reboot, suspend, and resume events published in the XenStore. Move all PV suspend/reboot support from reboot.c into this driver. sys/xen/blkif.h: New file from Xen vendor with macros and structures used by a block back driver to service requests from a VM running a different ABI (e.g. amd64 back with i386 front). sys/conf/files: Adjust kernel build spec for new XenBus/XenStore layout and added Xen functionality. sys/dev/xen/balloon/balloon.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/blkfront/blkfront.c: sys/xen/xenbus/... sys/xen/xenstore/... o Rename XenStore APIs and structures from xenbus_* to xs_. o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation of objects returned by these APIs. o Adjust for changes in the bus interface for Xen drivers. sys/xen/xenbus/... sys/xen/xenstore/... Add Doxygen comments for these interfaces and the code that implements them. sys/dev/xen/blkback/blkback.c: o Rewrite the Block Back driver to attach properly via newbus, operate correctly in both PV and HVM mode regardless of domain (e.g. can be in a DOM other than 0), and to deal with the latest metadata available in XenStore for block devices. o Allow users to specify a file as a backend to blkback, in addition to character devices. Use the namei lookup of the backend path to automatically configure, based on file type, the appropriate backend method. The current implementation is limited to a single outstanding I/O at a time to file backed storage. sys/dev/xen/blkback/blkback.c: sys/xen/interface/io/blkif.h: sys/xen/blkif.h: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: Extend the Xen blkif API: Negotiable request size and number of requests. This change extends the information recorded in the XenStore allowing block front/back devices to negotiate for optimal I/O parameters. This has been achieved without sacrificing backward compatibility with drivers that are unaware of these protocol enhancements. The extensions center around the connection protocol which now includes these additions: o The back-end device publishes its maximum supported values for, request I/O size, the number of page segments that can be associated with a request, the maximum number of requests that can be concurrently active, and the maximum number of pages that can be in the shared request ring. These values are published before the back-end enters the XenbusStateInitWait state. o The front-end waits for the back-end to enter either the InitWait or Initialize state. At this point, the front end limits it's own capabilities to the lesser of the values it finds published by the backend, it's own maximums, or, should any back-end data be missing in the store, the values supported by the original protocol. It then initializes it's internal data structures including allocation of the shared ring, publishes its maximum capabilities to the XenStore and transitions to the Initialized state. o The back-end waits for the front-end to enter the Initalized state. At this point, the back end limits it's own capabilities to the lesser of the values it finds published by the frontend, it's own maximums, or, should any front-end data be missing in the store, the values supported by the original protocol. It then initializes it's internal data structures, attaches to the shared ring and transitions to the Connected state. o The front-end waits for the back-end to enter the Connnected state, transitions itself to the connected state, and can commence I/O. Although an updated front-end driver must be aware of the back-end's InitWait state, the back-end has been coded such that it can tolerate a front-end that skips this step and transitions directly to the Initialized state without waiting for the back-end. sys/xen/interface/io/blkif.h: o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255. This is the maximum number possible without changing the blkif request header structure (nr_segs is a uint8_t). o Add two new constants: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK. These respectively indicate the number of segments that can fit in the first ring-buffer entry of a request, and for each subsequent (sg element only) ring-buffer entry associated with the "header" ring-buffer entry of the request. o Add the blkif_request_segment_t typedef for segment elements. o Add the BLKRING_GET_SG_REQUEST() macro which wraps the RING_GET_REQUEST() macro and returns a properly cast pointer to an array of blkif_request_segment_ts. o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the number of ring entries that will be consumed by a blkif request with the given number of segments. sys/xen/blkif.h: o Update for changes in interface/io/blkif.h macros. o Update the BLKIF_MAX_RING_REQUESTS() macro to take the ring size as an argument to allow this calculation on multi-page rings. o Add a companion macro to BLKIF_MAX_RING_REQUESTS(), BLKIF_RING_PAGES(). This macro determines the number of ring pages required in order to support a ring with the supplied number of request blocks. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: o Negotiate with the other-end with the following limits: Reqeust Size: MAXPHYS Max Segments: (MAXPHYS/PAGE_SIZE) + 1 Max Requests: 256 Max Ring Pages: Sufficient to support Max Requests with Max Segments. o Dynamically allocate request pools and segemnts-per-request. o Update ring allocation/attachment code to support a multi-page shared ring. o Update routines that access the shared ring to handle multi-block requests. sys/dev/xen/blkfront/blkfront.c: o Track blkfront allocations in a blkfront driver specific malloc pool. o Strip out XenStore transaction retry logic in the connection code. Transactions only need to be used when the update to multiple XenStore nodes must be atomic. That is not the case here. o Fully disable blkif_resume() until it can be fixed properly (it didn't work before this change). o Destroy bus-dma objects during device instance tear-down. o Properly handle backend devices with powef-of-2 sector sizes larger than 512b. sys/dev/xen/blkback/blkback.c: Advertise support for and implement the BLKIF_OP_WRITE_BARRIER and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and the BIO_ORDERED attribute of bios. sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: Fix various bugs in blkfront. o gnttab_alloc_grant_references() returns 0 for success and non-zero for failure. The check for < 0 is a leftover Linuxism. o When we negotiate with blkback and have to reduce some of our capabilities, print out the original and reduced capability before changing the local capability. So the user now gets the correct information. o Fix blkif_restart_queue_callback() formatting. Make sure we hold the mutex in that function before calling xb_startio(). o Fix a couple of KASSERT()s. o Fix a check in the xb_remove_ macro to be a little more specific. sys/xen/gnttab.h: sys/xen/gnttab.c: Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID. sys/dev/xen/netfront/netfront.c: Use GRANT_REF_INVALID instead of driver private definitions of the same constant. sys/xen/gnttab.h: sys/xen/gnttab.c: Add the gnttab_end_foreign_access_references() API. This API allows a client to batch the release of an array of grant references, instead of coding a private for loop. The implementation takes advantage of this batching to reduce lock overhead to one acquisition and release per-batch instead of per-freed grant reference. While here, reduce the duration the gnttab_list_lock is held during gnttab_free_grant_references() operations. The search to find the tail of the incoming free list does not rely on global state and so can be performed without holding the lock. sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/evtchn/evtchn.c: sys/xen/xen_intr.h: o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode. This allows an HVM domain to serve back end devices to other domains. This API is already implemented for PV mode. o Synchronize the API between HVM and PV. sys/dev/xen/xenpci/xenpci.c: o Scan the full region of CPUID space in which the Xen VMM interface may be implemented. On systems using SuSE as a Dom0 where the Viridian API is also exported, the VMM interface is above the region we used to search. o Pass through bus_alloc_resource() calls so that XenBus drivers attaching on an HVM system can allocate unused physical address space from the nexus. The block back driver makes use of this facility. sys/i386/xen/xen_machdep.c: Use the correct type for accessing the statically mapped xenstore metadata. sys/xen/interface/hvm/params.h: sys/xen/xenstore/xenstore.c: Move hvm_get_parameter() to the correct global header file instead of as a private method to the XenStore. sys/xen/interface/io/protocols.h: Sync with vendor. sys/xeninterface/io/ring.h: Add macro for calculating the number of ring pages needed for an N deep ring. To avoid duplication within the macros, create and use the new __RING_HEADER_SIZE() macro. This macro calculates the size of the ring book keeping struct (producer/consumer indexes, etc.) that resides at the head of the ring. Add the __RING_PAGES() macro which calculates the number of shared ring pages required to support a ring with the given number of requests. These APIs are used to support the multi-page ring version of the Xen block API. sys/xeninterface/io/xenbus.h: Add Comments. sys/xen/xenbus/... o Refactor the FreeBSD XenBus support code to allow for both front and backend device attachments. o Make use of new config_intr_hook capabilities to allow front and back devices to be probed/attached in parallel. o Fix bugs in probe/attach state machine that could cause the system to hang when confronted with a failure either in the local domain or in a remote domain to which one of our driver instances is attaching. o Publish all required state to the XenStore on device detach and failure. The majority of the missing functionality was for serving as a back end since the typical "hot-plug" scripts in Dom0 don't handle the case of cleaning up for a "service domain" that is not itself. o Add dynamic sysctl nodes exposing the generic ivars of XenBus devices. o Add doxygen style comments to the majority of the code. o Cleanup types, formatting, etc. sys/xen/xenbus/xenbusb.c: Common code used by both front and back XenBus busses. sys/xen/xenbus/xenbusb_if.m: Method definitions for a XenBus bus. sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusb_back.c: XenBus bus specialization for front and back devices. MFC after: 1 month	2010-10-19 20:53:30 +00:00
John Baldwin	60c7b36b7a	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.	2010-08-11 23:22:53 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
Alan Cox	9124d0d6a3	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)	2010-06-11 15:49:39 +00:00
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
Alan Cox	b2830a9649	Eliminate a stale comment.	2010-05-31 06:06:10 +00:00
Alan Cox	72dc3eb65b	Simplify the inner loop of pmap_collect(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.	2010-05-30 18:48:41 +00:00
Alan Cox	a1192299b3	Merge various changes from i386/i386/pmap.c: The remaining, unmerged portions of r175404 Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.) Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check. Eliminate \n from a nearby panic string. Use KASSERT() to reimplement pmap_copy()'s two assertions. Merge portions of r177659 To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set. r208609 Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all(). r208645 When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().	2010-05-30 04:44:32 +00:00
Alan Cox	c46b90e90a	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alan Cox	9ab6032f73	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
Alan Cox	3c4a24406b	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Kip Macy	958d87cd86	merge 194209 in to the i386/xen pmap requested by: alc@	2010-04-30 03:26:12 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	14dd3a29ea	MFi386 r207205 Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it.	2010-04-27 05:35:35 +00:00
Alan Cox	7b85f59183	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
John Baldwin	cf684ede27	Make NKPT a kernel option on i386 so that it can be set to a non-default value from kernel config files. Tested by: Charles Sprickman spork of bway net MFC after: 2 weeks	2010-03-10 19:50:52 +00:00
Attilio Rao	3258030144	Introduce the new kernel sub-tree x86 which should contain all the code shared and generalized between our current amd64, i386 and pc98. This is just an initial step that should lead to a more complete effort. For the moment, a very simple porting of cpufreq modules, BIOS calls and the whole MD specific ISA bus part is added to the sub-tree but ideally a lot of code might be added and more shared support should grow. Sponsored by: Sandvine Incorporated Reviewed by: emaste, kib, jhb, imp Discussed on: arch MFC: 3 weeks	2010-02-25 14:13:39 +00:00
Kip Macy	20f72e6f2f	- fix bootstrap for variable KVA_PAGES - remove unused CADDR1 - hold lock across page table update MFC after: 3 days	2010-02-21 01:13:34 +00:00
Ed Schouten	ddc534916d	Allow the pmap code to be built with GCC from FreeBSD 7 again. This patch basically gives us the best of both worlds. Instead of forcing the compiler to emulate GNU-style inline semantics even though we're using ISO C99, it will only use GNU-style inlining when the compiler is configured that way (__GNUC_GNU_INLINE__). Tested by: jhb	2010-02-18 14:28:38 +00:00
Ed Schouten	91bfd816f2	Recommit r193732: Remove __gnu89_inline. Now that we use C99 almost everywhere, just use C99-style in the pmap code. Since the pmap code is the only consumer of __gnu89_inline, remove it from cdefs.h as well. Because the flag was only introduced 17 months ago, I don't expect any problems. Reviewed by: alc It was backed out, because it prevented us from building kernels using a 7.x compiler. Now that most people use 8.x, there is nothing that holds us back. Even if people run 7.x, they should be able to build a kernel if they run `make kernel-toolchain' or `make buildworld' first.	2010-01-19 15:31:18 +00:00
Alan Cox	418f8af3a9	Eliminate an unused declaration.	2010-01-11 15:51:13 +00:00
Alan Cox	92697f16aa	Eliminate unused declarations.	2010-01-10 21:00:52 +00:00
Bjoern A. Zeeb	23b6a4482b	Unbreak the XEN build after r201751.	2010-01-08 16:56:11 +00:00
Alan Cox	28a5e2a5d7	Make pmap_set_pg() static.	2010-01-07 17:34:45 +00:00
Alan Cox	0d7e26de54	Eliminate unused variables (see r137912).	2010-01-07 04:47:09 +00:00
Kip Macy	72bc4ff74d	- revert pmap_kenter_temporary to taking a physical address - make minidump work	2009-12-10 03:09:35 +00:00

1 2 3

128 Commits