freebsd-dev

Author	SHA1	Message	Date
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
Justin T. Gibbs	ff662b5c98	Improve the Xen para-virtualized device infrastructure of FreeBSD: o Add support for backend devices (e.g. blkback) o Implement extensions to the Xen para-virtualized block API to allow for larger and more outstanding I/Os. o Import a completely rewritten block back driver with support for fronting I/O to both raw devices and files. o General cleanup and documentation of the XenBus and XenStore support code. o Robustness and performance updates for the block front driver. o Fixes to the netfront driver. Sponsored by: Spectra Logic Corporation sys/xen/xenbus/init.txt: Deleted: This file explains the Linux method for XenBus device enumeration and thus does not apply to FreeBSD's NewBus approach. sys/xen/xenbus/xenbus_probe_backend.c: Deleted: Linux version of backend XenBus service routines. It was never ported to FreeBSD. See xenbusb.c, xenbusb_if.m, xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus support. sys/xen/xenbus/xenbusvar.h: sys/xen/xenbus/xenbus_xs.c: sys/xen/xenbus/xenbus_comms.c: sys/xen/xenbus/xenbus_comms.h: sys/xen/xenstore/xenstorevar.h: sys/xen/xenstore/xenstore.c: Split XenStore into its own tree. XenBus is a software layer built on top of XenStore. The old arrangement and the naming of some structures and functions blurred these lines making it difficult to discern what services are provided by which layer and at what times these services are available (e.g. during system startup and shutdown). sys/xen/xenbus/xenbus_client.c: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_probe.c: sys/xen/xenbus/xenbusb.c: sys/xen/xenbus/xenbusb.h: Split up XenBus code into methods available for use by client drivers (xenbus.c) and code used by the XenBus "bus code" to enumerate, attach, detach, and service bus drivers. sys/xen/reboot.c: sys/dev/xen/control/control.c: Add a XenBus front driver for handling shutdown, reboot, suspend, and resume events published in the XenStore. Move all PV suspend/reboot support from reboot.c into this driver. sys/xen/blkif.h: New file from Xen vendor with macros and structures used by a block back driver to service requests from a VM running a different ABI (e.g. amd64 back with i386 front). sys/conf/files: Adjust kernel build spec for new XenBus/XenStore layout and added Xen functionality. sys/dev/xen/balloon/balloon.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/blkfront/blkfront.c: sys/xen/xenbus/... sys/xen/xenstore/... o Rename XenStore APIs and structures from xenbus_* to xs_. o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation of objects returned by these APIs. o Adjust for changes in the bus interface for Xen drivers. sys/xen/xenbus/... sys/xen/xenstore/... Add Doxygen comments for these interfaces and the code that implements them. sys/dev/xen/blkback/blkback.c: o Rewrite the Block Back driver to attach properly via newbus, operate correctly in both PV and HVM mode regardless of domain (e.g. can be in a DOM other than 0), and to deal with the latest metadata available in XenStore for block devices. o Allow users to specify a file as a backend to blkback, in addition to character devices. Use the namei lookup of the backend path to automatically configure, based on file type, the appropriate backend method. The current implementation is limited to a single outstanding I/O at a time to file backed storage. sys/dev/xen/blkback/blkback.c: sys/xen/interface/io/blkif.h: sys/xen/blkif.h: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: Extend the Xen blkif API: Negotiable request size and number of requests. This change extends the information recorded in the XenStore allowing block front/back devices to negotiate for optimal I/O parameters. This has been achieved without sacrificing backward compatibility with drivers that are unaware of these protocol enhancements. The extensions center around the connection protocol which now includes these additions: o The back-end device publishes its maximum supported values for, request I/O size, the number of page segments that can be associated with a request, the maximum number of requests that can be concurrently active, and the maximum number of pages that can be in the shared request ring. These values are published before the back-end enters the XenbusStateInitWait state. o The front-end waits for the back-end to enter either the InitWait or Initialize state. At this point, the front end limits it's own capabilities to the lesser of the values it finds published by the backend, it's own maximums, or, should any back-end data be missing in the store, the values supported by the original protocol. It then initializes it's internal data structures including allocation of the shared ring, publishes its maximum capabilities to the XenStore and transitions to the Initialized state. o The back-end waits for the front-end to enter the Initalized state. At this point, the back end limits it's own capabilities to the lesser of the values it finds published by the frontend, it's own maximums, or, should any front-end data be missing in the store, the values supported by the original protocol. It then initializes it's internal data structures, attaches to the shared ring and transitions to the Connected state. o The front-end waits for the back-end to enter the Connnected state, transitions itself to the connected state, and can commence I/O. Although an updated front-end driver must be aware of the back-end's InitWait state, the back-end has been coded such that it can tolerate a front-end that skips this step and transitions directly to the Initialized state without waiting for the back-end. sys/xen/interface/io/blkif.h: o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255. This is the maximum number possible without changing the blkif request header structure (nr_segs is a uint8_t). o Add two new constants: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK. These respectively indicate the number of segments that can fit in the first ring-buffer entry of a request, and for each subsequent (sg element only) ring-buffer entry associated with the "header" ring-buffer entry of the request. o Add the blkif_request_segment_t typedef for segment elements. o Add the BLKRING_GET_SG_REQUEST() macro which wraps the RING_GET_REQUEST() macro and returns a properly cast pointer to an array of blkif_request_segment_ts. o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the number of ring entries that will be consumed by a blkif request with the given number of segments. sys/xen/blkif.h: o Update for changes in interface/io/blkif.h macros. o Update the BLKIF_MAX_RING_REQUESTS() macro to take the ring size as an argument to allow this calculation on multi-page rings. o Add a companion macro to BLKIF_MAX_RING_REQUESTS(), BLKIF_RING_PAGES(). This macro determines the number of ring pages required in order to support a ring with the supplied number of request blocks. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: o Negotiate with the other-end with the following limits: Reqeust Size: MAXPHYS Max Segments: (MAXPHYS/PAGE_SIZE) + 1 Max Requests: 256 Max Ring Pages: Sufficient to support Max Requests with Max Segments. o Dynamically allocate request pools and segemnts-per-request. o Update ring allocation/attachment code to support a multi-page shared ring. o Update routines that access the shared ring to handle multi-block requests. sys/dev/xen/blkfront/blkfront.c: o Track blkfront allocations in a blkfront driver specific malloc pool. o Strip out XenStore transaction retry logic in the connection code. Transactions only need to be used when the update to multiple XenStore nodes must be atomic. That is not the case here. o Fully disable blkif_resume() until it can be fixed properly (it didn't work before this change). o Destroy bus-dma objects during device instance tear-down. o Properly handle backend devices with powef-of-2 sector sizes larger than 512b. sys/dev/xen/blkback/blkback.c: Advertise support for and implement the BLKIF_OP_WRITE_BARRIER and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and the BIO_ORDERED attribute of bios. sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: Fix various bugs in blkfront. o gnttab_alloc_grant_references() returns 0 for success and non-zero for failure. The check for < 0 is a leftover Linuxism. o When we negotiate with blkback and have to reduce some of our capabilities, print out the original and reduced capability before changing the local capability. So the user now gets the correct information. o Fix blkif_restart_queue_callback() formatting. Make sure we hold the mutex in that function before calling xb_startio(). o Fix a couple of KASSERT()s. o Fix a check in the xb_remove_ macro to be a little more specific. sys/xen/gnttab.h: sys/xen/gnttab.c: Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID. sys/dev/xen/netfront/netfront.c: Use GRANT_REF_INVALID instead of driver private definitions of the same constant. sys/xen/gnttab.h: sys/xen/gnttab.c: Add the gnttab_end_foreign_access_references() API. This API allows a client to batch the release of an array of grant references, instead of coding a private for loop. The implementation takes advantage of this batching to reduce lock overhead to one acquisition and release per-batch instead of per-freed grant reference. While here, reduce the duration the gnttab_list_lock is held during gnttab_free_grant_references() operations. The search to find the tail of the incoming free list does not rely on global state and so can be performed without holding the lock. sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/evtchn/evtchn.c: sys/xen/xen_intr.h: o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode. This allows an HVM domain to serve back end devices to other domains. This API is already implemented for PV mode. o Synchronize the API between HVM and PV. sys/dev/xen/xenpci/xenpci.c: o Scan the full region of CPUID space in which the Xen VMM interface may be implemented. On systems using SuSE as a Dom0 where the Viridian API is also exported, the VMM interface is above the region we used to search. o Pass through bus_alloc_resource() calls so that XenBus drivers attaching on an HVM system can allocate unused physical address space from the nexus. The block back driver makes use of this facility. sys/i386/xen/xen_machdep.c: Use the correct type for accessing the statically mapped xenstore metadata. sys/xen/interface/hvm/params.h: sys/xen/xenstore/xenstore.c: Move hvm_get_parameter() to the correct global header file instead of as a private method to the XenStore. sys/xen/interface/io/protocols.h: Sync with vendor. sys/xeninterface/io/ring.h: Add macro for calculating the number of ring pages needed for an N deep ring. To avoid duplication within the macros, create and use the new __RING_HEADER_SIZE() macro. This macro calculates the size of the ring book keeping struct (producer/consumer indexes, etc.) that resides at the head of the ring. Add the __RING_PAGES() macro which calculates the number of shared ring pages required to support a ring with the given number of requests. These APIs are used to support the multi-page ring version of the Xen block API. sys/xeninterface/io/xenbus.h: Add Comments. sys/xen/xenbus/... o Refactor the FreeBSD XenBus support code to allow for both front and backend device attachments. o Make use of new config_intr_hook capabilities to allow front and back devices to be probed/attached in parallel. o Fix bugs in probe/attach state machine that could cause the system to hang when confronted with a failure either in the local domain or in a remote domain to which one of our driver instances is attaching. o Publish all required state to the XenStore on device detach and failure. The majority of the missing functionality was for serving as a back end since the typical "hot-plug" scripts in Dom0 don't handle the case of cleaning up for a "service domain" that is not itself. o Add dynamic sysctl nodes exposing the generic ivars of XenBus devices. o Add doxygen style comments to the majority of the code. o Cleanup types, formatting, etc. sys/xen/xenbus/xenbusb.c: Common code used by both front and back XenBus busses. sys/xen/xenbus/xenbusb_if.m: Method definitions for a XenBus bus. sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusb_back.c: XenBus bus specialization for front and back devices. MFC after: 1 month	2010-10-19 20:53:30 +00:00
John Baldwin	60c7b36b7a	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.	2010-08-11 23:22:53 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
Alan Cox	9124d0d6a3	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)	2010-06-11 15:49:39 +00:00
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
Alan Cox	b2830a9649	Eliminate a stale comment.	2010-05-31 06:06:10 +00:00
Alan Cox	72dc3eb65b	Simplify the inner loop of pmap_collect(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.	2010-05-30 18:48:41 +00:00
Alan Cox	a1192299b3	Merge various changes from i386/i386/pmap.c: The remaining, unmerged portions of r175404 Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.) Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check. Eliminate \n from a nearby panic string. Use KASSERT() to reimplement pmap_copy()'s two assertions. Merge portions of r177659 To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set. r208609 Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all(). r208645 When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().	2010-05-30 04:44:32 +00:00
Alan Cox	c46b90e90a	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alan Cox	9ab6032f73	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
Alan Cox	3c4a24406b	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Kip Macy	958d87cd86	merge 194209 in to the i386/xen pmap requested by: alc@	2010-04-30 03:26:12 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	14dd3a29ea	MFi386 r207205 Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it.	2010-04-27 05:35:35 +00:00
Alan Cox	7b85f59183	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
John Baldwin	cf684ede27	Make NKPT a kernel option on i386 so that it can be set to a non-default value from kernel config files. Tested by: Charles Sprickman spork of bway net MFC after: 2 weeks	2010-03-10 19:50:52 +00:00
Attilio Rao	3258030144	Introduce the new kernel sub-tree x86 which should contain all the code shared and generalized between our current amd64, i386 and pc98. This is just an initial step that should lead to a more complete effort. For the moment, a very simple porting of cpufreq modules, BIOS calls and the whole MD specific ISA bus part is added to the sub-tree but ideally a lot of code might be added and more shared support should grow. Sponsored by: Sandvine Incorporated Reviewed by: emaste, kib, jhb, imp Discussed on: arch MFC: 3 weeks	2010-02-25 14:13:39 +00:00
Kip Macy	20f72e6f2f	- fix bootstrap for variable KVA_PAGES - remove unused CADDR1 - hold lock across page table update MFC after: 3 days	2010-02-21 01:13:34 +00:00
Ed Schouten	ddc534916d	Allow the pmap code to be built with GCC from FreeBSD 7 again. This patch basically gives us the best of both worlds. Instead of forcing the compiler to emulate GNU-style inline semantics even though we're using ISO C99, it will only use GNU-style inlining when the compiler is configured that way (__GNUC_GNU_INLINE__). Tested by: jhb	2010-02-18 14:28:38 +00:00
Ed Schouten	91bfd816f2	Recommit r193732: Remove __gnu89_inline. Now that we use C99 almost everywhere, just use C99-style in the pmap code. Since the pmap code is the only consumer of __gnu89_inline, remove it from cdefs.h as well. Because the flag was only introduced 17 months ago, I don't expect any problems. Reviewed by: alc It was backed out, because it prevented us from building kernels using a 7.x compiler. Now that most people use 8.x, there is nothing that holds us back. Even if people run 7.x, they should be able to build a kernel if they run `make kernel-toolchain' or `make buildworld' first.	2010-01-19 15:31:18 +00:00
Alan Cox	418f8af3a9	Eliminate an unused declaration.	2010-01-11 15:51:13 +00:00
Alan Cox	92697f16aa	Eliminate unused declarations.	2010-01-10 21:00:52 +00:00
Bjoern A. Zeeb	23b6a4482b	Unbreak the XEN build after r201751.	2010-01-08 16:56:11 +00:00
Alan Cox	28a5e2a5d7	Make pmap_set_pg() static.	2010-01-07 17:34:45 +00:00
Alan Cox	0d7e26de54	Eliminate unused variables (see r137912).	2010-01-07 04:47:09 +00:00
Kip Macy	72bc4ff74d	- revert pmap_kenter_temporary to taking a physical address - make minidump work	2009-12-10 03:09:35 +00:00
Attilio Rao	385e11118c	i386 has not (yet) any DEV_ATPIC conditional than axe it out from Xen version. No objections by: kmacy	2009-11-27 01:02:17 +00:00
Kip Macy	be7747b449	fixup kernel core dumps on paravirtual guests	2009-11-24 07:17:51 +00:00
Andriy Gapon	6cc16fcb4e	reflect that pg_ps_enabled is a tunable, not just a read-only sysctl Nod from: jhb	2009-11-11 14:21:31 +00:00
Marcel Moolenaar	1a4fcaebe3	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
Kip Macy	46aba52a50	make read_eflags and write_eflags accomplish the same effect on PVM as native, simplifying interrupt handling	2009-10-01 22:05:38 +00:00
Kip Macy	7d05808361	fix UP compilation	2009-09-11 23:41:11 +00:00
Jung-uk Kim	3bcdfb9bf8	Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce unnecessary #ifdef's for shared code between them.	2009-09-10 17:27:36 +00:00
Konstantin Belousov	9411b7675e	As was done in r196643 for i386 and amd64, swap the start/end virtual addresses in pmap_invalidate_cache_range(). Reported by: Vincent Hoffman <vince unsane co uk> Reviewed by: jhb MFC after: 3 days	2009-09-09 19:40:54 +00:00
Adrian Chadd	56af96debe	Delete whitespace not in i386/pmap.c	2009-09-01 12:17:47 +00:00
Adrian Chadd	c0bb2b8058	Migrate to use cpuset_t.	2009-09-01 06:15:50 +00:00
Adrian Chadd	fa52c6106b	Merge in the pat_works work from sys/i386/i386/pmap.c - primarily to reduce diff size.	2009-09-01 05:15:45 +00:00
Adrian Chadd	f8d44dbb74	Fix broken build.	2009-09-01 03:44:25 +00:00
Adrian Chadd	82fe1cca67	Revert previous commit; that was left-over junk in the tree.	2009-08-31 23:35:59 +00:00
Adrian Chadd	0ad6375395	Shuffle pagezero() into the same location as in sys/i386/i386/pmap.c.	2009-08-31 23:30:39 +00:00
Attilio Rao	68a5590836	Port recent IPI enhachements to en: * Introduce the ipi_nmi_handler() function for the Xen infrastructure * Fixup adeguately the ipi sender functions Approved by: re (kib)	2009-08-15 18:37:06 +00:00
Attilio Rao	dc6fbf6545	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Konstantin Belousov	ed190ad06a	Fix XEN build breakage, by implementing pmap_invalidate_cache_range() and using it when appropriate. Merge analogue of the r195836 optimization to XEN. Approved by: re (kensmith)	2009-07-29 19:38:33 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Alan Cox	9861cbc6ca	Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev(). The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless. Add pmap_page_set_memattr() to the Xen pmap. Approved by: re (kib)	2009-07-19 21:40:19 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Alan Cox	0e18ab26d0	PAE adds another level to the i386 page table. This level is a small 4-entry table that must be located within the first 4GB of RAM. This requirement is met by defining an UMA zone with a custom back-end allocator function. This revision makes two changes to this back-end allocator function: (1) It replaces the use of contigmalloc() with the use of kmem_alloc_contig(). This eliminates "double accounting", i.e., accounting by both the UMA zone and malloc tags. (I made the same change for the same reason to the zones supporting jumbo frames a week ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK, M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it. pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the moment, this is harmless only because the default behavior of contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init() doesn't really depend on the memory being zeroed. The back-end allocator function in the Xen pmap is dead code. I am changing it nonetheless because I don't want to leave any "bad examples" in the source tree for someone to copy at a later date. Approved by: re (kib)	2009-07-05 21:40:21 +00:00
Jeff Roberson	50c202c592	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
Ed Schouten	5942207fb4	Revert my change; reintroduce __gnu89_inline. It turns out our compiler in stable/7 can't build this code anymore. Even though my opinion is that those people should just run `make kernel-toolchain' before building a kernel, I am willing to wait and commit this after we've branched stable/8. Requested by: rwatson	2009-06-08 18:23:43 +00:00
Ed Schouten	032e3d1d19	Remove __gnu89_inline. Now that we use C99 almost everywhere, just use C99-style in the pmap code. Since the pmap code is the only consumer of __gnu89_inline, remove it from cdefs.h as well. Because the flag was only introduced 17 months ago, I don't expect any problems. Reviewed by: alc	2009-06-08 17:27:25 +00:00
Adrian Chadd	c22ca7f04f	Fix the MP IPI code to differentiate between bitmapped IPIs and function IPIs. This attempts to fix the IPI handling code to correctly differentiate between bitmapped IPIs and function IPIs. The Xen IPIs were on low numbers which clashed with the bitmapped IPIs. This commit bumps those IPI numbers up to 240 and above (just like in the i386 code) and fiddles with the ipi_vectors[] logic to call the correct function. This still isn't "right". Specifically, the IPI code may work fine for TLB shootdown events but the rendezvous/lazypmap IPIs are thrown by calling ipi_*() routines which don't set the call_func stuff (function id, addr1, addr2) that the TLB shootdown events are. So the Xen SMP support is still broken. PR: 135069	2009-05-31 08:11:39 +00:00
Adrian Chadd	23ca223bfa	Remove some unused code in ipi_selected() . The code path this was copied from (sys/i386/i386/mp_machdep.c:ipi_selected()) handles bitmap'ed IPIs and normal IPIs via separate notification paths. Xen SMP handles them the same way.	2009-05-31 07:25:24 +00:00
Adrian Chadd	eee45687e9	Even though I'm not quite sure that the call_func stuff will work properly in all the places/cases IPI messages will be generated, at least be consistent with how the call_data pointer is assigned and cleared (ie, all done inside the spinlock. Ensure that its NULL before continuing, just to try and identify situations where things are going horribly wrong.	2009-05-30 15:20:25 +00:00
Adrian Chadd	d68bbb81fb	Don't schedule a CALL_FUNCTION_VECTOR software IPI if the IPI was signaled via the bitmap (and thus sent via RESCHEDULE_VECTOR.)	2009-05-30 14:59:08 +00:00
Adrian Chadd	a235aceb1a	Correctly report the IPI IRQs being created; make it clear what vectors they are for.	2009-05-30 06:37:03 +00:00
Adrian Chadd	fd27593538	Fix the Xen TOD update when the hypervisor wall clock is nudged. The "wall clock" in the current code is actually the hypervisor start time. The time of day is the "start time" plus the hypervisor "uptime". Large enough bumps in the dom0 clock lead to a hypervisor "bump" which is implemented as a bump in the start time, not the uptime. The clock.c routines were reading in the hypervisor start time and then using this as the TOD. This meant that any hypervisor time bump would cause the FreeBSD DomU to set its TOD to the hypervisor start time, rather than the actual TOD. This fix is a bit hacky and some reshuffling should be done later on to clarify what is going on. I've left the wall clock code alone. (The code which updates shadow_tv and shadow_tv_version.) A new routine adds the uptime to the shadow_tv, which is then used to update the TOD. I've included some debugging so it is obvious when the clock is nudged. PR: 135008	2009-05-29 13:43:21 +00:00
Adrian Chadd	7d18ff9a2b	Migrate the Xen hypervisor clock reading routines into something sharable.	2009-05-29 13:36:06 +00:00
Adrian Chadd	58e3a119e5	Say hello to a very basic, read-only, Xen Hypervisor RTC. The hypervisor doesn't provide a single "TOD" - it instead provides a "start time" and a "running time". These are added together to form the current TOD. The TOD is in UTC. This RTC is only (initially) designed to be read at startup. There's some further poking that needs to happen to pick up hypervisor time changes (ie, by the Dom0 time being adjusted by something). This time adjustment currently can cause "weird stuff" in the DomU clock; I'll begin investigating and repairing that in subsequent commits. PR: 135008	2009-05-28 04:17:05 +00:00
Attilio Rao	120b18d86f	FreeBSD right now support 32 CPUs on all the architectures at least. With the arrival of 128+ cores it is necessary to handle more than that. One of the first thing to change is the support for cpumask_t that needs to handle more than 32 bits masking (which happens now). Some places, however, still assume that cpumask_t is a 32 bits mask. Fix that situation by using always correctly cpumask_t when needed. While here, remove the part under STOP_NMI for the Xen support as it is broken in any case. Additively make ipi_nmi_pending as static. Reviewed by: jhb, kmacy Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-05-14 17:43:00 +00:00
Alexander Motin	1703f2b424	Rename statclock_disable variable to atrtcclock_disable that it actually is, and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock controlling it. Setting it to 0 disables using RTC clock as stat-/ profclock sources. Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254 hardclock, when LAPIC and RTC clocks are disabled. This allows to reduce global interrupt rate of idle system down to about 100 interrupts per core, permitting C3 and deeper C-states provide maximum CPU power efficiency.	2009-05-03 17:47:21 +00:00
Kip Macy	51a70df2db	fix XEN compilation	2009-05-02 22:22:00 +00:00
Doug Rabson	3e33218d77	Fix the Xen build for i386 PV mode.	2009-04-01 17:06:28 +00:00
John Baldwin	957f3ee2f7	Remove unused arg from npxinit(). Forgot to commit this file in the last i386 FPU change.	2009-03-05 18:43:54 +00:00
Kip Macy	69f93ab8b5	- fix formatting - fix types in ticks_to_system_time	2009-02-15 06:36:02 +00:00
Alan Cox	2f0b74fb06	Remove unnecessary page queues locking around vm_page_wakeup(). Approved by: kmacy	2009-02-14 22:07:22 +00:00
Kip Macy	ffc4f90610	Don't try to directly update page tables	2009-02-08 21:54:51 +00:00
Kip Macy	2650d60534	pass in smp_processor_id to identify the cpu in use	2009-02-05 04:00:55 +00:00
Kip Macy	bd1c698cab	adjust the way that idle happens so as to avoid missing timer interrupts	2009-02-05 02:01:18 +00:00
Kip Macy	476f05d531	make sure that interrupts are disabled when handling page faults et al	2009-02-03 03:43:00 +00:00
Bjoern A. Zeeb	e72f0a071c	Bring over the code from sys/i386/i386/mp_machdep.c, r187880 to make XEN config compile again: - Allocate apic vectors on a per-cpu basis.	2009-01-31 21:40:27 +00:00
Kip Macy	3a6d1fcf9c	merge 186535, 186537, and 186538 from releng_7_xen Log: - merge in latest xenbus from dfr's xenhvm - fix race condition in xs_read_reply by converting tsleep to mtx_sleep Log: unmask evtchn in bind_{virq, ipi}_to_irq Log: - remove code for handling case of not being able to sleep - eliminate tsleep - make sleeps atomic	2008-12-29 06:31:03 +00:00
Kip Macy	23dc562170	Integrate 185578 from dfr Use newbus to managed devices	2008-12-04 07:59:05 +00:00
Kip Macy	cbc936b6d8	fix initialization for case of normal kernbase remove unused shutdown code	2008-12-04 07:28:13 +00:00
Kip Macy	e67fc8f258	only call hardclock on cpu0 pointed out by: Scott Long	2008-10-25 20:42:10 +00:00
Kip Macy	951f87e4ae	handle case where eflags represents actual flags value when restoring interrupts	2008-10-25 04:46:02 +00:00
Kip Macy	1f5aa99363	Fix general issues with IPI support	2008-10-24 07:58:38 +00:00
Kip Macy	b1efbd6b47	Fix IPI support	2008-10-23 07:20:43 +00:00
Kip Macy	a51ac24407	Hook in ipi handlers	2008-10-21 08:03:12 +00:00
Kip Macy	764b7ef593	Implement infrastructure for gluing i386 ipi functions in to xen's infrastructure	2008-10-21 06:39:40 +00:00
Kip Macy	756fb0605f	Add routine for initializing AP clock	2008-10-21 06:38:40 +00:00
Kip Macy	39f8ee3024	Import interrupt management defines from latest xenlinux	2008-10-20 05:42:38 +00:00
Kip Macy	9bf38e47a3	- move gdt, ldt allocation to before KPT allocation - fix bugs where we would: - try to map the hypervisors address space - accidentally kick out an existing kernel mapping for some domain creation memory allocation sizes - accidentally skip a 2MB kernel mapping for some domain creation memory allocation sizes - don't rely on trapping in to xen to read rcr2, reference through vcpu - whitespace cleanups	2008-10-19 01:27:40 +00:00
Kip Macy	ba32964d08	GC unused values	2008-10-19 01:23:30 +00:00
Marius Strobl	6f04e7b9aa	Remove ipi_all() and ipi_self() as the former hasn't been used at all to date and the latter also is only used in ia64 and powerpc code which no longer serves a real purpose after bring-up and just can be removed as well. Note that architectures like sun4u also provide no means of implementing IPI'ing a CPU itself natively in the first place. Suggested by: jhb Reviewed by: arch, grehan, jhb	2008-09-28 18:34:14 +00:00
Kip Macy	852c25eda2	move ipi_pcpu to evtchn.c	2008-09-26 05:54:24 +00:00
Kip Macy	920ba15bf9	Update xen/interface includes to the latest in mercurial MFC after: 1 month	2008-09-26 05:29:39 +00:00
Kip Macy	4e01238d67	add initial ipi definitions MFC after: 1 month	2008-09-25 07:11:04 +00:00
Kip Macy	9d741e6d59	Make nkpt dependent on the size of the initial memory allocation MFC after: 1 month	2008-09-25 07:03:09 +00:00
Kip Macy	425eaba985	Change order of pcpu initialization so the pc_prvspace is set MFC after: 1 month	2008-09-18 02:59:19 +00:00
Kip Macy	6cf8efdc48	fix initial page directory setup for APs to work when KERNBASE < 0xc0000000 MFC after: 1 month	2008-09-18 01:09:15 +00:00
Kip Macy	57bd99b3c5	Some people have very strange notions of how large KVA_PAGES should be. The core of this change generalizes the initial page directory setup so that the kernel can be given arbitrarily large or small. - small formatting fixes - update copyright MFC after: 1 month	2008-09-17 19:11:37 +00:00
Kip Macy	6859a304c6	Get initial bootstrap of APs working under xen. Note that the APs still blow up in sched_throw(). MFC after: 1 month	2008-09-10 07:11:08 +00:00
Kip Macy	1fc7c4a654	enable the xen_guest string so that the freebsd xen kernel will at least pass muster with the loader on 3.0.3 Note that this doesn't actually make it work as Xen 3.0.3 appears to disallow recursive mappings on the page directory	2008-09-03 00:06:10 +00:00
Kip Macy	42e68d4d74	Accomodate the fact that the number of l1 pages varies with the size of the initially allocated memory - this lets us boot with 3GB MFC after: 1 month	2008-09-02 02:55:45 +00:00
Kip Macy	32a5d14b26	Fix problem with large memory configuration by ensuring that all NKPT page table pages have been zeroed before entering them in the page directory MFC after: 1 month	2008-09-02 01:32:52 +00:00
Kip Macy	4e683d7252	Fix boot time pmap_growkernel panic for case where vm is allocated >= 768M MFC after: 1 month	2008-08-21 02:57:02 +00:00
Kip Macy	18bad85737	- clean up interrupt handling for xen a tiny bit - parse the command line in to kenv - defer shutdown watcher until later in boot MFC after: 1 month	2008-08-20 09:16:46 +00:00
Kip Macy	ecded8075f	protect queue_log not queue MFC after: 1 month	2008-08-19 02:39:34 +00:00

1 2 3 4

156 Commits