freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	49e3050e6c	VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object flag. Besides providing the redundand information, need to update both vnode and object flags causes more acquisition of vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects. Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects. Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks	2009-12-21 12:29:38 +00:00
Antoine Brodin	4e2d83fc4a	Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros. MFC after: 1 month	2009-12-05 17:45:56 +00:00
Alan Cox	79f6ebe233	Properly synchronize the previous change.	2009-11-28 00:50:09 +00:00
Alan Cox	d8778512cf	Support the new VM_PROT_COPY option on wired pages. The effect of which is that a debugger can now set a breakpoint in a program that uses mlock(2) on its text segment or mlockall(2) on its entire address space.	2009-11-27 22:08:29 +00:00
Alan Cox	e2997fea72	Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault(). Discussed with: kib	2009-11-27 20:24:11 +00:00
Alan Cox	a6d42a0d62	Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into text pages. Text pages are not just write protected but they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page and triggers the replication of the page so that the breakpoint will be written to a private copy. However, here is where things become confused. It is the debugger, not the process being debugged that requires write access to the copied page. Nonetheless, the copied page is being mapped into the process with write access enabled. In other words, once the debugger sets a breakpoint within a text page, the program can write to its private copy of that text page. Whereas prior to setting the breakpoint, a SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write page even though the access is only for read. Moreover, the replicated page is only mapped into the process with read access, and not write access. Reviewed by: kib MFC after: 4 weeks	2009-11-26 05:16:07 +00:00
Alan Cox	2db65ab46e	Simplify both the invocation and the implementation of vm_fault() for wiring pages. (Note: Claims made in the comments about the handling of breakpoints in wired pages have been false for roughly a decade. This and another bug involving breakpoints will be fixed in coming changes.) Reviewed by: kib	2009-11-18 18:05:54 +00:00
Alan Cox	2dd02f4773	Eliminate an unnecessary #include. (This #include should have been removed in r188331 when vnode_pager_lock() was eliminated.)	2009-11-04 03:12:56 +00:00
Alan Cox	86684848b6	Eliminate a bit of hackery from vm_fault(). The operations that this hackery sought to prevent are now properly supported by vm_map_protect(). (See r198505.) Reviewed by: kib	2009-11-03 17:15:15 +00:00
Attilio Rao	1b9d701fee	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvements aims for avoiding further cache-misses in scheduler specific functions which need to keep track of average thread running time and further locking in places setting for this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>	2009-11-03 16:46:52 +00:00
Alan Cox	2fafce9e4c	Avoid pointless calls to pmap_protect(). Reviewed by: kib	2009-11-02 17:45:39 +00:00
Ivan Voras	8a9c731f13	Add sysctl documentation strings. The descriptions are derived from tuning(7). One of the descriptions references tuning(7) because it is too complex to adequatly describe here (it is not a simple boolean sysctl) and users should be warned to that. Reviewed by: alc, kib Approved by: gnn (mentor)	2009-11-02 16:56:59 +00:00
Alan Cox	e4ed417a35	Correct an error in vm_fault_copy_entry() that has existed since the first version of this file. When a process forks, any wired pages are immediately copied because copy-on-write is not supported for wired pages. In other words, the child process is given its own private copy of each wired page from its parent's address space. Unfortunately, to date, these copied pages have been mapped into the child's address space with the wrong permissions, typically VM_PROT_ALL. This change corrects the permissions. Reviewed by: kib	2009-10-31 17:39:56 +00:00
Konstantin Belousov	210a688642	When protection of wired read-only mapping is changed to read-write, install new shadow object behind the map entry and copy the pages from the underlying objects to it. This makes the mprotect(2) call to actually perform the requested operation instead of silently do nothing and return success, that causes SIGSEGV on later write access to the mapping. Reuse vm_fault_copy_entry() to do the copying, modifying it to behave correctly when src_entry == dst_entry. Reviewed by: alc MFC after: 3 weeks	2009-10-27 10:15:58 +00:00
Alan Cox	7afab86c3d	Simplify the inner loop of vm_fault_copy_entry(). Reviewed by: kib	2009-10-26 00:01:52 +00:00
Alan Cox	36930fc947	Eliminate an unnecessary check from vm_fault_prefault().	2009-10-25 17:30:50 +00:00
Marcel Moolenaar	1a4fcaebe3	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
Konstantin Belousov	5c0e1c11a3	Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA). Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when per-uid limit is actually exceed. Both changes aim at calling priv_check(9) only for the cases when privilege is actually exercised by the process. Reported and tested by: rwatson Reviewed by: alc MFC after: 3 days	2009-10-18 12:55:39 +00:00
Alan Cox	e67e0775e6	Align and pad the page queue and free page queue locks so that the linker can't possibly place them together within the same cache line. MFC after: 3 weeks	2009-10-04 18:53:10 +00:00
Bjoern A. Zeeb	fc50832731	Back out the functional parts from r197537. After r197711, affecting all user mappings, mmap no longer needs special treatment.	2009-10-02 17:51:46 +00:00
Konstantin Belousov	6fecb26b6b	Move the annotation for vm_map_startup() immediately before the function. MFC after: 3 days	2009-10-01 12:48:35 +00:00
Simon L. B. Nielsen	27bfa95847	Do not allow mmap with the MAP_FIXED argument to map at address zero. This is done to make it harder to exploit kernel NULL pointer security vulnerabilities. While this of course does not fix vulnerabilities, it does mitigate their impact. Note that this may break some applications, most likely emulators or similar, which for one reason or another require mapping memory at zero. This restriction can be disabled with the security.bsd.mmap_zero sysctl variable. Discussed with: rwatson, bz Tested by: bz (Wine), simon (VirtualBox) Submitted by: jhb	2009-09-27 14:49:51 +00:00
Konstantin Belousov	497a82382b	Old (a.out) rtld attempts to mmap zero-length region, e.g. when bss of the linked object is zero-length. More old code assumes that mmap of zero length returns success. For a.out and pre-8 ELF binaries, allow the mmap of zero length. Reported by: tegge Reviewed by: tegge, alc, jhb MFC after: 3 days	2009-09-20 12:40:56 +00:00
Konstantin Belousov	8a945d109c	Reintroduce the r196640, after fixing the problem with my testing. Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarious) MFC after: 1 week	2009-09-01 11:41:51 +00:00
Konstantin Belousov	f25fa6abb2	Reverse r196640 and r196644 for now.	2009-08-29 21:53:08 +00:00
Konstantin Belousov	c3cf0b476f	Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week	2009-08-29 13:28:02 +00:00
John Baldwin	58e279db02	Mark the fake pages constructed by the OBJT_SG pager valid. This was accidentally lost at one point during the PAT development. Without this fix vm_pager_get_pages() was zeroing each of the pages. Submitted by: czander @ NVidia MFC after: 3 days	2009-08-29 02:17:40 +00:00
John Baldwin	2fa8c8d21e	Extend the device pager to support different memory attributes on different pages in an object. - Add a new variant of d_mmap() currently called d_mmap2() which accepts an additional in/out parameter that is the memory attribute to use for the requested page. - A driver either uses d_mmap() or d_mmap2() for all requests but not both. The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate that the driver provides a d_mmap2() handler instead of d_mmap(). This is done to make the change ABI compatible with existing drivers and MFC'able to 7 and 8. Submitted by: alc MFC after: 1 month	2009-08-28 14:06:55 +00:00
John Baldwin	63cbfee7b0	Remove debugging that crept in with previous commit. Reported by: nwhitehorn Approved by: re (kib)	2009-07-24 15:06:49 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Alan Cox	9861cbc6ca	Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev(). The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless. Add pmap_page_set_memattr() to the Xen pmap. Approved by: re (kib)	2009-07-19 21:40:19 +00:00
Alan Cox	13de722155	An addendum to r195649, "Add support to the virtual memory system for configuring machine-dependent memory attributes...": Don't set the memory attribute for a "real" page that is allocated to a device object in vm_page_alloc(). It is a pointless act, because the device pager replaces this "real" page with a "fake" page and sets the memory attribute on that "fake" page. Eliminate pointless code from pmap_cache_bits() on amd64. Employ the "Self Snoop" feature supported by some x86 processors to avoid cache flushes in the pmap. Approved by: re (kib)	2009-07-18 01:50:05 +00:00
John Baldwin	0fe0ed8bf8	- Change mmap() to fail requests with EINVAL that pass a length of 0. This behavior is mandated by POSIX. - Do not fail requests that pass a length greater than SSIZE_MAX (such as > 2GB on 32-bit platforms). The 'len' parameter is actually an unsigned 'size_t' so negative values don't really make sense. Submitted by: Alexander Best alexbestms at math.uni-muenster.de Reviewed by: alc Approved by: re (kib) MFC after: 1 week	2009-07-14 19:45:36 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Konstantin Belousov	529ab57b9a	When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters non-readable and non-executable map entry, the entry is skipped from wiring and loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was not set for the map entry, its wired_count is later erronously decremented. vm_map_delete(9) for such map entry stuck in "vmmaps". Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop. Reported by: John Marshall <john.marshall riverwillow com au> Approved by: re (kensmith)	2009-07-12 12:37:38 +00:00
Konstantin Belousov	121fd46175	When forking a vm space that has wired map entries, do not forget to charge the objects created by vm_fault_copy_entry. The object charge was set, but reserve not incremented. Reported by: Greg Rivers <gcr+freebsd-current tharned org> Reviewed by: alc (previous version) Approved by: re (kensmith)	2009-07-03 22:17:37 +00:00
Konstantin Belousov	9b4d473a6e	Eliminiate code duplication by calling vm_object_destroy() from vm_object_collapse(). Requested and reviewed by: alc Approved by: re (kensmith)	2009-06-28 08:42:17 +00:00
Alan Cox	e999111ae7	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
Konstantin Belousov	9f80ce043d	Change the type of uio_resid member of struct uio from int to ssize_t. Note that this does not actually enable full-range i/o requests for 64 architectures, and is done now to update KBI only. Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)	2009-06-25 18:46:30 +00:00
Konstantin Belousov	7a8af8eee2	Initialize the uip to silence gcc warning that seems to sneak in in some build environments. Reported by: alc, bf1783 at googlemail com	2009-06-24 09:26:33 +00:00
Alan Cox	26f4eea53f	The bits set in a page's dirty mask are a subset of the bits set in its valid mask. Consequently, there is no need to perform a bit-wise and of the page's dirty and valid masks in order to determine which parts of a page are dirty and valid. Eliminate an unnecessary #include.	2009-06-24 04:45:03 +00:00
Konstantin Belousov	3364c323e6	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
Alan Cox	ce1b857f33	Validate the page in one place, dev_pager_getpages(), rather than doing it in two places, dev_pager_getfake() and dev_pager_updatefake(). Compare a pointer to "NULL" rather than "0".	2009-06-22 19:09:48 +00:00
Alan Cox	ef327c3ee7	Implement a mechanism within vm_phys_alloc_contig() to defer all necessary calls to vdrop() until after the free page queues lock is released. This eliminates repeatedly releasing and reacquiring the free page queues lock each time the last cached page is reclaimed from a vnode-backed object.	2009-06-21 20:29:14 +00:00
Alan Cox	6f0489c670	Strive for greater consistency among the places that implement real, fictious, and contiguous page allocation. Eliminate unnecessary reinitialization of a page's fields.	2009-06-21 00:21:33 +00:00
Andrew Thompson	f06a3a36ac	Track the kernel mapping of a physical page by a new entry in vm_page structure. When the page is shared, the kernel mapping becomes a special type of managed page to force the cache off the page mappings. This is needed to avoid stale entries on all ARM VIVT caches, and VIPT caches with cache color issue. Submitted by: Mark Tinguely Reviewed by: alc Tested by: Grzegorz Bernacki, thompsa	2009-06-18 20:42:37 +00:00
Alan Cox	aea6e893ed	Add support for UMA_SLAB_KERNEL to page_free(). (While I'm here remove an unnecessary newline character from the end of two panic messages.)	2009-06-18 07:27:11 +00:00
Alan Cox	f0553fdbc4	Eliminate unnecessary forward declarations.	2009-06-17 20:12:23 +00:00
Alan Cox	d78200e4e8	Refactor contigmalloc() into two functions: a simple front-end that deals with the malloc tag and calls a new back-end, kmem_alloc_contig(), that allocates the pages and maps them. The motivations for this change are two-fold: (1) A cache mode parameter will be added to kmem_alloc_contig(). In other words, kmem_alloc_contig() will be extended to support the allocation of memory with caller-specified caching. (2) The UMA allocation function that is used by the two jumbo frames zones can use kmem_alloc_contig() in place of contigmalloc() and thereby avoid having free jumbo frames held by the zone counted as live malloc()ed memory.	2009-06-17 17:19:48 +00:00
Alan Cox	2d59a004af	Pass the size of the mapping to contigmapping() as a "vm_size_t" rather than a "vm_pindex_t". A "vm_size_t" is more convenient for it to use.	2009-06-17 07:11:38 +00:00

1 2 3 4 5 ...

2596 Commits