freebsd-skq

Author	SHA1	Message	Date
trasz	b2b2bfee2e	Fix comment intentation.	2010-12-04 17:41:58 +00:00
kib	f8badff8ef	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
kib	336fd1996d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
alc	ec0b99e7f0	Allow a POSIX shared memory object that is opened for read but not for write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e., copy-on-write. (This is a regression in the new implementation of POSIX shared memory objects that is used by HEAD and RELENG_8. This bug does not exist in RELENG_7's user-level, file-based implementation.) PR: 150260 MFC after: 3 weeks	2010-09-19 19:42:04 +00:00
rstone	62d7f50b87	Fix a typo in r212281. uintptr -> uintptr_t Pointy hat to: rstone Approved by: emaste (mentor) MFC after: 2 weeks	2010-09-07 02:51:11 +00:00
rstone	0dd3ce30eb	In munmap() downgrade the vm_map_lock to a read lock before taking a read lock on the pmc-sx lock. This prevents a deadlock with pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap allows pmc_log_process_mappings to continue, preventing the deadlock. Without this change I could cause a deadlock on a multicore 8.1-RELEASE system by having one thread constantly mmap'ing and then munmap'ing a PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat in system-wide sampling mode. Reviewed by: fabient Approved by: emaste (mentor) MFC after: 2 weeks	2010-09-07 00:23:45 +00:00
alc	115cb6b29f	Add the MAP_PREFAULT_READ option to mmap(2). Reviewed by: jhb, kib	2010-08-28 16:57:07 +00:00
kib	ba7ee96f4a	Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month	2010-08-06 09:42:15 +00:00
trasz	9d4312eb10	Fix commented out resource limit check in mlockall(2). It's still racy, but at least less misleading.	2010-07-27 19:26:18 +00:00
alc	3f1d4b057c	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
alc	32b13ee957	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
alc	0a905b1db9	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
jhb	399c01844a	Reject attempts to create a MAP_ANON mapping with a non-zero offset. PR: kern/71258 Submitted by: Alexander Best MFC after: 2 weeks	2010-03-23 21:08:07 +00:00
bz	3b88ee8187	Back out the functional parts from r197537. After r197711, affecting all user mappings, mmap no longer needs special treatment.	2009-10-02 17:51:46 +00:00
simon	a0b7c793b4	Do not allow mmap with the MAP_FIXED argument to map at address zero. This is done to make it harder to exploit kernel NULL pointer security vulnerabilities. While this of course does not fix vulnerabilities, it does mitigate their impact. Note that this may break some applications, most likely emulators or similar, which for one reason or another require mapping memory at zero. This restriction can be disabled with the security.bsd.mmap_zero sysctl variable. Discussed with: rwatson, bz Tested by: bz (Wine), simon (VirtualBox) Submitted by: jhb	2009-09-27 14:49:51 +00:00
kib	c04bcd3033	Old (a.out) rtld attempts to mmap zero-length region, e.g. when bss of the linked object is zero-length. More old code assumes that mmap of zero length returns success. For a.out and pre-8 ELF binaries, allow the mmap of zero length. Reported by: tegge Reviewed by: tegge, alc, jhb MFC after: 3 days	2009-09-20 12:40:56 +00:00
jhb	d81f73fcb5	- Change mmap() to fail requests with EINVAL that pass a length of 0. This behavior is mandated by POSIX. - Do not fail requests that pass a length greater than SSIZE_MAX (such as > 2GB on 32-bit platforms). The 'len' parameter is actually an unsigned 'size_t' so negative values don't really make sense. Submitted by: Alexander Best alexbestms at math.uni-muenster.de Reviewed by: alc Approved by: re (kib) MFC after: 1 week	2009-07-14 19:45:36 +00:00
kib	fa686c638e	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
rwatson	f4934662e5	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
jhb	fea04a3fd1	Add an extension to the character device interface that allows character device drivers to use arbitrary VM objects to satisfy individual mmap() requests. - A new d_mmap_single(cdev, &foff, objsize, &object, prot) callback is added to cdevsw. This function is called for each mmap() request. If it returns ENODEV, then the mmap() request will fall back to using the device's device pager object and d_mmap(). Otherwise, the method can return a VM object to satisfy this entire mmap() request via object. It can also modify the starting offset into this object via foff. This allows device drivers to use the file offset as a cookie to identify specific VM objects. - vm_mmap_vnode() has been changed to call vm_mmap_cdev() directly when mapping V_CHR vnodes. This avoids duplicating all the cdev mmap handling code and simplifies some of vm_mmap_vnode(). - D_VERSION has been bumped to D_VERSION_02. Older device drivers using D_VERSION_01 are still supported. MFC after: 1 month	2009-06-01 21:32:52 +00:00
alc	85b0c58343	Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization, but I see no benefit from it today. VM_PROT_READ_IS_EXEC was only intended for use on processors that do not distinguish between read and execute permission. On an mmap(2) or mprotect(2), it automatically added execute permission if the caller specified permissions included read permission. The hope was that this would reduce the number of vm map entries needed to implement an address space because there would be fewer neighboring vm map entries that differed only in the presence or absence of VM_PROT_EXECUTE. (See vm/vm_mmap.c revision 1.56.) Today, I don't see any real applications that benefit from VM_PROT_READ_IS_EXEC. In any case, vm map entries are now organized as a self-adjusting binary search tree instead of an ordered list. So, the need for coalescing vm map entries is not as great as it once was.	2009-04-04 23:12:14 +00:00
kib	66c697aade	Revert the addition of the freelist argument for the vm_map_delete() function, done in r188334. Instead, collect the entries that shall be freed, in the deferred_freelist member of the map. Automatically purge the deferred freelist when map is unlocked. Tested by: pho Reviewed by: alc	2009-02-24 20:57:43 +00:00
kib	ae7c4cfa11	Do not call vm_object_deallocate() from vm_map_delete(), because we hold the map lock there, and might need the vnode lock for OBJT_VNODE objects. Postpone object deallocation until caller of vm_map_delete() drops the map lock. Link the map entries to be freed into the freelist, that is released by the new helper function vm_map_entry_free_freelist(). Reviewed by: tegge, alc Tested by: pho	2009-02-08 20:39:17 +00:00
jhb	dc43531891	Now that vfs_markatime() no longer requires an exclusive lock due to the VOP_MARKATIME() changes, use a shared vnode lock for mmap(). Submitted by: ups	2009-01-21 14:43:35 +00:00
rwatson	3667673537	Update mmap() comment: no more block devices, so no more block device cache coherency questions. MFC after: 3 days	2008-10-22 16:50:12 +00:00
kib	2a1cba02c6	Allow the d_mmap driver methods to use cdevpriv KPI during verification phase of establishing mapping. Discussed with: rwatson, jhb, rnoland Tested by: rnoland MFC after: 3 days	2008-09-20 19:56:02 +00:00
attilio	dbf35e279f	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
trhodes	f37865f7f0	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc	2008-08-03 14:26:15 +00:00
alc	933221c70b	To date, our implementation of munmap(2) has required that the entirety of the specified range be mapped. Specifically, it has returned EINVAL if the entire range is not mapped. There is not, however, any basis for this in either SuSv2 or our own man page. Moreover, neither Linux nor Solaris impose this requirement. This revision removes this requirement. Submitted by: Tijl Coosemans PR: 118510 MFC after: 6 weeks	2008-05-24 21:57:16 +00:00
alc	0a1d595c3a	In order to map device memory using superpages, mmap(2) must find a superpage-aligned virtual address for the mapping. Revision 1.65 implemented an overly simplistic and generally ineffectual method for finding a superpage-aligned virtual address. Specifically, it rounds the virtual address corresponding to the end of the data segment up to the next superpage-aligned virtual address. If this virtual address is unallocated, then the device will be mapped using superpages. Unfortunately, in modern times, where applications like the X server dynamically load much of their code, this virtual address is already allocated. In such cases, mmap(2) simply uses the first available virtual address, which is not necessarily superpage aligned. This revision changes mmap(2) to use a more robust method, specifically, the VMFS_ALIGNED_SPACE option that is now implemented by vm_map_find().	2008-05-17 19:32:48 +00:00
alc	e919c282cd	vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be passed by value.	2008-04-28 05:30:23 +00:00
kib	de73f6b678	Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done. PR: kern/119422 MFC after: 1 week	2008-03-20 16:08:42 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
jhb	8cd9437636	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())	2008-01-08 21:58:16 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
peter	9b1e0dd3a8	Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.	2007-10-18 22:47:39 +00:00
kib	77766ce03f	Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert(). For this, introduce vm_map_fixed() that does that for MAP_FIXED case. Dropping the lock allowed for parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2007-08-20 12:05:45 +00:00
peter	fd45cf2a6d	Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)	2007-07-04 22:57:21 +00:00
mjacob	fadc531504	Make sure object is NULL- there is a possible case where you could fall through to it being used w/o being set. Put a break in the default case.	2007-06-17 04:17:48 +00:00
attilio	7dd8ed88a9	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
jeff	e1996cb960	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
rwatson	10d0d9cf47	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
rwatson	7beaaf5cd2	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
kib	b90260e703	Make the mincore(2) return ENOMEM when requested range is not fully mapped. Requested by: Bruno Haible <bruno at clisp org> Reviewed by: alc Approved by: pjd (mentor) MFC after: 1 month	2006-06-21 12:59:05 +00:00
trhodes	2b289f67a8	It seems that POSIX would rather ENODEV returned in place of EINVAL when trying to mmap() an fd that isn't a normal file. Reference: http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html Submitted by: fanf	2006-04-21 07:17:25 +00:00
jkoshy	48e5e4792d	MFP4: Support for profiling dynamically loaded objects. Kernel changes: Inform hwpmc of executable objects brought into the system by kldload() and mmap(), and of their removal by kldunload() and munmap(). A helper function linker_hwpmc_list_objects() has been added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve the list of currently loaded kernel modules. The unused `MAPPINGCHANGE' event has been deprecated in favour of separate `MAP_IN' and `MAP_OUT' events; this change reduces space wastage in the log. Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to handle the map change callbacks. Change the default per-cpu sample buffer size to hold 32 samples (up from 16). Increment __FreeBSD_version. libpmc(3) changes: Update libpmc(3) to deal with the new events in the log file; bring the pmclog(3) manual page in sync with the code. pmcstat(8) changes: Introduce new options to pmcstat(8): "-r" (root fs path), "-M" (mapfile name), "-q"/"-v" (verbosity control). Option "-k" now takes a kernel directory as its argument but will also work with the older invocation syntax. Rework string handling in pmcstat(8) to use an opaque type for interned strings. Clean up ELF parsing code and add support for tracking dynamic object mappings reported by a v2.0.00 hwpmc(4). Report statistics at the end of a log conversion run depending on the requested verbosity level. Reviewed by: jhb, dds (kernel parts of an earlier patch) Tested by: gallatin (earlier patch)	2006-03-26 12:20:54 +00:00
dds	0fb2e655fd	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
dds	c066055957	Update the vnode's access time after an mmap operation on it. Before this change a copy operation with cp(1) would not update the file access times. According to the POSIX mmap(2) documentation: the st_atime field of the mapped file may be marked for update at any time between the mmap() call and the corresponding munmap() call. The initial read or write reference to a mapped region shall cause the file's st_atime field to be marked for update if it has not already been marked for update.	2005-10-04 14:58:58 +00:00
peter	eb46ffb067	Remove unused (but initialized) variable 'objsize' from vm_mmap()	2005-09-20 22:08:27 +00:00

1 2 3 4 5

250 Commits