Commit Graph

234 Commits

Author SHA1 Message Date
Don Lewis
f0ea4612ef Style(9) changes.
Pointed out by:	bde
2004-03-15 06:43:51 +00:00
Don Lewis
be4c5ad025 Remove redundant suser() check. 2004-03-15 06:36:55 +00:00
Don Lewis
169299398a Undo the merger of mlock()/vslock and munlock()/vsunlock() and the
introduction of kern_mlock() and kern_munlock() in
        src/sys/kern/kern_sysctl.c      1.150
        src/sys/vm/vm_extern.h          1.69
        src/sys/vm/vm_glue.c            1.190
        src/sys/vm/vm_mmap.c            1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.

Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.

Combine the best parts of each of the original sets of implementations
with further code cleanup.  Make the mclock() and vslock()
implementations as similar as possible.

Retain the RLIMIT_MEMLOCK check in mlock().  Move the most strigent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.

Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.

Tested by:	Cy Schubert <Cy.Schubert AT komquats.com>
2004-03-05 22:03:11 +00:00
Alexander Kabaev
30d4dd7ee9 Pich up a do {} while(0) cleanup by phk that was discarded accidentally in
previous revision.

Submitted by:	alc
2004-03-01 02:44:33 +00:00
Alexander Kabaev
c8daea132f Move the code dealing with vnode out of several functions into a single
helper function vm_mmap_vnode.

Discussed with:	jeffr,alc (a while ago)
2004-02-27 22:02:15 +00:00
Don Lewis
47934cef8f Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Alan Cox
cafe836a56 - Correct an error in mincore(2) that has existed since its introduction:
mincore(2) should check that the page is valid, not just allocated.
   Otherwise, it can return a false positive for a page that is not yet
   resident because it is being read from disk.
2003-12-21 06:03:40 +00:00
Alexander Kabaev
5e6dbda017 Remove trailing whitespace. 2003-12-08 02:45:45 +00:00
Alan Cox
c8123cb800 Addendum to revision 1.174: In the case where vm_pager_allocate() is called
to create a vnode-backed object, the vnode lock must be held by the caller.

Reported by:	truckman
Discussed with:	kan
2003-12-08 00:47:33 +00:00
Alan Cox
20eec4bbdb Fix a deadlock between vm_fault() and vm_mmap(): The expected lock ordering
between vm_map and vnode locks is that vm_map locks are acquired first.  In
revision 1.150 mmap(2) was changed to pass a locked vnode into vm_mmap().
This creates a lock-order reversal when vm_mmap() calls one of the vm_map
routines that acquires a vm_map lock.  The solution implemented herein is
to release the vnode lock in mmap() before calling vm_mmap() and reacquire
this lock if necessary in vm_mmap().

Approved by:	re (scottl)
Reviewed by:	jeff, kan, rwatson
2003-12-06 05:45:32 +00:00
Alan Cox
6f8b4fc03a - Remove long dead code. 2003-11-14 08:22:38 +00:00
Alan Cox
b7b7cd4421 Changes to msync(2)
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
   is specified to msync(2).  This is required by the Open Group Base
   Specifications Issue 6.
 - vm_map_sync() doesn't return KERN_FAILURE.  Thus, msync(2) can't
   possibly return EIO.
 - The second major loop in vm_map_sync() handles sub maps.  Thus,
   failing on sub maps in the first major loop isn't necessary.
2003-11-14 06:55:11 +00:00
Alan Cox
d88346020b - The Open Group Base Specifications Issue 6 specifies that an munmap(2)
must return EINVAL if size is zero.  Submitted by: tegge
 - In order to avoid a race condition in multithreaded applications, the
   check and removal operations by munmap(2) must be in the same critical
   section.  To accomodate this, vm_map_check_protection() is modified to
   require its caller to obtain at least a read lock on the map.
2003-11-10 01:37:40 +00:00
Alan Cox
637315ed9c - Remove Giant from msync(2). Giant is still acquired by the lower layers
if we drop into the pmap or vnode layers.
 - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
   multithread applications can't change the map between implementing the
   zero-length hack in msync(2) and reacquiring the map lock in
   vm_map_sync().

Reviewed by:	tegge
2003-11-09 22:09:04 +00:00
Alan Cox
950f8459d4 - Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
 - Migrate the parts of the old vm_map_clean() that examined the internals
   of a vm object to a new function vm_object_sync() that is implemented in
   vm_object.c.  At the same, introduce the necessary vm object locking so
   that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by:	tegge
2003-11-09 05:25:35 +00:00
Bruce M Simpson
11f7ddc563 Only the super-user should be able to wire pages via the mlock() family
of system calls at this time.  Remove various #ifdef's to enforce this.
2003-10-06 01:59:04 +00:00
Marcel Moolenaar
fd75d71049 Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. we need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64
2003-09-27 22:28:14 +00:00
Peter Wemm
c460ac3a00 Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function.  Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable.  This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'.  And that causes exec etc to fail since it can no
longer find space to mmap things.
2003-09-25 01:10:26 +00:00
Alan Cox
7ebcee376a Revise the locking in mincore(2). 2003-09-07 18:47:54 +00:00
Bruce M Simpson
abd498aa71 Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
   necessary. This needed clarification; the stub code generation for
   mlockall() was disabled, which would prevent applications from
   linking to this API (suggested by mux)
 - Giant has been quoshed. It is no longer held by the code, as
   the required locking has been pushed down within vm_map.c.
 - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
   to express their intention explicitly.
 - Inspected at the vmstat, top and vm pager sysctl stats level.
   Paging-in activity is occurring correctly, using a test harness.
 - The RES size for a process may appear to be greater than its SIZE.
   This is believed to be due to mappings of the same shared library
   page being wired twice. Further exploration is needed.
 - Believed to back out of allocations and locks correctly
   (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR:             kern/43426, standards/54223
Reviewed by:    jake, alc
Approved by:    jake (mentor)
MFC after:	2 weeks
2003-08-11 07:14:08 +00:00
Poul-Henning Kamp
a5d841d4ce Remove unnecessary cast. 2003-07-04 12:23:43 +00:00
Poul-Henning Kamp
3b6d965263 Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
Poul-Henning Kamp
a6af4ff136 Use a do {...} while (0); and a couple of breaks to reduce the level
of indentation a bit.
2003-06-21 08:27:06 +00:00
David E. O'Brien
874651b13c Use __FBSDID(). 2003-06-11 23:50:51 +00:00
Alan Cox
bc5b057f6c Hold the vm object's lock when performing vm_page_lookup(). 2003-06-09 07:01:05 +00:00
John Baldwin
69297bf8c9 suser() does not need the proc lock, just the setting of P_PROTECTED in
p_flag needs the lock.
2003-04-17 22:38:27 +00:00
Wes Peters
f4cf2141f6 Add a facility allowing processes to inform the VM subsystem they are
critical and should not be killed when pageout is looking for more
memory pages in all the wrong places.

Reviewed by:	arch@
Sponsored by:	St. Bernard Software
2003-03-31 21:09:57 +00:00
Maxime Henrion
6900a17c75 The object type can't be OBJT_PHYS in vm_mmap().
Reviewed by:	peter
2003-03-30 00:56:20 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Alan Cox
e80b7b691e Lock page field accesses in mincore().
Approved by:	re (blanket)
2002-11-28 08:01:39 +00:00
Robert Watson
3e732e7d7d Invoke mac_check_vnode_mmap() during mmap operations on vnodes,
permitting policies to restrict access to memory mapping based on
the credential requesting the mapping, the target vnode, the
requested rights, or other policy considerations.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-22 15:56:44 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Jeff Roberson
f6b5b182e8 - Hold a lock on the vnode acquired from the file table across the call to
vm_mmap() as well as the GETATTR etc.
 - If the handle is a vnode in vm_mmap() assert that it is locked.
 - Wiggle Giant around a little to account for the extra vnode operation.
2002-07-06 22:14:38 +00:00
Matthew Dillon
070f64fe6f Part I of RLIMIT_VMEM implementation. Implement core functionality for
a new resource limit that covers a process's entire VM space, including
mmap()'d space.

(Part II will be additional code to check RLIMIT_VMEM during exec() but it
needs more fleshing out).

PR:		kern/18209
Submitted by:	Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net>
MFC after:	7 days
2002-06-26 00:29:28 +00:00
Alan Cox
2cd301d1e1 o Remove the unnecessary acquisition and release of Giant around fdrop()
in mmap(2).
2002-06-23 01:48:22 +00:00
Alan Cox
c04c996b25 o Reduce the scope of Giant in vm_mmap() to just the code that manipulates
a vnode.  (Thus, MAP_ANON and MAP_STACK never acquire Giant.)
2002-06-22 19:13:56 +00:00
Alan Cox
042bb29940 o Remove GIANT_REQUIRED from vm_fault_user_wire().
o Move pmap_pageable() outside of Giant in vm_fault_unwire().
   (pmap_pageable() is a no-op on all supported architectures.)
 o Remove the acquisition and release of Giant from mlock().
2002-06-16 20:42:29 +00:00
Alan Cox
e30616dbfe o Remove the acquisition and release of Giant from munlock().
Reviewed by:	tegge
2002-06-15 05:05:04 +00:00
Alan Cox
1d7cf06c8c o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
 o Remove vm_map_pageable() and vm_map_user_pageable().
 o Remove vm_map_clear_recursive() and vm_map_set_recursive().  (They were
   only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by:	tegge
2002-06-14 18:21:01 +00:00
Alfred Perlstein
fa7212543f fix typo in _SYS_SYSPROTO_H_ case: s/mlockall_args/munlockall_args
Submitted by: Mark Santcroos <marks@ripe.net>
2002-06-06 18:51:14 +00:00
Alfred Perlstein
99b9331a4f Check for defined(__i386__) instead of just defined(i386) since the compiler
will be updated to only define(__i386__) for ANSI cleanliness.
2002-05-30 07:32:58 +00:00
Alan Cox
4b9fdc2bce o Acquire and release Giant around pmap operations in vm_fault_unwire()
and vm_map_delete().  Assert GIANT_REQUIRED in vm_map_delete()
   only if operating on the kernel_object or the kmem_object.
 o Remove GIANT_REQUIRED from vm_map_remove().
 o Remove the acquisition and release of Giant from munmap().
2002-05-26 04:54:56 +00:00
Alan Cox
e0be79afbf o Eliminate the acquisition and release of Giant from minherit(2).
(vm_map_inherit() no longer requires Giant to be held.)
2002-05-18 18:59:00 +00:00
Alan Cox
094f6d2694 o Remove GIANT_REQUIRED from vm_map_madvise(). Instead, acquire and
release Giant around vm_map_madvise()'s call to pmap_object_init_pt().
 o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition
   and release of Giant.
 o Remove the acquisition and release of Giant from madvise().
2002-05-18 07:48:06 +00:00
Alan Cox
4328504956 o Remove the acquisition and release of Giant from mprotect(). 2002-05-18 03:58:16 +00:00
Alan Cox
8c5c5d049f o Remove GIANT_REQUIRED from vm_map_lookup_entry() and
vm_map_check_protection().
 o Call vm_map_check_protection() without Giant held in munmap().
2002-05-04 02:07:36 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Alfred Perlstein
11caded34f Remove __P. 2002-03-19 22:20:14 +00:00
Eivind Eklund
a128794977 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Bruce Evans
1e92845e1b Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED,
DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET
and SIMPLELOCK_DEBUG.
2002-02-15 13:16:11 +00:00
Alfred Perlstein
a4db49537b Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Robert Watson
8c5d4fe829 o Modify access control checks in mmap() to use securelevel_gt() instead
of direct variable access.

Obtained from:	TrustedBSD Project
2001-09-26 20:29:39 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
d2c60af81a Cleanup 2001-08-31 01:26:30 +00:00
Matthew Dillon
676274db9b Remove support for the badly broken MAP_INHERIT (from -current only). 2001-08-24 19:29:56 +00:00
Matthew Dillon
54d9214595 whitespace / register cleanup 2001-07-04 19:00:13 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
190609dd48 Stick VM syscalls back under Giant if the BLEED option is not defined. 2001-05-24 18:04:29 +00:00
John Baldwin
e4ca250d4b - Obtain Giant in mmap() syscall while messing with file descriptors and
vnodes.
- Fix an old bug that would leak a reference to a fd if the vnode being
  mmap'd wasn't of type VREG or VCHR.
- Lock Giant in vm_mmap() around calls into the VM that can call into
  pager routines that need Giant or into other VM routines that need
  Giant.
- Replace code that used a goto to jump around the else branch of a test
  to use an else branch instead.
2001-05-23 22:17:43 +00:00
John Baldwin
12635f9c89 Unlock the VM lock at the end of munlock() instead of locking it again. 2001-05-22 06:07:36 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Matthew Dillon
279d722604 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
Boris Popov
9ff5ce6baf Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by:	mckusick, dillon
2000-09-12 09:49:08 +00:00
Kirk McKusick
3592b7155c Clean up the snapshot code so that it no longer depends on the use of
the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
checking for the SF_SNAPSHOT flag in the appropriate places. With
this change, it is now possible to rename and link to snapshot files.
It is also possible to set or clear any of the owner, group, or
other read bits on the file, though none of the write or execute
bits can be set. There is also an explicit test to prevent the
setting or clearing of the SF_SNAPSHOT flag via chflags() or
fchflags(). Note also that the modify time cannot be changed as
it needs to accurately reflect the time that the snapshot was taken.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
2000-07-26 23:07:01 +00:00
Mark Murray
2589f2499d Nifty idea from Jeroen van Gelderen; don't call a routine to check if
we are using the /dev/zero device, just check a flag (supplied by
/dev/zero).
Reviewed by:	dfr
2000-06-25 09:44:32 +00:00
Peter Wemm
249645144d Checkpoint of a new physical memory backed object type, that does not
have pv_entries.  This is intended for very special circumstances,
eg: a certain database that has a 1GB shm segment mapped into 300
processes.  That would consume 2GB of kvm just to hold the pv_entries
alone.  This would not be used on systems unless the physical ram was
available, as it's not pageable.

This is a work-in-progress, but is a useful and functional checkpoint.
Matt has got some more fixes for it that will be committed soon.

Reviewed by:	dillon
2000-05-21 13:41:29 +00:00
Peter Wemm
0385347c1a Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that.  In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t.  This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system.  (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc).  Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries.  This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist.  This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by:	dillon
2000-05-21 12:50:18 +00:00
Garrett Wollman
aa543039b5 Implement POSIX.1b shared memory objects. In this implementation,
shared memory objects are regular files; the shm_open(3) routine
uses fcntl(2) to set a flag on the descriptor which tells mmap(2)
to automatically apply MAP_NOSYNC.

Not objected to by: bde, dillon, dufault, jasone
2000-04-22 15:22:31 +00:00
Philippe Charnier
5929bcfaba Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
Philippe Charnier
956f31353c Spelling 2000-03-26 15:20:23 +00:00
Paul Saab
9730a5daab Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by:	dillon,jdp,alfred
Approved by:	jkh
2000-02-28 04:10:35 +00:00
Matthew Dillon
1f6889a1eb Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
    of thousands of vm_map_entry structures.

    Add panic to make null-pointer dereference crash a little more verbose.

    Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
    of mmap()'d spaces (discrete vm_map_entry's in the process).  The value
    defaults to around 9000 for a 128MB machine.  The test is scaled for the
    number of processes sharing a vmspace (aka linux threads).  Setting
    the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh
2000-02-16 21:11:33 +00:00
Guido van Rooij
00d76afede Use MAP_NOSYNC for vnodes without any links in their filesystem.
This is necessary for vmware: it does not use an anonymous mmap for
the memory of the virtual system. In stead it creates a temp file an
unlinks it. For a 50 MB file, this results in a ot of syncing
every 30 seconds.

Reviewed by:	Matthew Dillon <dillon@backplane.com>
2000-01-03 19:13:53 +00:00
Matthew Dillon
4f79d873c1 Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

    This feature prevents the update daemon from gratuitously flushing
    dirty pages associated with a mapped file-backed region of memory.  The
    system pager will still page the memory as necessary and the VM system
    will still be fully coherent with the filesystem.  Modifications made
    by other means to the same area of memory, for example by write(), are
    unaffected.  The feature works on a page-granularity basis.

    MAP_NOSYNC allows one to use mmap() to share memory between processes
    without incuring any significant filesystem overhead, putting it in
    the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg
1999-12-12 03:19:33 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Matthew Dillon
b430905573 cleanup madvise code, add a few more sanity checks.
Reviewed by:	Alan Cox <alc@cs.rice.edu>,  dg@root.com
1999-09-21 05:00:48 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Poul-Henning Kamp
0ef1c82630 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
Alan Cox
4738fa0970 vm_mmap:
Insure that device mappings get MAP_PREFAULT(_PARTIAL) set,
	so that 4M page mappings are used when possible.

Reviewed by:	Luoqi Chen <luoqi@watermarkgroup.com>
1999-06-05 18:21:53 +00:00
Alan Cox
e972780a11 Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert,
eliminating the need for the pmap_object_init_pt calls in imgact_* and
mmap.

Reviewed by:	David Greenman <dg@root.com>
1999-05-17 00:53:56 +00:00
Alan Cox
ea41812fe5 Remove prototypes for functions that don't exist anymore (vm_map.h).
Remove a useless argument from vm_map_madvise's interface (vm_map.c,
	vm_map.h, and vm_mmap.c).

Remove a redundant test in vm_uiomove (vm_map.c).

Make two changes to vm_object_coalesce:

1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)

2. Free any swap space allocated to removed pages.
1999-05-16 05:07:34 +00:00
Alan Cox
e5f13bdd09 Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.
It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa.  Thus, MAP_COPY_ON_WRITE suffices.

Reviewed by:	David Greenman <dg@root.com>
1999-05-14 23:09:34 +00:00
Peter Wemm
4d38e6b5ec Add brackets to silence egcs and help clarity. 1999-05-06 22:06:45 +00:00
Luoqi Chen
d28ab90f02 Don't ignore mmap() address hint below the text section. 1999-05-06 00:46:19 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Peter Wemm
db42d90829 unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.
1999-04-19 14:14:14 +00:00
Alan Cox
dd2622a8cd To avoid a conflict for the vm_map's lock with vm_fault, release
the read lock around the subyte operations in mincore.  After the lock is
reacquired, use the map's timestamp to determine if we need to restart
the scan.
1999-03-02 22:55:02 +00:00
Alan Cox
eff50fcd4c mincore doesn't modify the vm_map. Therefore, it doesn't require
an exclusive lock.  A read lock will suffice.
1999-03-01 20:42:16 +00:00
Luoqi Chen
b1028ad122 Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
Matthew Dillon
9fdfe602fc Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
    problems and replaced with an explicit dup of the vm_map_entry structure.
    Prior to the removal, they were entirely unused.
1999-02-07 21:48:23 +00:00
Julian Elischer
2907af2a96 Mostly remove the VM_STACK OPTION.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.

As the author says:

|I ran into a problem pulling out the VM_STACK option.  I was aware of this
|when I first did the work, but then forgot about it.  The VM_STACK stuff
|has some code changes in the i386 branch.  There need to be corresponding
|changes in the alpha branch before it can come out completely.

what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in.  This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches.  The vm_map_entry will then always have one
|extra element (avail_ssize).  It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c.  This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch.  I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-26 02:49:52 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Julian Elischer
2267af789e Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>
1999-01-06 23:05:42 +00:00
Dmitrij Tejblum
fc56545639 Don't disable mmap with large file offset. 1998-12-09 20:22:21 +00:00