2185 Commits

Author SHA1 Message Date
alc
215850cc69 MFC
File                  Revisions
  kern/imgact_aout.c    1.100
  kern/imgact_elf.c     1.167-1.172, 1.175
  kern/imgact_gzip.c    1.55
  vm/vm_extern.h        1.77
  vm/vm_glue.c          1.214

  Use sf_buf_alloc() instead of vm_map_find() on exec_map to create
  the ephemeral mappings that are used as the source for three copy
  operations from kernel space to user space.  There are two reasons
  for making this change: (1) Under heavy load exec_map can fill up
  causing vm_map_find() to fail.  When it fails, the nascent process
  is aborted (SIGABRT), whereas this reimplementation using
  sf_buf_alloc() sleeps.  (2) Although it is possible to sleep on
  vm_map_find()'s failure until address space becomes available (see
  kmem_alloc_wait()), using sf_buf_alloc() is faster.  Furthermore,
  the reimplementation uses a CPU private mapping, avoiding a TLB
  shootdown on multiprocessors.

  The second argument to vm_map_find() should be NULL instead of 0.

  Correct a long-standing problem in elfN_map_insert(): In order to
  copy a page to user space, the user space mapping must allow write
  access.

  Eliminate an unneeded (vm_prot_t) parameter from two functions.
  Eliminate unnecessary uses of a local variable.

  Maintain the vnode lock throughout elfN_load_file() rather than
  releasing it and reacquiring it in vrele().  Consequently, there is
  no reason to increase the reference count on the vm object caching
  the file's pages.

  Eliminate unused parameters to elfN_load_file().

  Maintain the lock on the vnode for most of exec_elfN_imgact().
  Specifically, it is required for the I/O that may be performed by
  elfN_load_section().

  Avoid an obscure deadlock in the a.out, elf, and gzip image
  activators.  Add a comment describing why the deadlock does not
  occur in the common case and how it might occur in less usual
  circumstances.

  Eliminate an unused variable from exec_aout_imgact().

  Avoid a vm object reference leak in a rarely used code path.

  An executable contains at most one PT_INTERP program header.
  Therefore, the loop that searches for it can terminate after it is
  found rather than iterating over the entire set of program headers.

  Eliminate an unneeded initialization.

Approved by: re (mux)
2006-03-16 00:25:32 +00:00
jeff
3f93b5e105 MFC Rev 1.226
VFS SMP fixes, stack api, softupdates fixes.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (scottl)
2006-03-13 03:08:26 +00:00
jeff
789d9290ea MFC Revs 1.357, 1.355
VFS SMP fixes, stack api, softupdates fixes.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (scottl)
2006-03-13 03:08:21 +00:00
tegge
7d50ddd92e MFC: Eliminate a deadlock when creating snapshots. Blocking
vn_start_write() must be called without any vnode locks held.
     Remove calls to vn_start_write() and vn_finished_write() in
     vnode_pager_putpages() and add these calls before the vnode lock
     is obtained to most of the callers that don't already have them.

Approved by:	re (mux)
2006-03-09 00:18:45 +00:00
tegge
329a13b6d7 MFC: Hold extra reference to vm object while cleaning pages.
Ignore dirty pages owned by "dead" objects.

Approved by:	re (mux)
2006-03-09 00:07:35 +00:00
tegge
871ed2943c MFC: Expand scope of marker to reduce the number of page queue scan restarts.
Approved by:	re (mux)
2006-03-09 00:02:51 +00:00
tegge
816296ec32 MFC: Check return value from nonblocking call to vn_start_write().
Approved by:	re (mux)
2006-03-09 00:01:29 +00:00
tegge
d69ff42e3b MFC: Don't access fs->first_object after dropping reference to it.
The result could be a missed or extra giant unlock.

Approved by:	re (mux)
2006-03-08 23:53:39 +00:00
yar
dbcb706f58 Work around the shortness of the size argument to
vnode_create_vobject() while preserving the binary ABI
to filesystem modules in RELENG_6: introduce a new function
vnode_create_vobject_off() that takes the size argument
as off_t; move all stock file systems to it; re-implement
the old vnode_create_vobject() using vnode_create_vobject_off()
so that old or binary-only FS modules can work w/o hitting the
bug.  The trick is to pass a size of 0 to vnode_create_vobject_off()
so that it will call VOP_GETATTR() and thus get the actual,
untruncated file size even if the calling module still uses
the old vnode_create_vobject().
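
The compatibility trick described above can be sketched in userland C.
This is only an illustration under stated assumptions: every sk_* name
below is a hypothetical stand-in, the narrow size type is modelled as a
32-bit integer, and the attribute lookup is reduced to reading a field.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
typedef int64_t  sk_off_t;	/* wide file offset, in the role of off_t */
typedef uint32_t sk_size_t;	/* narrow size, like a 32-bit size_t */

struct sk_vnode {
	sk_off_t real_size;	/* what an attribute lookup would report */
	sk_off_t object_size;	/* size recorded for the backing object */
};

/* New entry point: a size of 0 means "look the real size up". */
static void
sk_create_vobject_off(struct sk_vnode *vp, sk_off_t size)
{
	if (size == 0)
		size = vp->real_size;	/* stands in for VOP_GETATTR() */
	vp->object_size = size;
}

/*
 * Old entry point, kept so existing callers keep working.  The size it
 * receives may already be truncated, so it is ignored and the new
 * function is asked to determine the real size itself.
 */
static void
sk_create_vobject(struct sk_vnode *vp, sk_size_t size)
{
	(void)size;
	sk_create_vobject_off(vp, 0);
}
```

With a 6 GB file, a caller of the old function would pass a truncated
size, yet the recorded object size still matches the real file size.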

PR:		kern/92243
Approved by:	re (scottl)
2006-02-20 00:53:15 +00:00
rwatson
67ad5e7105 Merge uma_core.c:1.136 from HEAD to RELENG_6:
Skip per-cpu caches associated with absent CPUs when generating a
  memory statistics record stream via sysctl.

Approved by:	re (scottl)
2006-02-14 03:37:58 +00:00
ps
ee81c92cbf MFC:
- rate limit vnode_pager_putpages printfs to once a second.
- rate limit filesystem full and out of inodes messages to once a
  second.
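
A once-a-second limit of this kind can be sketched as follows.  This is
a minimal userland illustration, not the code from the commit: the
rate_state structure and rate_ok() helper are hypothetical names, and
the caller supplies the current time in whole seconds.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-message state; not a kernel structure. */
struct rate_state {
	bool   primed;	/* false until the first message is emitted */
	time_t last;	/* second in which the last message was emitted */
};

/*
 * Return true if a message may be emitted now, false if one was
 * already emitted during the same second.
 */
static bool
rate_ok(struct rate_state *rs, time_t now)
{
	if (rs->primed && now == rs->last)
		return (false);
	rs->primed = true;
	rs->last = now;
	return (true);
}
```

A caller would guard the noisy message with something like
"if (rate_ok(&state, time(NULL))) printf(...);".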
2005-12-28 20:11:51 +00:00
dds
3459412dad MFC changes from 2005.10.26:
Move execve's access time update functionality into a
new vfs_mark_atime() function, and use the new function
for performing efficient atime updates in mmap().
2005-12-26 13:47:20 +00:00
alc
245bd5abe3 MFC
Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine
  whether the mapping should permit execute access.

  Revision  Changes    Path
  1.179     +2 -2      src/sys/alpha/alpha/pmap.c
  1.527     +4 -2      src/sys/amd64/amd64/pmap.c
  1.37      +3 -3      src/sys/arm/arm/pmap.c
  1.531     +2 -2      src/sys/i386/i386/pmap.c
  1.163     +4 -3      src/sys/ia64/ia64/pmap.c
  1.100     +3 -2      src/sys/powerpc/powerpc/pmap.c
  1.149     +3 -2      src/sys/sparc64/sparc64/pmap.c
  1.72      +1 -1      src/sys/vm/pmap.h
  1.207     +2 -1      src/sys/vm/vm_fault.c
  1.368     +2 -2      src/sys/vm/vm_map.c
2005-11-13 21:45:49 +00:00
alc
7b49c93e5f MFC
Introduce the vm.boot_pages tunable and sysctl, which controls the number
  of pages reserved to bootstrap the kernel memory allocator.
2005-11-13 08:44:25 +00:00
alc
bad48fafaa MFC revisions 1.307 and 1.308
Consider the zero-copy transmission of a page that was wired by mlock(2).
  If a copy-on-write fault occurs on the page, the new copy should inherit
  a part of the original page's wire count.

  If a physical page is mapped by two or more virtual addresses, transmitted
  by the zero-copy sockets method, and written to before the transmission
  completes, we need to destroy all of the existing mappings to the page,
  not just the one that we fault on.  Otherwise, the mappings will no longer
  be to the same page and changes made through one of the mappings will not
  be visible through the others.
2005-11-13 07:38:15 +00:00
alc
3256409909 MFC revision 1.130
Introduce a new lock for the purpose of synchronizing access to the
  UMA boot pages.

  Disable recursion on the general UMA lock now that startup_alloc() no
  longer uses it.

  Eliminate the variable uma_boot_free.  It serves no purpose.

  Note: This change eliminates a lock-order reversal between a system
  map mutex and the UMA lock.  See
  http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
2005-11-13 06:22:34 +00:00
rwatson
098c7ecbda Merge kern_malloc.c:1.148, uma_core.c:1.133 from HEAD to RELENG_6:
Change format string for u_int64_t to %ju from %llu, in order to use the
  correct format string on 64-bit systems.
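
The portability point can be shown with a small userland sketch.  The
helper name format_counter() is hypothetical, and the explicit cast to
uintmax_t is an assumption about how %ju is normally paired with a
64-bit value, not a quote from the change itself.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * format_counter() is a hypothetical helper, not part of the kernel.
 * %ju expects a uintmax_t argument, so the value is cast explicitly.
 * %llu expects "unsigned long long", which does not match uint64_t on
 * platforms where uint64_t is defined as "unsigned long".
 */
static int
format_counter(char *buf, size_t len, uint64_t value)
{
	return (snprintf(buf, len, "%ju", (uintmax_t)value));
}
```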

  Pointed out by: pjd
2005-11-07 18:59:12 +00:00
rwatson
65c267c84e Merge uma_core.c:1.132 from HEAD to RELENG_6:
Add a "show uma" command to DDB, which prints out the current stats for
  available UMA zones.  Quite useful for post-mortem debugging of memory
  leaks without a dump device configured on a panicked box.
2005-11-07 18:57:08 +00:00
delphij
eba1d76119 MFC (by alc) changesets that address several race conditions that can
cause a kernel compiled with ZERO_COPY_SOCKETS to panic under certain
circumstances:
	sys/kern/uipc_cow.c:	1.24 - 1.26
	sys/vm/vm_object.c:	1.351

Approved by:	re (scottl)
2005-10-26 20:21:23 +00:00
delphij
86fff66f18 MFC (by jhb):
| Trim a couple of unneeded includes.
|
| Revision  Changes    Path
| 1.153     +0 -1      src/sys/kern/subr_turnstile.c
| 1.35      +0 -1      src/sys/vm/vm_zeroidle.c

Approved by:	re (scottl)
2005-10-09 03:25:37 +00:00
delphij
f1abc488c4 MFC (by alc)
| Eliminate an incorrect cast.
|
| Revision  Changes    Path
| 1.208     +1 -1      src/sys/vm/vm_fault.c

Approved by:	re (scottl)
2005-10-09 03:08:28 +00:00
delphij
a23e4ad2fe MFC (by alc)
| Eliminate an incorrect (and unnecessary) cast.
|
| Revision  Changes    Path
| 1.367     +1 -1      src/sys/vm/vm_map.c

Approved by:	re (scottl)
2005-10-09 03:07:29 +00:00
delphij
1a4ed13319 MFC (by peter)
| Remove unused (but initialized) variable 'objsize' from vm_mmap()
|
| Revision  Changes    Path
| 1.201     +1 -2      src/sys/vm/vm_mmap.c

Approved by:	re (scottl)
2005-10-09 03:05:23 +00:00
rwatson
c15d856fac Merge uma_dbg.c:1.21, uma_dbg.h:1.9 from HEAD to RELENG_6:
Improve canonicalization of copyrights.  Order copyrights by order of
  assertion (jeff, bmilekic, rwatson).

  Suggested ages ago by:  bde

Approved by:	re (kensmith)
2005-08-20 13:31:05 +00:00
alc
a574f3c833 MFC
Eliminate inconsistency in the setting of the B_DONE flag.

Approved by:	re (kensmith)
2005-08-20 06:07:55 +00:00
tegge
9f6c2d705f MFC: Check for marker pages when scanning active and inactive page queues.
Approved by:	re (kensmith)
2005-08-15 14:28:48 +00:00
kan
4d72fc60d0 MFC: Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable.
vm_pager_init() is run before required nswbuf variable has been set
to correct value. This caused system to run with single pbuf available
for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt
variables in the same way.

Approved by:	re (kensmith)
2005-08-15 14:04:47 +00:00
rwatson
26b1b83850 Merge vm_page.h:1.137 from HEAD to RELENG_6:
Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined,
  as opt_vmpage.h will not be available to user space library builds.  A
  similar existing check is present for KLD_MODULE for similar reasons.

Approved by:	re (hrs)
2005-08-15 09:02:01 +00:00
rwatson
634870d90c Merge uma_int.h:1.37 from HEAD to RELENG_6:
Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be
  used from memstat_uma.c for the purposes of kvm access without lots
  of additional unsafe includes.

Approved by:	re (hrs)
2005-08-15 09:01:11 +00:00
ssouhlal
871d11a963 MFC rev 1.222:
Use atomic operations on runningbufspace.

 PR:             kern/84318
 Submitted by:   ade

Approved by:	re (kensmith)
2005-08-15 06:22:09 +00:00
tegge
979b6a27c8 MFC: Don't allow pagedaemon to skip pages while scanning active and
inactive page queues due to the vm object being locked.

Approved by:	re (kensmith)
2005-08-12 16:43:27 +00:00
rwatson
b90366a3fc Merge uma.h:1.27, uma_core.c:1.129 from HEAD to RELENG_6:
Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the
  monitoring API, which might or might not be the same as the internal
  maximum (currently none).

  Export flag information on UMA zones -- in particular, whether or
  not this is a secondary zone, and so the keg free count should be
  considered in that light.

Approved by:	re (kensmith)
2005-07-28 12:10:19 +00:00
rwatson
45ebd5c0a6 Merge uma_core.c:1.128 from HEAD to RELENG_6:
Further UMA statistics related changes:

    - Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to
      update the zone's uz_frees statistic.  Previously, the statistic was
      updated unconditionally.

    - Use the flag in situations where a "real" free occurs: i.e., one where
      the caller is freeing an allocated item, to be differentiated from
      situations where uma_zfree_internal() is used to tear down the item
      during slab teardown in order to invoke its fini() method.  Also use
      the flag when UMA is freeing its internal objects.

    - When exchanging a bucket with the zone from the per-CPU cache when
      freeing an item, flush cache statistics back to the zone (since the
      zone lock and critical section are both held) to match the allocation
      case.

Approved by:	re (kensmith)
2005-07-23 15:11:13 +00:00
rwatson
5562de6958 Merge uma_core.c:1.127 from HEAD to RELENG_6:
Use mp_maxid in preference to MAXCPU when creating exports of UMA
  per-CPU cache statistics.  UMA sizes the cache array based on the
  number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
  could read off the end of the array (into the next zone).
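
The bounds issue can be sketched in userland C.  The structure and
function below are hypothetical stand-ins, assuming only what the
message states: the array holds mp_maxid + 1 entries, so the loop must
stop at that highest id rather than at a compile-time maximum.

```c
#include <stdint.h>

/* Hypothetical per-CPU record; not the UMA cache structure. */
struct cache_stat {
	uint64_t allocs;
};

/*
 * Sum per-CPU counters for CPU ids 0..maxid inclusive.  "maxid" plays
 * the role of mp_maxid: the array was sized as maxid + 1 entries, so a
 * loop bounded by a larger compile-time constant would read past the
 * end of the array.
 */
static uint64_t
sum_allocs(const struct cache_stat *caches, unsigned int maxid)
{
	uint64_t total = 0;
	unsigned int cpu;

	for (cpu = 0; cpu <= maxid; cpu++)
		total += caches[cpu].allocs;
	return (total);
}
```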

  Reported by:    yongari

Approved by:	re (kensmith)
2005-07-23 15:10:29 +00:00
rwatson
3fddaeea0e Merge uma.h:1.26, uma_int.h:1.36, uma_core.c:1.126 from HEAD to
RELENG_6:

  Improve canonicalization of copyrights.  Order copyrights by order of
  assertion (jeff, bmilekic, rwatson).

  Suggested ages ago by:  bde

Approved by:	re (kensmith)
2005-07-23 15:10:00 +00:00
rwatson
834f5ace40 Merge uma_core.c:1.125 from HEAD to RELENG_5:
Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that
  it covers the following of the uc_alloc/freebucket cache pointers.
  Originally, I felt that the race wasn't helped by holding the mutex,
  hence a comment in the code and not holding it across the cache access.
  However, it does improve consistency, as while it doesn't prevent
  bucket exchange, it does prevent bucket pointer invalidation.  So a
  race in gathering cache free space statistics still can occur, but not
  one that follows an invalid bucket pointer, if the mutex is held.

  Submitted by:   yongari

Approved by:	re (kensmith)
2005-07-23 15:08:53 +00:00
rwatson
5123f5ca4b Merge uma.h:1.25, uma_int.h:1.35, uma_core.c:1.124 from HEAD to
RELENG_6:

  Increase the flags field for kegs from a 16 to a 32 bit value;
  we have exhausted all 16 flags.

Approved by:	re (kensmith)
2005-07-23 15:08:12 +00:00
rwatson
383e61944d Merge uma.h:1.24, uma_int.h:1.34, uma_core.c:1.123 from HEAD to
RELENG_6:

  Track UMA(9) allocation failures by zone, and export via sysctl.

  Requested by:   victor cruceru <victor dot cruceru at gmail dot com>

Approved by:	re (kensmith)
2005-07-23 15:06:54 +00:00
rwatson
adabb7b041 Merge uma.h:1.23, uma_int.h:1.33, uma_core.c:1.122 from HEAD to
RELENG_6:

  Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator
  statistics via a binary structure stream:

  - Add structure 'uma_stream_header', which defines a stream version,
    definition of MAXCPUs used in the stream, and the number of zone
    records in the stream.

  - Add structure 'uma_type_header', which defines the name, alignment,
    size, resource allocation limits, current pages allocated, preferred
    bucket size, and central zone + keg statistics.

  - Add structure 'uma_percpu_stat', which, for each per-CPU cache,
    includes the number of allocations and frees, as well as the number
    of free items in the cache.

  - When the sysctl is queried, return a stream header, followed by a
    series of type descriptions, each consisting of a type header
    followed by a series of MAXCPUs uma_percpu_stat structures holding
    per-CPU allocation information.  Typical values of MAXCPU will be
    1 (UP compiled kernel) and 16 (SMP compiled kernel).

  This query mechanism allows user space monitoring tools to extract
  memory allocation statistics in a machine-readable form, and to do so
  at a per-CPU granularity, allowing monitoring of allocation patterns
  across CPUs in order to better understand the distribution of work and
  memory flow over multiple CPUs.

  While here, also export the number of UMA zones as a sysctl
  vm.uma_count, in order to assist in sizing user space buffers to
  receive the stream.

  A follow-up commit of libmemstat(3), a library to monitor kernel memory
  allocation, will occur in the next few days.  This change directly
  supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
  rather than separately maintained mbuf allocator statistics.

Approved by:	re (kensmith)
2005-07-23 15:05:24 +00:00
rwatson
0788375698 Merge uma_int.h:1.32, uma_core.c:1.121 from HEAD to RELENG_6:
In addition to tracking allocs in the zone, also track frees.  Add
  a zone free counter, as well as a cache free counter.

Approved by:	re (kensmith)
2005-07-23 15:03:49 +00:00
rwatson
8093747980 Merge uma_core.c:1.20 from HEAD to RELENG_6:
In an earlier world order, UMA would flush per-CPU statistics to the
  zone whenever it was moving buckets between the zone and the cache,
  or when coalescing statistics across the CPU.  Remove flushing of
  statistics to the zone when coalescing statistics as part of sysctl,
  as we won't be running on the right CPU to write to the cache
  statistics.

  Add a missed gathering of statistics: when uma_zalloc_internal()
  does a special case allocation of a single item, make sure to update
  the zone statistics to represent this.  Previously this case wasn't
  accounted for in user-visible statistics.

Approved by:	re (kensmith)
2005-07-23 15:01:48 +00:00
jhb
288ca48d6c MFC: Fix instant panics when booting with debug.mpsafevm=0 by fixing up
an old test.

Approved by:	re (kensmith)
2005-07-18 19:53:21 +00:00
silby
64582f3995 Change the panic in trash_ctor into just a printf for now. Once the reports
of panics in trash_ctor relating to mbufs have been examined and a fix
found, this will be turned back into a panic.

Approved by: re (rwatson)
2005-06-26 23:44:07 +00:00
alc
67602b23a9 Increase UMA_BOOT_PAGES to prevent a crash during initialization. See
http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed
description of the crash.

Reported by: Eric Anderson
Approved by: re (scottl)
MFC after: 3 days
2005-06-16 17:06:34 +00:00
green
3bb055500e The new contigmalloc(9) has a bad degenerate case where there were
many regions checked again and again despite knowing the pages
contained were not usable and only satisfied the alignment constraints.
This case was compounded, especially for large allocations, by the
practice of looping from the top of memory so as to keep out of the
important low-memory regions.  While the old contigmalloc(9) has the
same problem, it is not as noticeable due to looping from the low
memory to high.

This degenerate case is fixed, as well as reversing the sense of the
rest of the loops within it, to provide a tremendous speed increase.
This makes the best case O(n * VM overhead) much more likely than the
worst case O(4 * VM overhead).  For comparison, the worst case for old
contigmalloc would be O(5 * VM overhead) in addition to its strategy
of turning used memory into free being highly pessimal.

Also, fix a bug that in practice most likely couldn't have been triggered,
in the new contigmalloc(9): it walked backwards from the end of memory
without accounting for how many pages it needed.  Potentially, nonexistent
pages could have been mapped.  This hasn't occurred because the kernel
generally requests as its first contigmalloc(9) a single page.
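
The backwards-walk accounting can be illustrated with a naive scan.
This is a hypothetical sketch, not the contigmalloc(9) algorithm: it
searches a simple free-page map from the top down, and the function
name and map representation are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Find the highest run of "npages" free pages in a map of "total"
 * pages, scanning from the top of memory downward.  The first
 * candidate starts at total - npages, not total - 1, so every run
 * examined lies entirely inside the map.  Returns the start index, or
 * -1 if no such run exists.
 */
static long
find_run_top_down(const unsigned char *is_free, size_t total, size_t npages)
{
	size_t i, start;

	if (npages == 0 || npages > total)
		return (-1);
	for (start = total - npages + 1; start-- > 0; ) {
		for (i = 0; i < npages && is_free[start + i]; i++)
			;
		if (i == npages)
			return ((long)start);
	}
	return (-1);
}
```

A single-page request makes total - npages and total - 1 coincide,
which is consistent with the message's remark that a one-page first
request kept the bug from being triggered.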

Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
MFC After: 1 month
More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
2005-06-11 00:05:16 +00:00
alc
53e95f1eb2 Add a comment to the effect that fictitious pages do not require the
initialization of their machine-dependent fields.
2005-06-10 17:27:54 +00:00
alc
2d109601cb Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields.  Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations.  Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel
2005-06-10 03:33:36 +00:00
alc
6224234587 Update some comments to reflect the change from spl-based to lock-based
synchronization.
2005-05-28 17:56:18 +00:00
ups
acfce18a2a Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
alc
9c80b49669 Swap in can occur safely without Giant. Release Giant on entry to
scheduler().
2005-05-22 21:06:07 +00:00