File Revisions
kern/imgact_aout.c 1.100
kern/imgact_elf.c 1.167-1.172, 1.175
kern/imgact_gzip.c 1.55
vm/vm_extern.h 1.77
vm/vm_glue.c 1.214
Use sf_buf_alloc() instead of vm_map_find() on exec_map to create
the ephemeral mappings that are used as the source for three copy
operations from kernel space to user space. There are two reasons
for making this change: (1) Under heavy load exec_map can fill up
causing vm_map_find() to fail. When it fails, the nascent process
is aborted (SIGABRT). In contrast, this reimplementation using
sf_buf_alloc() sleeps until a mapping is available. (2) Although it is possible to sleep on
vm_map_find()'s failure until address space becomes available (see
kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore,
the reimplementation uses a CPU private mapping, avoiding a TLB
shootdown on multiprocessors.
The second argument to vm_map_find() should be NULL instead of 0.
Correct a long-standing problem in elfN_map_insert(): In order to
copy a page to user space, the user space mapping must allow write
access.
Eliminate an unneeded (vm_prot_t) parameter from two functions.
Eliminate unnecessary uses of a local variable.
Maintain the vnode lock throughout elfN_load_file() rather than
releasing it and reacquiring it in vrele(). Consequently, there is
no reason to increase the reference count on the vm object caching
the file's pages.
Eliminate unused parameters to elfN_load_file().
Maintain the lock on the vnode for most of exec_elfN_imgact().
Specifically, it is required for the I/O that may be performed by
elfN_load_section().
Avoid an obscure deadlock in the a.out, elf, and gzip image
activators. Add a comment describing why the deadlock does not
occur in the common case and how it might occur in less usual
circumstances.
Eliminate an unused variable from exec_aout_imgact().
Avoid a vm object reference leak in a rarely used code path.
An executable contains at most one PT_INTERP program header.
Therefore, the loop that searches for it can terminate after it is
found rather than iterating over the entire set of program headers.
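A minimal C sketch of the early-terminating search described above. The struct `sketch_phdr` and the function `find_interp()` are hypothetical stand-ins, not the kernel's program header type or the actual loop in exec_elfN_imgact(); only the PT_INTERP value (3) comes from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PT_INTERP as defined by the ELF specification. */
#define SKETCH_PT_INTERP 3u

/* Hypothetical, simplified program header: only the fields used here. */
struct sketch_phdr {
	uint32_t p_type;
	uint64_t p_offset;
	uint64_t p_filesz;
};

/*
 * Return the index of the PT_INTERP header, or -1 if there is none.
 * Since an executable contains at most one PT_INTERP header, the loop
 * returns at the first match instead of scanning every header.
 */
static int
find_interp(const struct sketch_phdr *phdr, size_t phnum)
{
	assert(phdr != NULL || phnum == 0);
	for (size_t i = 0; i < phnum; i++) {
		if (phdr[i].p_type == SKETCH_PT_INTERP)
			return ((int)i);
	}
	return (-1);
}
```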
Eliminate an unneeded initialization.
Approved by: re (mux)
vn_start_write() must be called without any vnode locks held.
Remove calls to vn_start_write() and vn_finished_write() in
vnode_pager_putpages() and add these calls before the vnode lock
is obtained to most of the callers that don't already have them.
Approved by: re (mux)
vnode_create_vobject() while preserving the binary ABI
to filesystem modules in RELENG_6: introduce a new function
vnode_create_vobject_off() that takes the size argument
as off_t; move all stock file systems to it; re-implement
the old vnode_create_vobject() using vnode_create_vobject_off()
so that old or binary-only FS modules can work w/o hitting the
bug. The trick is to pass a size of 0 to vnode_create_vobject_off()
so that it will call VOP_GETATTR() and thus get the actual,
untruncated file size even if the calling module still uses
the old vnode_create_vobject().
PR: kern/92243
Approved by: re (scottl)
Move execve's access time update functionality into a
new vfs_mark_atime() function, and use the new function
for performing efficient atime updates in mmap().
Consider the zero-copy transmission of a page that was wired by mlock(2).
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.
If a physical page is mapped by two or more virtual addresses, transmitted
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on. Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.
Introduce a new lock for the purpose of synchronizing access to the
UMA boot pages.
Disable recursion on the general UMA lock now that startup_alloc() no
longer uses it.
Eliminate the variable uma_boot_free. It serves no purpose.
Note: This change eliminates a lock-order reversal between a system
map mutex and the UMA lock. See
http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
Add a "show uma" command to DDB, which prints out the current stats for
available UMA zones. Quite useful for post-mortem debugging of memory
leaks without a dump device configured on a panicked box.
cause a kernel compiled with ZERO_COPY_SOCKETS to panic under certain
circumstances:
sys/kern/uipc_cow.c: 1.24 - 1.26
sys/vm/vm_object.c: 1.351
Approved by: re (scottl)
Improve canonicalization of copyrights. Order copyrights by order of
assertion (jeff, bmilekic, rwatson).
Suggested ages ago by: bde
Approved by: re (kensmith)
vm_pager_init() runs before the required nswbuf variable has been set
to its correct value. This caused the system to run with a single pbuf
available for vnode_pager. Handle both the cluster_pbuf_freecnt and
vnode_pbuf_freecnt variables in the same way.
Approved by: re (kensmith)
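One way to avoid this kind of ordering problem is to defer computing a counter until the value it depends on is known. The sketch below assumes the count is derived as `nswbuf / 2 + 1` (if nswbuf is still 0 when that is evaluated, the result is exactly one pbuf); the function names and the -1 sentinel are hypothetical and not necessarily how this commit fixed it.

```c
#include <assert.h>

/* Hypothetical globals named after the variables in the log message. */
static int nswbuf;			/* sized later during boot */
static int eager_pbuf_freecnt;		/* buggy: computed too early */
static int vnode_pbuf_freecnt = -1;	/* -1 means "not computed yet" */

/* Buggy ordering: runs while nswbuf is still 0, so the count is 1. */
static void
sketch_pager_init_eager(void)
{
	eager_pbuf_freecnt = nswbuf / 2 + 1;
}

/* Later boot step that sets the number of swap buffers. */
static void
sketch_set_nswbuf(int n)
{
	assert(n > 0);
	nswbuf = n;
}

/* Deferred computation: reads nswbuf only after it has been set. */
static int
sketch_vnode_pbuf_count(void)
{
	assert(nswbuf > 0);
	if (vnode_pbuf_freecnt < 0)
		vnode_pbuf_freecnt = nswbuf / 2 + 1;
	return (vnode_pbuf_freecnt);
}
```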
Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined,
as opt_vmpage.h will not be available to user space library builds. A
similar existing check is present for KLD_MODULE for similar reasons.
Approved by: re (hrs)
Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be
used from memstat_uma.c for the purposes of kvm access without lots
of additional unsafe includes.
Approved by: re (hrs)
Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the
monitoring API, which might or might not be the same as the internal
maximum (currently none).
Export flag information on UMA zones; in particular, whether or not
the zone is a secondary zone, since a secondary zone shares its keg
with another zone and its keg free count should be interpreted accordingly.
Approved by: re (kensmith)
Further UMA statistics related changes:
- Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it
to update the zone's uz_frees statistic. Previously, the statistic was
updated unconditionally.
- Use the flag in situations where a "real" free occurs: i.e., one where
the caller is freeing an allocated item, to be differentiated from
situations where uma_zfree_internal() is used to tear down the item
during slab teardown in order to invoke its fini() method. Also use
the flag when UMA is freeing its internal objects.
- When exchanging a bucket with the zone from the per-CPU cache when
freeing an item, flush cache statistics back to the zone (since the
zone lock and critical section are both held) to match the allocation
case.
Approved by: re (kensmith)
Use mp_maxid in preference to MAXCPU when creating exports of UMA
per-CPU cache statistics. UMA sizes the cache array based on the
number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
could read off the end of the array (into the next zone).
Reported by: yongari
Approved by: re (kensmith)
RELENG_6:
Improve canonicalization of copyrights. Order copyrights by order of
assertion (jeff, bmilekic, rwatson).
Suggested ages ago by: bde
Approved by: re (kensmith)
Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that
it covers the following of the uc_alloc/freebucket cache pointers.
Originally, I felt that the race wasn't helped by holding the mutex,
hence a comment in the code and not holding it across the cache access.
However, it does improve consistency, as while it doesn't prevent
bucket exchange, it does prevent bucket pointer invalidation. With the
mutex held, a race in gathering cache free-space statistics can still
occur, but it can no longer follow an invalid bucket pointer.
Submitted by: yongari
Approved by: re (kensmith)
RELENG_6:
Track UMA(9) allocation failures by zone, and export via sysctl.
Requested by: victor cruceru <victor dot cruceru at gmail dot com>
Approved by: re (kensmith)
RELENG_6:
Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator
statistics via a binary structure stream:
- Add structure 'uma_stream_header', which defines a stream version,
definition of MAXCPUs used in the stream, and the number of zone
records in the stream.
- Add structure 'uma_type_header', which defines the name, alignment,
size, resource allocation limits, current pages allocated, preferred
bucket size, and central zone + keg statistics.
- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
includes the number of allocations and frees, as well as the number
of free items in the cache.
- When the sysctl is queried, return a stream header, followed by a
series of type descriptions, each consisting of a type header
followed by a series of MAXCPUs uma_percpu_stat structures holding
per-CPU allocation information. Typical values of MAXCPU will be
1 (UP compiled kernel) and 16 (SMP compiled kernel).
This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.
While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user space buffers to
receive the stream.
A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days. This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.
Approved by: re (kensmith)
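The stream layout described above can be consumed by walking the buffer one header at a time. This sketch uses deliberately simplified, hypothetical structures (`sk_stream_header`, `sk_type_header`, `sk_percpu_stat`) rather than the real uma_stream_header, uma_type_header and uma_percpu_stat definitions, and it assumes, as a simplification, that a zone's total allocations are its central count plus its per-CPU counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical layouts; the real structures have more fields. */
struct sk_stream_header {
	uint32_t version;
	uint32_t maxcpus;	/* per-CPU records after each type header */
	uint32_t count;		/* number of zone records */
};

struct sk_type_header {
	char     name[32];
	uint64_t allocs;	/* central zone statistics */
	uint64_t frees;
};

struct sk_percpu_stat {
	uint64_t allocs;
	uint64_t frees;
	uint64_t cache_free;
};

/*
 * Walk a stream laid out as one stream header followed by `count`
 * records, each a type header plus `maxcpus` per-CPU records. Return
 * total allocations (central + per-CPU), or -1 if the buffer is short.
 */
static int64_t
sk_total_allocs(const unsigned char *buf, size_t len)
{
	struct sk_stream_header sh;
	struct sk_type_header th;
	struct sk_percpu_stat ps;
	size_t off = 0;
	int64_t total = 0;

	assert(buf != NULL);
	if (len < sizeof(sh))
		return (-1);
	memcpy(&sh, buf, sizeof(sh));
	off += sizeof(sh);
	for (uint32_t z = 0; z < sh.count; z++) {
		if (len - off < sizeof(th))
			return (-1);
		memcpy(&th, buf + off, sizeof(th));
		off += sizeof(th);
		total += (int64_t)th.allocs;
		for (uint32_t c = 0; c < sh.maxcpus; c++) {
			if (len - off < sizeof(ps))
				return (-1);
			memcpy(&ps, buf + off, sizeof(ps));
			off += sizeof(ps);
			total += (int64_t)ps.allocs;
		}
	}
	return (total);
}
```

In a real tool the buffer would come from querying vm.zone_stats (for example with sysctlbyname(3)), sized with the help of vm.uma_count; libmemstat(3) is intended to hide this parsing from consumers.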
In an earlier world order, UMA would flush per-CPU statistics to the
zone whenever it was moving buckets between the zone and the cache,
or when coalescing statistics across CPUs. Remove flushing of
statistics to the zone when coalescing statistics as part of sysctl,
as we won't be running on the right CPU to write to the cache
statistics.
Add a missed gathering of statistics: when uma_zalloc_internal()
does a special case allocation of a single item, make sure to update
the zone statistics to represent this. Previously this case wasn't
accounted for in user-visible statistics.
Approved by: re (kensmith)
many regions checked again and again despite knowing the pages
contained were not usable and only satisfied the alignment constraints.
This case was compounded, especially for large allocations, by the
practice of looping from the top of memory so as to keep out of the
important low-memory regions. While the old contigmalloc(9) has the
same problem, it is not as noticeable due to looping from the low
memory to high.
This degenerate case is fixed, as well as reversing the sense of the
rest of the loops within it, to provide a tremendous speed increase.
This makes the best case O(n * VM overhead) much more likely than the
worst case O(4 * VM overhead). For comparison, the worst case for old
contigmalloc would be O(5 * VM overhead) in addition to its strategy
of turning used memory into free being highly pessimal.
Also, fix a bug in the new contigmalloc(9) that in practice most likely
could not have been triggered: it walked backwards from the end of memory
without accounting for how many pages it needed. Potentially, nonexistent
pages could have been mapped. This hasn't occurred because the kernel
generally requests as its first contigmalloc(9) a single page.
Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
MFC After: 1 month
More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
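A sketch of the bounds fix only (not the loop reordering): when scanning downward from the top of memory for a run of npages free pages, the first candidate start must be nphys - npages rather than the last page. The boolean page array and `sketch_find_contig()` are hypothetical; the real contigmalloc(9) works on vm_page structures and physical address ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Search a hypothetical page array from high memory downward for
 * `npages` contiguous free pages whose first index is a multiple of
 * `align` (in pages). Return the starting index, or -1 if none.
 *
 * The first candidate is nphys - npages, not nphys - 1: starting at
 * the very last page would let the run extend past the end of memory.
 */
static long
sketch_find_contig(const bool *free_page, size_t nphys, size_t npages,
    size_t align)
{
	assert(free_page != NULL && npages > 0 && align > 0);
	if (npages > nphys)
		return (-1);
	size_t start = nphys - npages;
	start -= start % align;		/* round down to the alignment */
	for (;;) {
		size_t i;

		for (i = 0; i < npages; i++)
			if (!free_page[start + i])
				break;
		if (i == npages)
			return ((long)start);
		if (start < align)
			return (-1);
		start -= align;
	}
}
```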
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.
Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.
Remove stale comments from pmap_init().
Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.
Tested by: cognet, kensmith, marcel