Commit Graph

2152 Commits

Author SHA1 Message Date
rwatson
37cd630e38 Use mp_maxid in preference to MAXCPU when creating exports of UMA
per-CPU cache statistics.  UMA sizes the cache array based on the
number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
could read off the end of the array (into the next zone).

Reported by:	yongari
MFC after:	1 week
2005-07-16 11:03:06 +00:00
rwatson
4ead3022dd Improve canonicalization of copyrights. Order copyrights by order of
assertion (jeff, bmilekic, rwatson).

Suggested ages ago by:	bde
MFC after:		1 week
2005-07-16 09:51:52 +00:00
rwatson
18ba9c6401 Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that
it covers the following of the uc_alloc/freebucket cache pointers.
Originally, I felt that the race wasn't helped by holding the mutex,
hence a comment in the code and not holding it across the cache access.
However, it does improve consistency, as while it doesn't prevent
bucket exchange, it does prevent bucket pointer invalidation.  So a
race in gathering cache free space statistics still can occur, but not
one that follows an invalid bucket pointer, if the mutex is held.

Submitted by:	yongari
MFC after:	1 week
2005-07-16 09:40:34 +00:00
silby
24f7a1c3d6 Increase the flags field for kegs from a 16 to a 32 bit value;
we have exhausted all 16 flags.
2005-07-16 02:23:41 +00:00
rwatson
87bbce9e08 Track UMA(9) allocation failures by zone, and export via sysctl.
Requested by:	victor cruceru <victor dot cruceru at gmail dot com>
MFC after:	1 week
2005-07-15 23:34:39 +00:00
jhb
841c5ac424 Convert a remaining !fs.map->system_map to
fs.first_object->flags & OBJ_NEEDGIANT test that was missed in an earlier
revision.  This fixes mutex assertion failures in the debug.mpsafevm=0
case.

Reported by:	ps
MFC after:	3 days
2005-07-14 21:18:07 +00:00
rwatson
83343a94ec Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator
statistics via a binary structure stream:

- Add structure 'uma_stream_header', which defines a stream version,
  definition of MAXCPUs used in the stream, and the number of zone
  records in the stream.

- Add structure 'uma_type_header', which defines the name, alignment,
  size, resource allocation limits, current pages allocated, preferred
  bucket size, and central zone + keg statistics.

- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
  includes the number of allocations and frees, as well as the number
  of free items in the cache.

- When the sysctl is queried, return a stream header, followed by a
  series of type descriptions, each consisting of a type header
  followed by a series of MAXCPUs uma_percpu_stat structures holding
  per-CPU allocation information.  Typical values of MAXCPU will be
  1 (UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user swpace buffers to
receive the stream.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days.  This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.

MFC after:	1 week
2005-07-14 16:35:13 +00:00
rwatson
c24543fa50 In addition to tracking allocs in the zone, also track frees. Add
a zone free counter, as well as a cache free counter.

MFC after:	1 week
2005-07-14 16:17:21 +00:00
rwatson
3f3682a4b8 In an earlier world order, UMA would flush per-CPU statistics to the
zone whenever it was moving buckets between the zone and the cache,
or when coalescing statistics across the CPU.  Remove flushing of
statistics to the zone when coalescing statistics as part of sysctl,
as we won't be running on the right CPU to write to the cache
statistics.

Add a missed gathering of statistics: when uma_zalloc_internal()
does a special case allocation of a single item, make sure to update
the zone statistics to represent this.  Previously this case wasn't
accounted for in user-visible statistics.

MFC after:	1 week
2005-07-14 16:13:46 +00:00
silby
64582f3995 Change the panic in trash_ctor into just a printf for now. Once the reports
of panics in trash_ctor relating to mbufs have been examined and a fix
found, this will be turned back into a panic.

Approved by: re (rwatson)
2005-06-26 23:44:07 +00:00
alc
67602b23a9 Increase UMA_BOOT_PAGES to prevent a crash during initialization. See
http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed
description of the crash.

Reported by: Eric Anderson
Approved by: re (scottl)
MFC after: 3 days
2005-06-16 17:06:34 +00:00
green
3bb055500e The new contigmalloc(9) has a bad degenerate case where there were
many regions checked again and again despite knowing the pages
contained were not usable and only satisfied the alignment constraints
This case was compounded, especially for large allocations, by the
practice of looping from the top of memory so as to keep out of the
important low-memory regions.  While the old contigmalloc(9) has the
same problem, it is not as noticeable due to looping from the low
memory to high.

This degenerate case is fixed, as well as reversing the sense of the
rest of the loops within it, to provide a tremendous speed increase.
This makes the best case O(n * VM overhead) much more likely than the
worst case O(4 * VM overhead).  For comparison, the worst case for old
contigmalloc would be O(5 * VM overhead) in addition to its strategy
of turning used memory into free being highly pessimal.

Also, fix a bug that in practice most likely couldn't have been triggered,
int the new contigmalloc(9): it walked backwards from the end of memory
without accounting for how many pages it needed.  Potentially, nonexistant
pages could have been mapped.  This hasn't occurred because the kernel
generally requests as its first contigmalloc(9) a single page.

Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
MFC After: 1 month
More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
2005-06-11 00:05:16 +00:00
alc
53e95f1eb2 Add a comment to the effect that fictitious pages do not require the
initialization of their machine-dependent fields.
2005-06-10 17:27:54 +00:00
alc
2d109601cb Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields.  Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations.  Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel
2005-06-10 03:33:36 +00:00
alc
6224234587 Update some comments to reflect the change from spl-based to lock-based
synchronization.
2005-05-28 17:56:18 +00:00
ups
acfce18a2a Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
alc
9c80b49669 Swap in can occur safely without Giant. Release Giant on entry to
scheduler().
2005-05-22 21:06:07 +00:00
alc
3b49968234 Remove GIANT_REQUIRED from swapout_procs(). 2005-05-22 00:30:50 +00:00
alc
b351552abd Reduce the number of times that we acquire and release locks in
swap_pager_getpages().

MFC after: 1 week
2005-05-20 21:26:05 +00:00
alc
45eab788de Remove calls to spl*(). 2005-05-19 06:11:13 +00:00
alc
eee15b6b76 Remove a stale comment concerning spl* usage. 2005-05-19 03:53:07 +00:00
alc
c7cb7f7317 Update some comments to reflect the change from spl-based to lock-based
synchronization.
2005-05-18 22:08:52 +00:00
alc
1b7ce7e514 Remove calls to spl*(). 2005-05-18 20:45:33 +00:00
alc
0e9f42833e Revert revision 1.270: swp_pager_async_iodone() need not perform
VM_LOCK_GIANT().

Discussed with: jeff
2005-05-18 17:48:04 +00:00
bz
b543d49d86 Correct 32 vs 64 bit signedness issues.
Approved by:	pjd (mentor)
MFC after:	2 weeks
2005-05-18 08:57:31 +00:00
grehan
a442ec4d3f The final test in unlock_and_deallocate() to determine if GIANT needs to be
unlocked wasn't updated to check for OBJ_NEEDGIANT. This caused a WITNESS
panic when debug_mpsafevm was set to 0.

Approved by:	jeffr
2005-05-12 04:09:41 +00:00
marcel
9be5f7d46a Enable debug_mpsafevm on ia64 due to the severe functional regression
caused by recent locking changes when it's off. Revert the logic to
trim down the conditional.

Clued-in by: alc@
2005-05-08 23:56:16 +00:00
jeff
95489bf6b4 - We need to inhert the OBJ_NEEDGIANT flag from the original object in
vm_object_split().

Spotted by:	alc
2005-05-04 20:54:16 +00:00
jeff
d62d255d2e - Add a new object flag "OBJ_NEEDSGIANT". We set this flag if the
underlying vnode requires Giant.
 - In vm_fault only acquire Giant if the underlying object has NEEDSGIANT
   set.
 - In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.
2005-05-03 11:11:26 +00:00
alc
f1dec39efb Remove GIANT_REQUIRED from vmspace_exec().
Prodded by: jeff
2005-05-02 07:05:20 +00:00
jeff
5adae6c622 - VM_LOCK_GIANT in the swap pager's iodone routine as VFS will soon call it
without Giant.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:25:49 +00:00
rwatson
bb1e0b257a Modify UMA to use critical sections to protect per-CPU caches, rather than
mutexes, which offers lower overhead on both UP and SMP.  When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks.  We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access.  In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occured, or circumstances have changed in the current cache.

Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.

Reviewed by:	bmilekic, jeff (somewhat)
Tested by:	rwatson, kris, gnn, scottl, mike at sentex dot net, others
2005-04-29 18:56:36 +00:00
jeff
f869be5c72 - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
kris
69fffa3c93 Add the vm.exec_map_entries tunable and read-only sysctl, which controls
the number of entries in exec_map (maximum number of simultaneous execs
that can be handled by the kernel).  The default value of 16 is
insufficient on heavily loaded machines (particularly SMP machines), and
if it is exceeded then executing further processes will generate a SIGABRT.

This is a workaround until a better solution can be implemented.

Reviewed by:	alc
MFC after:	3 days
2005-04-25 19:22:05 +00:00
des
9900fad82d Unbreak the build on 64-bit architectures. 2005-04-16 12:37:16 +00:00
jhb
de9a1fa207 Add a vm.blacklist tunable which can hold a space or comma seperated list
of physical addresses.  The pages containing these physical addresses will
not be added to the free list and thus will effectively be ignored by the
VM system.  This is mostly useful for the case when one knows of specific
physical addresses that have bit errors (such as from a memtest run) so
that one can blacklist the bad pages while waiting for the new sticks of
RAM to arrive.  The physical addresses of any ignored pages are listed in
the message buffer as well.
2005-04-15 21:45:02 +00:00
csjp
e89e83d7fe Move MAC check_vnode_mmap entry point out from being exclusive to
MAP_SHARED so that the entry point gets executed un-conditionally.
This may be useful for security policies which want to perform access
control checks around run-time linking.

-add the mmap(2) flags argument to the check_vnode_mmap entry point
 so that we can make access control decisions based on the type of
 mapped object.
-update any dependent API around this parameter addition such as
 function prototype modifications, entry point parameter additions
 and the inclusion of sys/mman.h header file.
-Change the MLS, BIBA and LOMAC security policies so that subject
 domination routines are not executed unless the type of mapping is
 shared. This is done to maintain compatibility between the old
 vm_mmap_vnode(9) and these policies.

Reviewed by:	rwatson
MFC after:	1 month
2005-04-14 16:03:30 +00:00
jhb
e62b86cdc5 Tidy vcnt() by moving a duplicated line above #ifdef and removing a useless
variable.
2005-04-12 23:15:28 +00:00
jhb
13114caff8 Flip the switch and turn mpsafevm on by default for sparc64.
Approved by:	alc
2005-04-04 20:59:02 +00:00
jeff
0eef91eae9 - Don't NULL the vnode's v_object pointer until after the object is torn
down.  If we have dirty pages, the putpages routine will need to know
   what the vnode's object is so that it may write out dirty pages.

Pointy hat:	phk
Found by:	obrien
2005-04-03 22:56:58 +00:00
jhb
a3c6b782c3 - Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
  vnodes and anonymous memory.  Note that mmaping a cdev directly does not
  currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
  cdev the ioctl is acting on rather than trying to find a suitable vnode
  to map from.

Reviewed by:	alc, arch@
2005-04-01 20:00:11 +00:00
jeff
97c40ebd49 - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
alc
75d6caaaf0 Eliminate (now) unnecessary acquisition and release of the global page
queues lock in vm_object_backing_scan().  Updates to the page's PG_BUSY
flag and busy field are synchronized by the containing object's lock.

Testing the page's hold_count and wire_count in vm_object_backing_scan()'s
OBSC_COLLAPSE_NOWAIT case is unnecessary.  There is no reason why the held
or wired pages cannot be migrated to the shadow object.

Reviewed by: tegge
2005-03-30 05:40:02 +00:00
das
fb7ab34022 Move the swap_zone == NULL check earlier (i.e. before we dereference
the pointer.)

Found by:	Coverity Prevent analysis tool
2005-03-18 21:22:48 +00:00
jeff
92d24b8044 - Don't lock the vnode interlock in vm_object_set_writeable_dirty() if
we've already set the object flags.

Reviewed by:	alc
2005-03-17 12:03:42 +00:00
jeff
d1b34cc38c - In vm_page_insert() hold the backing vnode when the first page
is inserted.
 - In vm_page_remove() drop the backing vnode when the last page
   is removed.
 - Don't check the vnode to see if it must be reclaimed on every
   call to vm_page_free_toq() as we only check it now when it is
   actually required.  This saves us two lock operations per call.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:14:09 +00:00
jeff
41fb0028e9 - Don't directly adjust v_usecount, use vref() instead.
Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:03:19 +00:00
jeff
eaf9deaeb7 - Retire OLOCK and OWANT. All callers hold the vnode lock when creating
a vnode object.  There has been an assert to prove this for some time.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:29:40 +00:00
jeff
0c26372161 - Don't acquire the vnode lock in destroy_vobject, assert that it has
already been acquired by the caller.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:05:05 +00:00
alc
2b424cf256 Revert the first part of revision 1.114 and modify the second part. On
architectures implementing uma_small_alloc() pages do not necessarily
belong to the kmem object.
2005-02-24 06:13:01 +00:00