freebsd-nq/sys/kern
Robert Watson 63a7e0a3f9 Kernel malloc layers malloc_type allocation over one of two underlying
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations.  In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m.  Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.

This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections.  It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics.  To do this, several changes are
implemented:

- In the previous world order, the statistics memory was allocated by
  the owner of the malloc type structure, allocated statically using
  MALLOC_DEFINE().  This embedded the definition of the malloc_type
  structure into all kernel modules.  Move to a model in which a pointer
  within struct malloc_type points at a UMA-allocated
  malloc_type_internal data structure owned and maintained by
  kern_malloc.c, and not part of the exported ABI/API to the rest of
  the kernel.  For the purposes of easing a possible MFC, re-use an
  existing pointer in 'struct malloc_type', and maintain the current
  malloc_type structure size, as well as layout with respect to the
  fields reused outside of the malloc subsystem (such as ks_shortdesc).
  There are several unused fields as a result of no longer requiring
  the mutex in malloc_type.

- Struct malloc_type_internal contains an array of malloc_type_stats,
  of size MAXCPU.  The structure defined above avoids hard-coding a
  kernel compile-time value of MAXCPU into kernel modules that interact
  with malloc.

- When accessing per-cpu statistics for a malloc type, surround read -
  modify - update requests with critical_enter()/critical_exit() in
  order to avoid races during write.  The per-CPU fields are written
  only from the CPU that owns them.

- Per-CPU stats now maintained "allocated" and "freed" counters for
  number of allocations/frees and bytes allocated/freed, since there is
  no longer a coherent global notion of the totals.  When coalescing
  malloc stats, accept a slight race between reading stats across CPUs,
  and avoid showing the user a negative allocation count for the type
  in the event of a race.  The global high watermark is no longer
  maintained for a malloc type, as there is no global notion of the
  number of allocations.

- While tearing up the sysctl() path, also switch to using sbufs.  The
  current "export as text" sysctl format is retained with the same
  syntax.  We may want to change this in the future to export more
  per-CPU information, such as how allocations and frees are balanced
  across CPUs.

This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations.  There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there.  The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.

Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change.  However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs.  The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.

Several improvements from:		bde
Statistics approach discussed with:	ups
Tested by:				scottl, others
2005-05-29 13:38:07 +00:00
..
bus_if.m /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
clock_if.m /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
cpufreq_if.m Introduce a new method, cpufreq_drv_type(), that returns the type of the 2005-02-18 00:23:36 +00:00
device_if.m /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
genassym.sh
imgact_aout.c - Neither of our image formats require Giant now that the vm and vfs have 2005-05-03 10:51:38 +00:00
imgact_elf32.c
imgact_elf64.c
imgact_elf.c Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as 2005-05-24 22:21:44 +00:00
imgact_gzip.c - Change the vm_mmap() function to accept an objtype_t parameter specifying 2005-04-01 20:00:11 +00:00
imgact_shell.c Change the way options are parsed on the `#!'-line of a shell-script. Instead 2005-05-28 22:42:41 +00:00
inflate.c
init_main.c Add /rescue/init to the default init_path, before /stand/sysinstall. 2005-02-17 10:00:10 +00:00
init_sysent.c Regenerate from syscalls.master. 2005-05-28 14:35:43 +00:00
kern_acct.c When mac_check_system_acct() fails, make sure to unlock as well as close 2005-03-01 08:56:13 +00:00
kern_acl.c Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is 2004-07-26 07:24:04 +00:00
kern_alq.c Modify the alq(9) alq_open() API to accept a file creation mode, rather 2005-04-16 12:12:27 +00:00
kern_clock.c - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
kern_condvar.c Refine the turnstile and sleep queue interfaces just a bit: 2004-10-12 18:36:20 +00:00
kern_conf.c cdev (still) needs per instance uid/gid/mode 2005-03-31 10:29:57 +00:00
kern_context.c
kern_cpu.c Add debugging prints to all the methods in case there are problems with 2005-04-10 19:11:23 +00:00
kern_descrip.c - Use NAMEI to pickup Giant if we need it in fpcheckstd(). 2005-05-03 10:52:22 +00:00
kern_environment.c My addled brains didn't realize that since vtp points into value, we 2005-03-09 12:16:45 +00:00
kern_event.c make stat return an zero'd struct, and be a FIFO again... This is only 2005-05-24 23:42:50 +00:00
kern_exec.c - Initialize vfslocked correctly early enough for MAC to compile. 2005-05-03 16:24:59 +00:00
kern_exit.c Use low level constructs borrowed from interrupt threads to wait for 2005-05-23 23:01:53 +00:00
kern_fork.c Inherit signal mask for child process in fork1(), RELENG_4 and other 2005-04-20 13:14:52 +00:00
kern_idle.c Divorce critical sections from spinlocks. Critical sections as denoted by 2005-04-04 21:53:56 +00:00
kern_intr.c Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic 2005-04-12 23:18:54 +00:00
kern_jail.c - Use taskqueue_thread rather than taskqueue_swi since our task is going 2005-04-05 08:51:45 +00:00
kern_kse.c Remove thread_upcall_check, it was used to avoid race bug in earlier 2005-05-27 15:57:27 +00:00
kern_kthread.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
kern_ktr.c Modify the alq(9) alq_open() API to accept a file creation mode, rather 2005-04-16 12:12:27 +00:00
kern_ktrace.c Make a SYSCTL_NODE static 2005-02-10 12:23:29 +00:00
kern_linker.c Fix panic when module is compiled in and it is loaded from loader.conf. 2005-05-28 23:20:05 +00:00
kern_lock.c - Differentiate two UPGRADE panics so I have a better idea of what's going 2005-04-12 05:43:03 +00:00
kern_lockf.c Print name of device instead of useless major/minor numbers. 2005-03-29 08:13:01 +00:00
kern_mac.c Get the directory structure correct in a comment. 2005-04-22 19:09:12 +00:00
kern_malloc.c Kernel malloc layers malloc_type allocation over one of two underlying 2005-05-29 13:38:07 +00:00
kern_mbuf.c Well, it seems that I pre-maturely removed the "All rights reserved" 2005-02-16 21:45:59 +00:00
kern_mib.c Add a sysctl that records the amount of physical memory in the machine. 2005-02-28 21:42:56 +00:00
kern_module.c Swap the arguments for CP so we copy the correct source and 2005-02-18 22:14:40 +00:00
kern_mtxpool.c Make a bunch of malloc types static. 2005-02-10 12:02:37 +00:00
kern_mutex.c Add additional newline to debug.mutex.prof.stats header, so that 2005-04-08 14:14:09 +00:00
kern_ntptime.c Explicitly acquire Giant around the ntp_gettime() and assert it in the 2005-05-28 14:34:41 +00:00
kern_physio.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
kern_pmc.c Do not conditionally compile the contents of this file upon whether 2005-04-20 20:30:59 +00:00
kern_poll.c Remove recently added note about DEVICE_POLLING not working with SMP. 2005-02-25 22:07:51 +00:00
kern_proc.c Add a sysctl that returns the full path of a process' text file. 2005-04-18 02:10:37 +00:00
kern_prot.c Introduce p_canwait() and MAC Framework and MAC Policy entry points 2005-04-18 13:36:57 +00:00
kern_resource.c Stop explicitly touching td_base_pri outside of the scheduler and simply 2004-12-30 20:29:58 +00:00
kern_sema.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
kern_shutdown.c - Remove unused include. 2005-04-12 05:45:58 +00:00
kern_sig.c Oops, forgot to update this file. 2005-04-19 08:11:28 +00:00
kern_subr.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
kern_switch.c Use low level constructs borrowed from interrupt threads to wait for 2005-05-23 23:01:53 +00:00
kern_sx.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
kern_synch.c Use low level constructs borrowed from interrupt threads to wait for 2005-05-23 23:01:53 +00:00
kern_syscalls.c Do a pass over all modules in the kernel and make them return EOPNOTSUPP 2004-07-15 08:26:07 +00:00
kern_sysctl.c Make another bunch of SYSCTL_NODEs static 2005-02-10 12:16:08 +00:00
kern_tc.c s/ENOTTY/ENOIOCTL/ 2005-03-26 20:04:28 +00:00
kern_thr.c Add new syscall thr_new to create thread in atomic, it will 2005-04-23 02:36:07 +00:00
kern_thread.c Remove sleep queue hack, it is no longer needed with current sleep queue. 2005-05-27 04:27:22 +00:00
kern_time.c Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), 2005-03-31 22:51:18 +00:00
kern_timeout.c When processing a timeout() callout and returning it to the free 2005-02-11 00:14:00 +00:00
kern_umtx.c Allocate umtx_q from heap instead of stack, this avoids 2005-03-05 09:15:03 +00:00
kern_uuid.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
kern_xxx.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
ksched.c /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
link_elf_obj.c Add support for completing the installation of ELF relocatable 2004-08-29 01:21:51 +00:00
link_elf.c Normalize the VM wiring done with SPARSE_MAPPING: check for errors, and 2004-08-09 18:46:13 +00:00
linker_if.m /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Make.tags.inc
Makefile
makesyscalls.sh
md4c.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
md5c.c MD5Pad() should never have been exposed. 2005-02-10 12:20:42 +00:00
p1003_1b.c Actually commit the code for kern_sched_get_rr_interval(). 2005-03-31 22:54:48 +00:00
posix4_mib.c Back when VOP_* was introduced, we did not have new-style struct 2004-12-01 23:16:38 +00:00
sched_4bsd.c Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities 2005-04-19 04:01:25 +00:00
sched_ule.c Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities 2005-04-19 04:01:25 +00:00
subr_acl_posix1e.c Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is 2004-07-26 07:24:04 +00:00
subr_autoconf.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_blist.c Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to 2004-06-04 04:03:26 +00:00
subr_bus.c Document that the returned pointer should be freed even if the number 2005-05-20 05:04:22 +00:00
subr_clist.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_clock.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_devstat.c - Remove two mtx_asserts that can incorrectly trigger if 2005-05-03 10:58:05 +00:00
subr_disk.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_eventhandler.c eliminate potential null deref 2005-02-23 19:32:29 +00:00
subr_hints.c Don't set ret_namelen and ret_resnamelen in res_find() unless both the 2005-03-24 21:20:25 +00:00
subr_kdb.c Implement an alternate method to stop CPUs when entering DDB. Normally we use 2005-04-30 20:01:00 +00:00
subr_kobj.c
subr_log.c Use dynamic major number allocation. 2005-02-27 22:02:03 +00:00
subr_mbpool.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_mchain.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_module.c
subr_msgbuf.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_param.c Increase default HZ for sparc64 to 1000. 2005-04-16 15:07:41 +00:00
subr_pcpu.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_power.c
subr_prf.c Constify hexdump() harder. 2005-04-06 10:14:13 +00:00
subr_prof.c netchild's mega-patch to isolate compiler dependencies into a central 2005-03-02 21:33:29 +00:00
subr_rman.c If we are going to 2005-05-06 02:50:00 +00:00
subr_rtc.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
subr_sbuf.c Make a bunch of malloc types static. 2005-02-10 12:02:37 +00:00
subr_scanf.c Remove advertising clause from University of California Regent's license, 2004-04-05 21:03:37 +00:00
subr_sleepqueue.c Remove thread_upcall_check, it was used to avoid race bug in earlier 2005-05-27 15:57:27 +00:00
subr_smp.c Implement an alternate method to stop CPUs when entering DDB. Normally we use 2005-04-30 20:01:00 +00:00
subr_taskqueue.c o enable shutdown of taskqueue threads; the thread servicing the queue checks 2005-05-01 00:38:11 +00:00
subr_trap.c - Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here. 2005-03-28 12:52:46 +00:00
subr_turnstile.c Make a bunch of malloc types static. 2005-02-10 12:02:37 +00:00
subr_unit.c Remove debugging printfs. 2005-03-14 06:51:29 +00:00
subr_witness.c - Define the real lock order with cdev and a few vm/vfs related locks. This 2005-04-22 22:43:31 +00:00
sys_generic.c Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), 2005-03-31 22:51:18 +00:00
sys_pipe.c Rearrange the kninit calls for both directions of a pipe so that 2005-01-17 07:56:28 +00:00
sys_process.c Add missing cases for PT_SYSCALL. 2005-03-18 21:22:28 +00:00
sys_socket.c Introduce three additional MAC Framework and MAC Policy entry points to 2005-04-16 18:46:29 +00:00
syscalls.c Regenerate from syscalls.master. 2005-05-28 14:35:43 +00:00
syscalls.master Mark ntp_gettime() as MSTD, since its system call path will acquire 2005-05-28 14:35:05 +00:00
sysv_ipc.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
sysv_msg.c Add much needed descriptions for a number of the IPC related sysctl OIDs. 2005-02-12 01:22:39 +00:00
sysv_sem.c Remove end-of-line tabs. 2005-04-18 11:51:10 +00:00
sysv_shm.c Actually use the iterating variable in the for loop when trying to avoid 2005-05-12 20:04:48 +00:00
tty_compat.c Put the pre FreeBSD-2.x tty compat code under BURN_BRIDGES. 2004-06-21 22:57:16 +00:00
tty_conf.c Preparation commit for the tty cleanups that will follow in the near 2004-07-15 20:47:41 +00:00
tty_cons.c Use dynamic major number allocation for /dev/console, there is no 2005-02-27 21:52:42 +00:00
tty_pty.c Explicitly hold a reference to the cdev we have just cloned. This 2005-03-31 12:19:44 +00:00
tty_subr.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
tty_tty.c Explicitly hold a reference to the cdev we have just cloned. This 2005-03-31 12:19:44 +00:00
tty.c According to the comment in struct tty, t_modem is optional; hence we should 2005-04-13 13:56:17 +00:00
uipc_accf.c Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option 2005-03-12 12:57:18 +00:00
uipc_cow.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
uipc_domain.c /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
uipc_mbuf2.c Make a bunch of malloc types static. 2005-02-10 12:02:37 +00:00
uipc_mbuf.c Change m_uiotombuf so it will accept offset at which data should be copied 2005-05-04 18:55:03 +00:00
uipc_proto.c Remove advertising clause from University of California Regent's license, 2004-04-05 21:03:37 +00:00
uipc_sem.c Introduce MAC Framework and MAC Policy entry points to label and control 2005-05-04 10:39:15 +00:00
uipc_sockbuf.c In the current world order, each socket has two mutexes: a mutex 2005-05-27 17:16:43 +00:00
uipc_socket2.c In the current world order, each socket has two mutexes: a mutex 2005-05-27 17:16:43 +00:00
uipc_socket.c Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option 2005-03-12 12:57:18 +00:00
uipc_syscalls.c Change m_uiotombuf so it will accept offset at which data should be copied 2005-05-04 18:55:03 +00:00
uipc_usrreq.c Fix two issues which were missed in FreeBSD-SA-05:08.kmem. 2005-05-07 00:41:36 +00:00
vfs_acl.c Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is 2004-07-26 07:24:04 +00:00
vfs_aio.c - Acquire Giant in AIO's iodone routine. VFS will no longer do it for us 2005-04-30 11:27:31 +00:00
vfs_bio.c - Remove long dead splbio() calls and comments relating to the old 2005-04-30 12:18:50 +00:00
vfs_cache.c - Change all filesystems and vfs_cache to relock the dvp once the child is 2005-04-13 10:59:09 +00:00
vfs_cluster.c Revert revision 1.164: pmap_qremove() does not require protection by 2005-05-14 05:09:11 +00:00
vfs_default.c - Remove unnecessary spls. 2005-05-01 00:59:34 +00:00
vfs_export.c Handle theoretical case of vfs_export being called with both MNT_DELEXPORT and 2005-05-11 18:25:42 +00:00
vfs_extattr.c Acquire Giant explicitly in quotactl() so that the syscalls.master 2005-05-28 13:11:35 +00:00
vfs_hash.c Fix bug in vfs_hash_rehash(): use correct bucket. This only affected 2005-04-07 07:54:08 +00:00
vfs_init.c Remove VFS_START(). Its original purpose involved the mfs filesystem, 2005-02-20 23:02:20 +00:00
vfs_lookup.c - Remove a debugging printf that slipped in. 2005-04-13 23:36:28 +00:00
vfs_mount.c devfs_first() return value isn't used, remove it. 2005-05-18 22:05:12 +00:00
vfs_subr.c If we are going to 2005-05-06 02:50:00 +00:00
vfs_syscalls.c Acquire Giant explicitly in quotactl() so that the syscalls.master 2005-05-28 13:11:35 +00:00
vfs_vnops.c - Stop checking vxthread, we've asserted that it was useless for several 2005-04-27 09:17:11 +00:00
vnode_if.src - Mark the VOPs that require exclusive locks. Those that aren't marked 2005-04-11 15:19:29 +00:00