Commit Graph

1243 Commits

Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
To come: ia64 and PowerPC patches, patches for gdb, a test program (in tools).

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still beta code and contains lots of debugging stuff;
	expect slight instability in signals.
2002-06-29 17:26:22 +00:00
Ian Dowse
23f09d50bb Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

 o In the i386 pmap_protect(), `sindex' and `eindex' represent page
   indices within the 32-bit virtual address space.
 o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
   variable to store the low few bits of a vm_pindex_t that gets used
   as an array index.
 o vm_uiomove() uses `osize' and `idx' for page offsets within a
   map entry.
 o In vm_object_split(), `idx' is a page offset within a map entry.
2002-06-26 20:32:51 +00:00
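
The second bullet's pattern, pulling the low bits of a 64-bit index into a
narrow temporary before using them as an array subscript, looks roughly like
the sketch below (constant values are assumed, not the committed code):

    #include <stdint.h>
    #include <stdio.h>

    #define SWAP_META_PAGES 16                      /* assumed value */
    #define SWAP_META_MASK  (SWAP_META_PAGES - 1)

    typedef uint64_t vm_pindex_t;

    int main(void)
    {
            vm_pindex_t pindex = 0x123456789ULL;
            int slots[SWAP_META_PAGES] = { 0 };

            /* Only the low few bits select a slot, so a plain int can
             * hold them and the subscript avoids 64-bit arithmetic. */
            int idx = (int)(pindex & SWAP_META_MASK);

            slots[idx] = 1;
            printf("slot %d\n", idx);
            return 0;
    }
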
Ian Dowse
5125fe4f45 Use an explicit cast to avoid relying on sign extension to do the
right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'.

Reviewed by:	dillon
2002-06-26 19:18:14 +00:00
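
The hazard being made explicit: ~SWAP_META_MASK is evaluated at int width,
and only sign extension makes the 64-bit assignment come out right. A
standalone illustration (mask values invented):

    #include <inttypes.h>
    #include <stdio.h>

    #define MASK_SIGNED   0xF     /* int: ~MASK is negative, sign-extends */
    #define MASK_UNSIGNED 0xFU    /* unsigned: ~MASK zero-extends */

    int main(void)
    {
            uint64_t a = ~MASK_SIGNED;             /* ...fffffff0: right, by accident */
            uint64_t b = ~MASK_UNSIGNED;           /* 00000000fffffff0: silently wrong */
            uint64_t c = ~(uint64_t)MASK_UNSIGNED; /* ...fffffff0: right, explicitly */

            printf("%016" PRIx64 "\n%016" PRIx64 "\n%016" PRIx64 "\n", a, b, c);
            return 0;
    }
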
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c: Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c: Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object_allocate() does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structure.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
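
The vm_object_allocate_wait() change follows a common shape: the new
function takes the malloc flag as a parameter, and the old name becomes a
thin M_WAITOK wrapper. A hedged userland sketch of that shape (simplified
types; malloc() stands in for uma_zalloc()):

    #include <stdio.h>
    #include <stdlib.h>

    #define M_NOWAIT 0x0001        /* flag values are illustrative only */
    #define M_WAITOK 0x0002

    struct object {
            int     type;
            size_t  size;
    };

    /* New entry point: the caller picks the malloc flag, so code that
     * holds a mutex can pass M_NOWAIT and cope with a NULL return. */
    static struct object *
    object_allocate_wait(int type, size_t size, int flags)
    {
            struct object *obj;

            obj = malloc(sizeof(*obj)); /* uma_zalloc(zone, flags) in-kernel */
            if (obj == NULL)
                    return (NULL);      /* only M_NOWAIT can get here */
            obj->type = type;
            obj->size = size;
            return (obj);
    }

    /* Old entry point keeps its signature and forwards with M_WAITOK. */
    static struct object *
    object_allocate(int type, size_t size)
    {
            return (object_allocate_wait(type, size, M_WAITOK));
    }

    int main(void)
    {
            struct object *obj = object_allocate(1, 4096);

            if (obj != NULL) {
                    printf("size %zu\n", obj->size);
                    free(obj);
            }
            return (0);
    }
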
Matthew Dillon
a69ac1740f Enforce RLIMIT_VMEM on growable mappings (aka the primary stack or any
MAP_STACK mapping).

Suggested by:	alc
2002-06-26 03:13:46 +00:00
Matthew Dillon
070f64fe6f Part I of RLIMIT_VMEM implementation. Implement core functionality for
a new resource limit that covers a process's entire VM space, including
mmap()'d space.

(Part II will be additional code to check RLIMIT_VMEM during exec() but it
needs more fleshing out).

PR:		kern/18209
Submitted by:	Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net>
MFC after:	7 days
2002-06-26 00:29:28 +00:00
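
From userland the new limit behaves like the other rlimits. A small test
along these lines (FreeBSD spells it RLIMIT_VMEM; POSIX calls it RLIMIT_AS)
shows mmap() failing once the cap is enforced:

    #include <sys/mman.h>
    #include <sys/resource.h>
    #include <stdio.h>

    #ifndef RLIMIT_VMEM
    #define RLIMIT_VMEM RLIMIT_AS
    #endif

    int main(void)
    {
            struct rlimit rl = { 64UL << 20, 64UL << 20 };  /* 64 MB cap */
            void *p;

            if (setrlimit(RLIMIT_VMEM, &rl) != 0) {
                    perror("setrlimit");
                    return (1);
            }
            /* With the whole VM space capped at 64 MB, a 128 MB
             * anonymous mapping should now fail with ENOMEM. */
            p = mmap(NULL, 128UL << 20, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE, -1, 0);
            if (p == MAP_FAILED)
                    perror("mmap (expected to fail)");
            return (0);
    }
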
Ian Dowse
6395da5437 Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does not attempt to optimise performance. The details are:

 o Use a 64-bit type for the vm_object `size' and the size argument
   to vm_object_allocate().
 o Use the correct type for index variables in dev_pager_getpages(),
   vm_object_page_clean() and vm_object_page_remove().
 o Avoid an overflow in the i386 pmap_object_init_pt().
2002-06-25 22:14:06 +00:00
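
The overflows in question come from keeping a page count or page index in a
32-bit variable; a minimal demonstration:

    #include <inttypes.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096

    int main(void)
    {
            uint64_t size = 32ULL << 40;            /* a 32 TB object */

            unsigned int narrow = size / PAGE_SIZE; /* truncates to 0 */
            uint64_t     wide   = size / PAGE_SIZE; /* 0x200000000 */

            printf("narrow=%u wide=%" PRIu64 "\n", narrow, wide);
            return 0;
    }
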
Jeff Roberson
e78f35b33f Turn VM_ALLOC_ZERO into a flag.
Submitted by:	tegge
Reviewed by:	dillon
2002-06-25 22:01:12 +00:00
Jeff Roberson
5c0e403ba2 Reduce the amount of code that runs with the zone lock held in slab_zalloc().
This allows us to run the zone initialization functions without any locks held.
2002-06-25 21:04:50 +00:00
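
Shrinking a lock's scope so that init callbacks run lock-free is a general
pattern; a pthread analogue of the change (names invented, not the uma
code):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;

    struct item {
            int     ready;
    };

    static void
    item_init(struct item *it)
    {
            it->ready = 1;          /* possibly slow; needs no lock */
    }

    static void
    slab_zalloc(struct item *it)
    {
            pthread_mutex_lock(&zone_lock);
            /* ... only the bookkeeping that needs the lock ... */
            pthread_mutex_unlock(&zone_lock);

            item_init(it);          /* initializer runs with no locks held */
    }

    int main(void)
    {
            struct item it = { 0 };

            slab_zalloc(&it);
            printf("ready=%d\n", it.ready);
            return 0;
    }
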
Alan Cox
366838ddfe o Eliminate vmspace::vm_minsaddr. It's initialized but never used.
o Replace stale comments in vmspace by "const until freed" annotations
   on some fields.
2002-06-25 18:14:38 +00:00
Alan Cox
848d14193d o Remove GIANT_REQUIRED from kmem_alloc_pageable(), kmem_alloc_nofault(),
and kmem_free().  (Annotate as MPSAFE.)
 o Remove incorrect casts from kmem_alloc_pageable() and kmem_alloc_nofault().
2002-06-23 18:07:40 +00:00
Alan Cox
2cd301d1e1 o Remove the unnecessary acquisition and release of Giant around fdrop()
in mmap(2).
2002-06-23 01:48:22 +00:00
Alan Cox
c04c996b25 o Reduce the scope of Giant in vm_mmap() to just the code that manipulates
a vnode.  (Thus, MAP_ANON and MAP_STACK never acquire Giant.)
2002-06-22 19:13:56 +00:00
Alan Cox
c8664f82a5 o Replace mtx_assert(&Giant, MA_OWNED) in dev_pager_alloc()
with the acquisition and release of Giant.  (Annotate as MPSAFE.)
 o Reorder the sanity checks in dev_pager_alloc() to reduce
   the time that Giant is held.
2002-06-22 18:36:51 +00:00
Alan Cox
409748276e o In vm_map_insert(), replace GIANT_REQUIRED by the acquisition and
release of Giant around the direct manipulation of the vm_object and
   the optional call to pmap_object_init_pt().
 o In vm_map_findspace(), remove GIANT_REQUIRED.  Instead, acquire and
   release Giant around the occasional call to pmap_growkernel().
 o In vm_map_find(), remove GIANT_REQUIRED.
2002-06-22 17:47:12 +00:00
Alan Cox
24c46d036d o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition and
release of Giant.  (Annotate as MPSAFE.)
2002-06-22 08:03:21 +00:00
Alan Cox
2a1618cd59 o Remove GIANT_REQUIRED from phys_pager_alloc(). If handle isn't NULL,
acquire and release Giant.  If handle is NULL, Giant isn't needed.
 o Annotate phys_pager_alloc() and phys_pager_dealloc() as MPSAFE.
2002-06-22 07:54:42 +00:00
Alan Cox
990ab7add4 o Replace GIANT_REQUIRED in vnode_pager_alloc() by the acquisition and
release of Giant.  (Annotate as MPSAFE.)
 o Also, in vnode_pager_alloc(), remove an unnecessary re-initialization
   of struct vm_object::flags and move a statement that is duplicated
   in both branches of an if-else.
2002-06-22 07:28:06 +00:00
Alan Cox
43a90f3a1b o Remove GIANT_REQUIRED from vslock().
o Annotate kernacc(), useracc(), and vslock() as MPSAFE.

Motivated by:	alfred
2002-06-22 01:26:02 +00:00
Alan Cox
27168693db o Remove GIANT_REQUIRED from vm_map_stack(). 2002-06-21 06:03:47 +00:00
Alan Cox
7942194583 o Remove GIANT_REQUIRED from vm_pager_allocate() and vm_pager_deallocate(). 2002-06-21 05:04:56 +00:00
Alan Cox
3d66f1384e o Remove an incorrect cast from obreak(). This cast would,
for example, break an sbrk(>=4GB) on 64-bit architectures
   even if the resource limit allowed it.
 o Correct an off-by-one error.
 o Correct a spelling error in a comment.
 o Reorder an && expression so that the commonly FALSE expression
   comes first.

Submitted by:	bde (bullets 1 and 2)
2002-06-20 18:38:28 +00:00
Alan Cox
5375be1861 o Acquire and release the vm_map lock instead of Giant in obreak().
Consequently, use vm_map_insert() and vm_map_delete(), which expect
   the vm_map to be locked, instead of vm_map_find() and vm_map_remove(),
   which do not.
2002-06-20 02:04:55 +00:00
Jeff Roberson
1e081f889b - Move the computation of pflags out of the page allocation loop in
kmem_malloc()
- zero fill pages if PG_ZERO bit is not set after allocation in kmem_malloc()

Suggested by: alc, jake
2002-06-19 23:49:57 +00:00
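
Both changes show up in a sketch of the allocation loop: the flag
computation is hoisted above the loop, and pages that do not arrive
pre-zeroed are zeroed by hand (all names assumed, not the kernel source):

    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096
    #define M_ZERO    0x0100        /* illustrative flag values */
    #define PG_ZERO   0x0008

    struct page {
            int             flags;
            unsigned char   data[PAGE_SIZE];
    };

    /* Stand-in allocator: never hands back a pre-zeroed page. */
    static struct page *
    page_alloc(int pflags)
    {
            struct page *p = malloc(sizeof(*p));

            (void)pflags;           /* a real allocator would try the
                                     * pre-zeroed page queue here */
            if (p != NULL)
                    p->flags = 0;   /* PG_ZERO deliberately not set */
            return (p);
    }

    static void
    kmem_malloc_sketch(int flags, int npages, struct page **out)
    {
            /* Hoisted: pflags depends only on 'flags', not on the loop. */
            int pflags = (flags & M_ZERO) ? PG_ZERO : 0;

            for (int i = 0; i < npages; i++) {
                    struct page *p = page_alloc(pflags);

                    if (p == NULL)
                            break;
                    /* Honor M_ZERO ourselves when the page came back
                     * without the PG_ZERO bit set. */
                    if ((flags & M_ZERO) && (p->flags & PG_ZERO) == 0)
                            memset(p->data, 0, PAGE_SIZE);
                    out[i] = p;
            }
    }

    int main(void)
    {
            struct page *pages[2];

            kmem_malloc_sketch(M_ZERO, 2, pages);
            return (0);
    }
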
Jeff Roberson
3370c5bfd7 - Remove bogus use of kmem_alloc that was inherited from the old zone
allocator.
- Properly set M_ZERO when talking to the back end page allocators for
  non malloc zones.  This forces us to zero fill pages when they are first
  brought into a cache.
- Properly handle M_ZERO in uma_zalloc_internal.  This fixes a problem where
  per cpu buckets weren't always getting zeroed.
2002-06-19 20:49:44 +00:00
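
The per-CPU bucket problem is an instance of a general caching-allocator
rule: recycled objects are dirty, so a zero-on-allocate flag must be
honored on every path, including the cache hit. A compact analogue:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define OBJ_SIZE 64
    #define M_ZERO   0x0100         /* illustrative value */

    static void *bucket;            /* one-slot cache of freed objects */

    static void *
    zalloc(int flags)
    {
            void *p;

            if (bucket != NULL) {   /* cache hit: the object is dirty */
                    p = bucket;
                    bucket = NULL;
            } else
                    p = malloc(OBJ_SIZE);

            /* The fix: zero on the cached path too, not only when the
             * backing pages are first brought in. */
            if (p != NULL && (flags & M_ZERO))
                    memset(p, 0, OBJ_SIZE);
            return (p);
    }

    static void
    zfree(void *p)
    {
            bucket = p;             /* recycled without scrubbing */
    }

    int main(void)
    {
            char *a = zalloc(0);

            memset(a, 0xab, OBJ_SIZE);      /* dirty it */
            zfree(a);

            char *b = zalloc(M_ZERO);       /* recycled; must be zeroed */
            printf("b[0] = %d\n", b[0]);    /* prints 0 */
            return (0);
    }
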
Jeff Roberson
95f24639b7 Teach kmem_malloc about M_ZERO. 2002-06-19 20:47:18 +00:00
Alan Cox
00e1854a1f o Replace GIANT_REQUIRED in vm_object_coalesce() by the acquisition and
release of Giant.
 o Reduce the scope of GIANT_REQUIRED in vm_map_insert().

These changes will enable us to remove the acquisition and release
of Giant from obreak().
2002-06-19 06:02:03 +00:00
Alan Cox
515630b12f o Remove LK_CANRECURSE from the vm_map lock. 2002-06-18 18:31:35 +00:00
Jeff Roberson
4741dcbff5 Honor the BUCKETCACHE flag on free as well. 2002-06-17 23:53:58 +00:00
Jeff Roberson
18aa2de5a7 - Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items.  It will not go ask the vm
  for pages.  This differs from M_NOWAIT in that it not only doesn't block, it
  doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag.  This
  tells uma that it should only allocate buckets out of the bucket cache, and
  not from the VM.  It does this by using the M_NOVM option to zalloc when
  getting a new bucket.  This is so that the VM doesn't recursively enter
  itself while trying to allocate buckets for vm_map_entry zones.  If there
  are already allocated buckets when we get here we'll still use them but
  otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.
2002-06-17 22:02:41 +00:00
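
The M_NOVM semantics, try the caches but never fall through to the backing
allocator, can be shown with the same kind of toy cache (values invented):

    #include <stdlib.h>

    #define M_NOVM 0x0800           /* illustrative value */

    static void *freelist;          /* stand-in for cached slabs/buckets */

    static void *
    zone_alloc(size_t size, int flags)
    {
            if (freelist != NULL) {
                    void *p = freelist;

                    freelist = NULL;
                    return (p);
            }
            /* Cache miss: M_NOVM means "don't even ask the VM", which
             * is stronger than M_NOWAIT's "ask, but don't sleep". */
            if (flags & M_NOVM)
                    return (NULL);
            return (malloc(size));  /* stands in for taking VM pages */
    }

    int main(void)
    {
            void *a = zone_alloc(32, M_NOVM);       /* NULL: cache empty */
            void *b = zone_alloc(32, 0);            /* falls back */

            (void)a;
            free(b);
            return (0);
    }
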
Alan Cox
b49ecb86d0 o Acquire and release Giant in vm_map_wakeup() to prevent
a lost wakeup().

Reviewed by:	tegge
2002-06-17 13:27:40 +00:00
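
A wakeup is lost when the waker signals in the gap between the waiter's
test and its sleep; taking the same lock around both sides closes that
window. The pthread rendition of the rule:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int done;

    static void *
    waker(void *arg)
    {
            (void)arg;
            pthread_mutex_lock(&lock);      /* same lock the waiter holds */
            done = 1;
            pthread_cond_signal(&cond);     /* can't fire in the gap */
            pthread_mutex_unlock(&lock);
            return (NULL);
    }

    int main(void)
    {
            pthread_t t;

            pthread_create(&t, NULL, waker, NULL);

            pthread_mutex_lock(&lock);
            while (!done)                            /* test under the lock */
                    pthread_cond_wait(&cond, &lock); /* sleeps, drops lock */
            pthread_mutex_unlock(&lock);

            pthread_join(t, NULL);
            printf("woken; nothing lost\n");
            return (0);
    }
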
Alan Cox
042bb29940 o Remove GIANT_REQUIRED from vm_fault_user_wire().
o Move pmap_pageable() outside of Giant in vm_fault_unwire().
   (pmap_pageable() is a no-op on all supported architectures.)
 o Remove the acquisition and release of Giant from mlock().
2002-06-16 20:42:29 +00:00
Alan Cox
319490fb7b o Remove GIANT_REQUIRED from useracc() and vsunlock(). Neither
vm_map_check_protection() nor vm_map_unwire() expect Giant
   to be held.
2002-06-15 19:10:19 +00:00
Alan Cox
e30616dbfe o Remove the acquisition and release of Giant from munlock().
Reviewed by:	tegge
2002-06-15 05:05:04 +00:00
Alan Cox
1d7cf06c8c o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
 o Remove vm_map_pageable() and vm_map_user_pageable().
 o Remove vm_map_clear_recursive() and vm_map_set_recursive().  (They were
   only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by:	tegge
2002-06-14 18:21:01 +00:00
Alan Cox
d46e7d6bee o Acquire and release Giant in vm_map_unlock_and_wait().
Submitted by:	tegge
2002-06-12 08:15:52 +00:00
Alan Cox
28c58286ef o Properly handle a failure by vm_fault_wire() or vm_fault_user_wire()
in vm_map_wire().
 o Make two white-space changes in vm_map_wire().

Reviewed by:	tegge
2002-06-11 19:13:59 +00:00
Alan Cox
73b2bace26 o Teach vm_map_delete() to respect the "in-transition" flag
on a vm_map_entry by sleeping until the flag is cleared.

Submitted by:	tegge
2002-06-11 05:24:22 +00:00
Alan Cox
2b4a2c272d o In vm_map_entry_create(), call uma_zalloc() with M_NOWAIT on system maps.
Submitted by: tegge
 o Eliminate the "!mapentzone" check from vm_map_entry_create() and
   vm_map_entry_dispose().  Reviewed by: tegge
 o Fix white-space usage in vm_map_entry_create().
2002-06-10 06:11:45 +00:00
Ian Dowse
f97d6ce396 Correct the logic for determining whether the per-CPU locks need
to be destroyed. This fixes a problem where destroying a UMA zone
would fail to destroy all zone mutexes.

Reviewed by:	jeff
2002-06-10 03:25:23 +00:00
Alan Cox
12d7cc840f o Add vm_map_wire() for wiring contiguous regions of either kernel
or user vm_maps.  This implementation has two key benefits when compared
   to vm_map_{user_,}pageable(): (1) it avoids a race condition through
   the use of "in-transition" vm_map entries and (2) it eliminates lock
   recursion on the vm_map.

Note: there is still an error case that requires cleanup.

Reviewed by:	tegge
2002-06-09 20:25:18 +00:00
Alan Cox
b2f3846aef o Simplify vm_map_unwire() by merging the second and third passes
over the caller-specified region.
2002-06-08 19:00:40 +00:00
Alan Cox
e27e17b711 o Remove an unnecessary call to vm_map_wakeup() from vm_map_unwire().
o Add a stub for vm_map_wire().

Note: the description of the previous commit had an error.  The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().
2002-06-08 07:32:38 +00:00
Alan Cox
acd9a301ec o Add vm_map_unwire() for unwiring contiguous regions of either kernel
or user vm_maps.  In accordance with the standards for munlock(2),
   and in contrast to vm_map_user_pageable(), this implementation does not
   allow holes in the specified region.  This implementation uses the
   "in transition" flag described below.
 o Introduce a new flag, "in transition," to the vm_map_entry.
   Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
   this flag by not deallocating in-transition vm_map_entrys, allowing
   the vm_map lock to be safely released in vm_map_unwire() and (the
   forthcoming) vm_map_wire().
 o Modify vm_map_simplify_entry() to respect the in-transition flag.

In collaboration with:	tegge
2002-06-07 18:34:23 +00:00
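
The in-transition protocol sketched across these entries: mark the entry,
drop the map lock for the slow work, retake it, clear the flag, and wake
anyone who slept on it; concurrent operations that find the flag set sleep
until it clears. A pthread analogue (field and function names invented):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  map_cv   = PTHREAD_COND_INITIALIZER;

    struct map_entry {
            int     in_transition;
            int     wired;
    };

    static void
    wire_entry(struct map_entry *e)
    {
            pthread_mutex_lock(&map_lock);
            e->in_transition = 1;           /* claim the entry */
            pthread_mutex_unlock(&map_lock);

            /* Slow work (faulting pages in) runs without the map lock;
             * the flag keeps others from deleting or merging the entry. */
            e->wired = 1;

            pthread_mutex_lock(&map_lock);
            e->in_transition = 0;
            pthread_cond_broadcast(&map_cv); /* wake any sleepers */
            pthread_mutex_unlock(&map_lock);
    }

    static void
    delete_entry(struct map_entry *e)
    {
            pthread_mutex_lock(&map_lock);
            while (e->in_transition)        /* respect the flag by waiting */
                    pthread_cond_wait(&map_cv, &map_lock);
            /* ... now safe to deallocate the entry ... */
            pthread_mutex_unlock(&map_lock);
    }

    int main(void)
    {
            struct map_entry e = { 0, 0 };

            wire_entry(&e);
            delete_entry(&e);
            printf("wired=%d\n", e.wired);
            return (0);
    }
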
Alfred Perlstein
fa7212543f fix typo in _SYS_SYSPROTO_H_ case: s/mlockall_args/munlockall_args
Submitted by: Mark Santcroos <marks@ripe.net>
2002-06-06 18:51:14 +00:00
Jeff Roberson
494273bead Add a comment describing a resource leak that occurs during a failure case
in obj_alloc.
2002-06-03 22:59:19 +00:00
Alan Cox
c5aaa06ded o Migrate vm_map_split() from vm_map.c to vm_object.c, renaming it
to vm_object_split().  Its interface should still be changed
   to resemble vm_object_shadow().
2002-06-02 23:54:09 +00:00
Alan Cox
0d78c0dce2 o Style fixes to vm_map_split(), including the elimination of one variable
declaration that shadows another.

Note: This function should really be vm_object_split(), not vm_map_split().

Reviewed by:	md5
2002-06-02 19:32:05 +00:00
Alan Cox
72353893d4 o Condition vm_object_pmap_copy_1()'s compilation on the kernel
option ENABLE_VFS_IOOPT.  Unless this option is in effect,
   vm_object_pmap_copy_1() is not used.
2002-06-02 06:31:41 +00:00
Alan Cox
61c075b67f o Remove GIANT_REQUIRED from vm_map_zfini(), vm_map_zinit(),
vm_map_create(), and vm_map_submap().
 o Make further use of a local variable in vm_map_entry_splay()
   that caches a reference to one of a vm_map_entry's children.
   (This reduces code size somewhat.)
 o Revert a part of revision 1.66, deinlining vmspace_pmap().
   (This function is MPSAFE.)
2002-06-01 22:41:43 +00:00