Commit Graph

5278 Commits

Author SHA1 Message Date
Bruce Evans
f20c8dde15 Fixed some printf format errors (one new one reported by gcc and 3 nearby
old ones not reported by gcc).  This helps unbreak LINT.
2002-07-08 12:21:11 +00:00
Peter Wemm
a136efe9b6 Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version.  There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does.  This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it.  For now they are mostly just
doing statistics to get a feel of how it is working.
2002-07-07 23:05:27 +00:00
Jeff Roberson
41a5470d03 - Require locks for getattr. At some point this could only require shared
locks.
2002-07-07 22:37:45 +00:00
Jeff Roberson
31965a72c9 - Delay unlocking a vnode in linker_hints_lookup until we're actually done
with it.
 - Remove a now stale comment about improper vnode locking.
2002-07-07 22:35:47 +00:00
Peter Wemm
b799f5a475 Make this compile on 64 bit platforms 2002-07-07 22:27:40 +00:00
Jeff Roberson
18c48f437f - Don't hold the vn lock while calling VOP_CLOSE in vclean(). 2002-07-07 06:38:22 +00:00
Jeff Roberson
bed75d4627 - BUF_REFCNT() seems to be the preferred method for verifying a locked buf.
Tell vop_strategy_pre() to use this instead.
 - Ignore B_CLUSTER bufs.  Their components are locked but they don't really
   exist so they don't have to be.  This isn't ideal but it is safe.
2002-07-07 05:29:45 +00:00
Jeff Roberson
49244e35ff Add two asserts that prove & document getblk and geteblk's behavior of
returning locked bufs.
2002-07-07 05:27:08 +00:00
Jeff Roberson
c031d11bb4 Fix a mistake in my last commit. Don't grab an extra reference to the object
in bp->b_object.
2002-07-06 21:27:20 +00:00
Jeff Roberson
9a236af3ad Fixup uses of GETVOBJECT.
- Cache a pointer to the vnode's object in the buf.
 - Hold a reference to that object in addition to the vnode's reference just
   to be consistent.
 - Cleanup code that got the object indirectly through the vp and VOP calls.

This fixes at least one case where we were calling GETVOBJECT without a lock.
It also avoids an expensive layered call at the cost of another pointer in
struct buf.
2002-07-06 08:59:52 +00:00
Julian Elischer
fe0f1bf4b1 make this repect ps_sigintr if there is a pre-existing signal
or suspension request.

Submitted by:	David Xu
2002-07-06 08:47:24 +00:00
Jeff Roberson
0b2ed1aef7 Clean up execve locking:
- Grab the vnode object early in exec when we still have the vnode lock.
 - Cache the object in the image_params.
 - Make use of the cached object in imgact_*.c
2002-07-06 07:00:01 +00:00
Jeff Roberson
e818064e98 - Disable original vop_strategy lock specification.
- Switch to the new vop_strategy_pre for lock validation.

VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated.  There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
2002-07-06 05:23:17 +00:00
Jeff Roberson
302c7aaab9 - Add vop_strategy_pre to validate VOP_STRATEGY locking.
- Disable original vop_strategy lock specification.
 - Switch to the new vop_strategy_pre for lock validation.

VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated.  There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
2002-07-06 05:21:12 +00:00
Jeff Roberson
13e407efee Use the new #! directive for vop_rename. Leave the old lock specification
intact but disabled.
2002-07-06 04:41:27 +00:00
Jeff Roberson
cc8662b0f9 Add "vop_rename_pre" to do pre rename lock verification. This is enabled only
with DEBUG_VFS_LOCKS.
2002-07-06 04:39:48 +00:00
Julian Elischer
55fb7ca894 Fix at least one of the things wrong with signals
^Z should work a lot better now.

Submitted by:	peter@freebsd.org
2002-07-06 02:45:11 +00:00
Andrew Gallatin
a83560d677 Remove the advertising clause from the Duke BSD copyright on the
zero-copy files

Requested by: rwatson
Approved by: Jeff Chase (my old boss at Duke)
2002-07-06 02:44:15 +00:00
Warner Losh
e0b7446484 dd %i as an alias for %d for greater compatibility with our *BSD bretheren
Obtained from: NetBSD
Reviewed by: jake, rwatson, bosko
2002-07-05 18:36:49 +00:00
Jeff Roberson
2efc89d4dc Include systm.h before vnode.h so Debugger() and printf() are available when
full vnode lock debugging is enabled.
2002-07-05 05:15:30 +00:00
Alan Cox
70c1763634 o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free
queue lock (revision 1.33 of vm/vm_page.c removed them).
 o Make the free queue lock a spin lock because it's sometimes acquired
   inside of a critical section.
2002-07-04 22:07:37 +00:00
Maxime Henrion
d7f9ecc86b Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot()
which was #if 0'd and is not likely to be used now.
2002-07-03 09:27:24 +00:00
Julian Elischer
aa0fa33464 Try clean up some of the mess that resulted from layers and layers
of p4 merges from -current as things started getting different.

Corroborated by: Similar patches just mailed by BDE.
2002-07-03 09:15:20 +00:00
Maxime Henrion
563af2ec15 Remove an unused argument in vfs_mountroot(). 2002-07-03 08:52:37 +00:00
Julian Elischer
ee9919b024 White space commit.
I'm working on this file but I wanted to make the whitespece commit
separatly.
2002-07-03 06:15:26 +00:00
Andrew Gallatin
0ac3b6364f Hold the sched lock across call to forward_signal() in tdsignal() to
keep SMP systems from panic'ing when ^C'ing an app

suggested by julian
2002-07-03 02:55:48 +00:00
Dag-Erling Smørgrav
b61860ad2d Add mtx_ prefixes to the fields used for mutex profiling, and fix a bug
where the profiling code would report the release point instead of the
acquisition point.

Requested by:	bde
2002-07-03 01:50:27 +00:00
Maxime Henrion
534ab2e108 I didn't pay enough attention when copy/pasting disclaimers.
The disclaimer in vfs_conf.c was slightly different.  Fix this.
2002-07-02 18:33:32 +00:00
Maxime Henrion
2b4edb69f1 Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by:	rwatson
Repo-copy by:	peter
2002-07-02 17:09:22 +00:00
Julian Elischer
8b768fc82b When going back to SLEEP state, make sure our
State is correctly marked so.
2002-07-02 05:40:51 +00:00
Julian Elischer
d5cb7e14f6 Fix failure to correctly transition back to sleep mode. 2002-07-02 05:33:46 +00:00
Peter Wemm
c781aea8ba #include <sys/ktrace.h> would be useful too. (for ktrace_mtx) 2002-07-01 23:18:08 +00:00
Ian Dowse
f2f2285a6a The jail syscall calls chroot, which is not mpsafe, so put back a
mtx_lock(&Giant) around that call.

Reviewed by:	arr
2002-07-01 20:46:01 +00:00
Peter Wemm
1e9b3d9142 Add #include "opt_ktrace.h" 2002-07-01 19:49:04 +00:00
Ian Dowse
6bd521df93 Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by:	mckusick
2002-07-01 17:59:40 +00:00
Andrew R. Reiter
c0854cd341 - In thread_userret(), remove the Giant locking and unlocking around the
call to thread_alloc().

Approved by:	julian
Reviewed by:	jake, jeff
2002-07-01 03:15:16 +00:00
Julian Elischer
7c7a6f22ca If the process is a zombie, then you must not try dereference the thread
because there isn't one. Of course this code only possibly works
for single threaded processes anyhow..
2002-06-30 07:50:22 +00:00
Alfred Perlstein
37a6b453c4 Partial backout of 1.318, remove error handling added because it may be
incorrect.

Requested by: bde
2002-06-30 05:23:58 +00:00
Ian Dowse
37777f4d1f Add a hashdestroy() function to undo the actions of hashinit(). 2002-06-30 02:07:26 +00:00
Alfred Perlstein
97bb78ace2 Fix several style bugs:
close up the continued line after removing the cast made the line.
space before parentheses in indirect function call.

Add an addtional error handler case for the results of callback.

Submitted by: bde
2002-06-29 17:58:44 +00:00
Alfred Perlstein
c5e3ef7e1f Unbreak computation of 'smask' that I broke when removing caddr_t.
Submitted by: bde
2002-06-29 17:56:34 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
Julian Elischer
44990b8cb8 Add files that are new for KSE. 2002-06-29 07:04:59 +00:00
David E. O'Brien
87e1503e2c Rename the db command lockedvnodes to lockedvnods so that it fits on the
help screen and one doens't think we have a lockedvnodesmap command.
2002-06-29 04:45:09 +00:00
Alfred Perlstein
016091145e more caddr_t removal. 2002-06-29 02:00:02 +00:00
Alfred Perlstein
7f05b0353a More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. 2002-06-29 01:50:25 +00:00
Alfred Perlstein
69a3693f3e catch up with mextadd callback taking a void argument instead of a caddr_t. 2002-06-29 01:49:22 +00:00
Alfred Perlstein
802082390b More caddr_t removal.
Change struct knote's kn_hook from caddr_t to void *.
2002-06-29 00:29:12 +00:00
Alfred Perlstein
a551e20e27 nuke more instances of caddr_t 2002-06-29 00:02:01 +00:00
Alfred Perlstein
337f75e11c m_extadd takes a void (*freef)(void *, void *) now, not a
void (*freef)(caddr_t, void *).
2002-06-29 00:01:46 +00:00
Alfred Perlstein
64f0b9d749 remove or replace caddr_t with void.
make the mbuf external free function take a void * rather than caddr_t.
2002-06-28 23:48:23 +00:00
Alfred Perlstein
210a5a7169 nuke caddr_t. 2002-06-28 23:17:36 +00:00
Alfred Perlstein
a788442584 Remove unneeded casts to caddr_t. 2002-06-28 23:02:38 +00:00
Alfred Perlstein
52545a237b document that the pipe fo_stat routine doesn't need locks because it's
a read operation.

Requested by: rwatson
2002-06-28 22:35:12 +00:00
Jeff Roberson
90769c9ed0 Improve the VOP locking asserts
- Add vfs_badlock_print to control whether or not we print lock violations
 - Add vfs_badlock_panic to control whether we panic on lock violations

Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
2002-06-28 20:58:14 +00:00
Ian Dowse
84b2995b2f In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by:	jeff
2002-06-28 20:06:47 +00:00
Jeff Roberson
5c71bc6cf2 Clean up vn_rdwr locking.
- Do shared locks on read.
 - Only do vn_{start,finished}_write when writing.
2002-06-28 17:51:11 +00:00
Brian Feldman
aac12bcfbc Fix a case where a vnode got explicitly unlocked after the pointer to it
got set to NULL.

Revision 1.355: in the box
2002-06-28 16:17:47 +00:00
Luigi Rizzo
d26d355f0e Remove a printf and add a comment on an assumption that could be
occasionally violated by device drivers.
2002-06-27 23:23:04 +00:00
Robert Watson
600c1a5a8e Fix a bug that prevented the deletion of non-default ACLs from being
passed down the VFS stack.  While I'm here, replace a '0' with a 'NULL'
to make the code more readable.

Sponsored by:	DARPA, NAI Labs
Obtained from:	TrustedBSD Project
2002-06-27 19:31:15 +00:00
Robert Watson
cbeb840245 A bit of whitespace magic. 2002-06-27 19:30:11 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Andrew R. Reiter
e024f583de - Remove Giant acquisition from modevent(), modfnext(), modstat() and
modfind().  Giant is no longer needed by these functions for safe
  execution.

Reviewed by:	jhb
2002-06-26 00:31:44 +00:00
Andrew R. Reiter
4e77f68011 - Alleviate jail() from having the burden of acquiring Giant by simply
removing.  We can do this since we no longer need Giant to safely
  execute jail().

Reviewed by:	rwatson, jhb
2002-06-26 00:29:01 +00:00
Alan Cox
366838ddfe o Eliminate vmspace::vm_minsaddr. It's initialized but never used.
o Replace stale comments in vmspace by "const until freed" annotations
   on some fields.
2002-06-25 18:14:38 +00:00
Jake Burkholder
8ba3d077ff Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended.  This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly.  This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
2002-06-24 15:48:02 +00:00
Maxime Henrion
0ae037eb45 Bring sys/kern/md5c.c in sync with the userland version.
Add a comment so that people don't forget to keep the
version in src/lib/libmd/md5c.c in sync with this one.
This fixes a warning on sparc64.

Reviewed by:	phk
2002-06-24 14:15:25 +00:00
Kirk McKusick
d374764fd3 Use proper size in bzero of stat structure.
Submitted by:	Jake Burkholder <jake@locore.ca>
Sponsored by:	DARPA & NAI Labs.
2002-06-24 07:14:44 +00:00
Jonathan Mini
01ad8a53db Remove unused diagnostic function cread_free_thread().
Approved by:	alfred
2002-06-24 06:22:00 +00:00
Matthew Dillon
727300861d I Noticed a defect in the way wakeup() scans the tailq. Tor noticed an
even worse defect in wakeup_one().  This patch cleans up both.

Submitted by:	tegge
MFC after:	3 days
2002-06-24 00:14:36 +00:00
Maxime Henrion
0a84e7c8f7 More 64 bits platforms warning fixes.
Reviewed by:	rwatson
2002-06-23 18:32:39 +00:00
Kirk McKusick
6524dddcd5 This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:	DARPA & NAI Labs.
2002-06-22 22:01:13 +00:00
Maxime Henrion
2853bfa0df We don't need to check the return value of malloc() against
NULL when the M_WAITOK flag is specified.
2002-06-22 21:44:11 +00:00
Matthew Dillon
ed22d6e948 Fix a bug in vfs_bio_clrbuf(). The single-page-clrbuf optimization was
improperly clearing more then just the invalid portions of the page.  (This
bug is not known to have been triggered by anything).

Submitted by:	tegge
MFC after:	7 days
2002-06-22 19:09:35 +00:00
Maxime Henrion
cacd1c9b49 o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
  nmount(2) since we now malloc() them individually and
  corrupted iov's could make the kernel crash in malloc()
  with "kmem_map too small".

Reviewed by:	phk
2002-06-22 18:07:05 +00:00
Jonathan Mini
9718382d85 Always drop the p_args reference we held for copyout, even if we're about
to change it. This fixes a leak triggered by setproctitle(3).

Approved by:	alfred
Noticed by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
2002-06-22 10:05:50 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Maxime Henrion
7d2d440991 Change the way we internally store the mount options to
a linked list.  This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by:	phk
2002-06-20 20:03:42 +00:00
Alfred Perlstein
c33c825169 Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin
2002-06-20 18:52:54 +00:00
Alfred Perlstein
69be5db96f Don't leak resources if fdcheckstd() fails during exec.
Submitted by: Mike Makonnen <makonnen@pacbell.net>
2002-06-20 17:27:28 +00:00
Ian Dowse
99568bcaf7 Display the mutex name in the ^T status line if the selected thread
is blocked on a mutex. Prepend a '*' to distinguish this case as
is done in top(1).
2002-06-20 14:03:36 +00:00
Peter Wemm
9d04103b7c Remove UIO_USERISPACE - we do not support any split instruction/data
address space machines (eg: pdp-11) and are not likely to ever do so.
Nothing in our kernel sets this.
2002-06-20 07:08:43 +00:00
Peter Wemm
2f9267ec23 Move the "- 1" into the RQB_FFS(mask) macro itself so that
implementations can provide a base zero ffs function if they wish.
This changes
  #define RQB_FFS(mask) (ffs64(mask))
  foo = RQB_FFS(mask) - 1;
to
  #define RQB_FFS(mask) (ffs64(mask) - 1)
  foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().

Reviewed by:	jake (in principle)
2002-06-20 06:21:20 +00:00
Andrew R. Reiter
2eb7b21b00 - Remove the lock(9) protecting the kernel linker system.
- Added a mutex, kld_mtx, to protect the kernel_linker system.  Note that
  while ``classes'' is global (to that file), it is only read only after
  SI_SUB_KLD, SI_ORDER_ANY.
- Add a SYSINIT to flip a flag that disallows class registration after
  SI_SUB_KLD, SI_ORDER_ANY.

Idea for ``classes'' read only by:	jake
Reviewed by:	jake
2002-06-19 21:25:59 +00:00
Poul-Henning Kamp
c4bacc1871 Remove the compat bits for the mis-aligned struct disklabel on alpha,
people got three times longer than I promised.

Sponsored by: DARPA & NAI Labs.
2002-06-19 08:37:02 +00:00
Alfred Perlstein
1419eacb86 Squish the "could sleep with process lock" messages caused by calling
uifind() with a proc lock held.

change_ruid() and change_euid() have been modified to take a uidinfo
structure which will be pre-allocated by callers, they will then
call uihold() on the uidinfo structure so that the caller's logic
is simplified.

This allows one to call uifind() before locking the proc struct and
thereby avoid a potential blocking allocation with the proc lock
held.

This may need revisiting, perhaps keeping a spare uidinfo allocated
per process to handle this situation or re-examining if the proc
lock needs to be held over the entire operation of changing real
or effective user id.

Submitted by: Don Lewis <dl-freebsd@catspoiler.org>
2002-06-19 06:39:25 +00:00
Alfred Perlstein
f2102dadf9 setsugid() touches p->p_flag so assert that the proc is locked. 2002-06-18 22:41:35 +00:00
Seigo Tanimura
03e4918190 Remove so*_locked(), which were backed out by mistake. 2002-06-18 07:42:02 +00:00
Maxime Henrion
fe93750656 Change vfs_copyopt() so that the length argument passed to it
must be the exact same size as the mount option.  This makes
vfs_copyopt() much more useful.
2002-06-14 20:04:21 +00:00
Bosko Milekic
ad1026f912 Set system_map for both mbuf_map and clust_map to 1, in mbuf_init().
Submitted by: Tor Egge (tegge)
Pointed out to me by: hsu
2002-06-13 23:53:42 +00:00
Robert Watson
6480dc743e Regen. 2002-06-13 23:44:50 +00:00
Robert Watson
65772a1a0a Keep POSIX.1e capabilities system call placeholders, but remove definitions. 2002-06-13 23:43:53 +00:00
Robert Watson
a3cce19f7d kern_cap.c no longer needed. 2002-06-13 23:19:34 +00:00
Robert Watson
4aaae52d99 opt_cap.c no longer needed 2002-06-13 23:17:39 +00:00
Kelly Yancey
9ae6d334da Make nselcol, the number of select collisions since boot, unsigned as
negative collisions simply doesn't make sense.

PR:		(one small part of) 19720
Approved by:	alfred
2002-06-12 02:08:18 +00:00
Kelly Yancey
e3f0c5755c Time counter stats are unsigned, advertise them to sysctl(8) that way.
PR:		(one small part of) 19720
Approved by:	phk
2002-06-11 19:47:44 +00:00
Kelly Yancey
3316a80bd1 Convert hit and miss counters to unsigned values. Surely negative values
for either does not make sense.

PR:		(one small part of) 19720
2002-06-10 22:40:26 +00:00
John Baldwin
d0c149fce8 We no longer need to acqure Giant in ast() for ktrpsig() in postsig() now
that ktrace no longer needs Giant.
2002-06-07 05:43:40 +00:00
John Baldwin
374a15aa55 - trapsignal() no longer needs to acquire Giant for ktrpsig().
- Catch up to new ktrace API.
2002-06-07 05:43:02 +00:00
John Baldwin
af300f2367 - Proper locking for p_tracep and p_traceflag.
- Catch up to new ktrace API.
2002-06-07 05:42:25 +00:00
John Baldwin
6c84de02e0 Properly lock accesses to p_tracep and p_traceflag. Also make a few
ktrace-only things #ifdef KTRACE that were not before.
2002-06-07 05:41:27 +00:00
John Baldwin
9ba7fe1b76 - Catch up to new ktrace API.
- ktrace trace points in msleep() and cv_wait() no longer need Giant.
2002-06-07 05:39:16 +00:00
John Baldwin
60a9bb197d Catch up to changes in ktrace API. 2002-06-07 05:37:18 +00:00
John Baldwin
ea3fc8e4cd Overhaul the ktrace subsystem a bit. For the most part, the actual vnode
operations to dump a ktrace event out to an output file are now handled
asychronously by a ktrace worker thread.  This enables most ktrace events
to not need Giant once p_tracep and p_traceflag are suitably protected by
the new ktrace_lock.

There is a single todo list of pending ktrace requests.  The various
ktrace tracepoints allocate a ktrace request object and tack it onto the
end of the queue.  The ktrace kernel thread grabs requests off the head of
the queue and processes them using the trace vnode and credentials of the
thread triggering the event.

Since we cannot assume that the user memory referenced when doing a
ktrgenio() will be valid and since we can't access it from the ktrace
worker thread without a bit of hassle anyways, ktrgenio() requests are
still handled synchronously.  However, in order to ensure that the requests
from a given thread still maintain relative order to one another, when a
synchronous ktrace event (such as a genio event) is triggered, we still put
the request object on the todo list to synchronize with the worker thread.
The original thread blocks atomically with putting the item on the queue.
When the worker thread comes across an asynchronous request, it wakes up
the original thread and then blocks to ensure it doesn't manage to write a
later event before the original thread has a chance to write out the
synchronous event.  When the original thread wakes up, it writes out the
synchronous using its own context and then finally wakes the worker thread
back up.  Yuck.  The sychronous events aren't pretty but they do work.

Since ktrace events can be triggered in fairly low-level areas (msleep()
and cv_wait() for example) the ktrace code is designed to use very few
locks when posting an event (currently just the ktrace_mtx lock and the
vnode interlock to bump the refcoun on the trace vnode).  This also means
that we can't allocate a ktrace request object when an event is triggered.
Instead, ktrace request objects are allocated from a pre-allocated pool
and returned to the pool after a request is serviced.

The size of this pool defaults to 100 objects, which is about 13k on an
i386 kernel.  The size of the pool can be adjusted at compile time via the
KTRACE_REQUEST_POOL kernel option, at boot time via the
kern.ktrace_request_pool loader tunable, or at runtime via the
kern.ktrace_request_pool sysctl.

If the pool of request objects is exhausted, then a warning message is
printed to the console.  The message is rate-limited in that it is only
printed once until the size of the pool is adjusted via the sysctl.

I have tested all kernel traces but have not tested user traces submitted
by utrace(2), though they should work fine in theory.

Since a ktrace request has several properties (content of event, trace
vnode, details of originating process, credentials for I/O, etc.), I chose
to drop the first argument to the various ktrfoo() functions.  Currently
the functions just assume the event is posted from curthread.  If there is
a great desire to do so, I suppose I could instead put back the first
argument but this time make it a thread pointer instead of a vnode pointer.

Also, KTRPOINT() now takes a thread as its first argument instead of a
process.  This is because the check for a recursive ktrace event is now
per-thread instead of process-wide.

Tested on:	i386
Compiles on:	sparc64, alpha
2002-06-07 05:32:59 +00:00
John Baldwin
48849938e8 Change the all locks list from a STAILQ to a TAILQ. This bloats struct
lock_object by another pointer (though all of lock_object should be
conditional on LOCK_DEBUG anyways) in exchange for an O(1) TAILQ_REMOVE()
in witness_destroy() (called for every mtx_destroy() and sx_destroy())
instead of an O(n) STAILQ_REMOVE.  Since WITNESS is so dog slow as it is,
the speed-up is worth the space cost.

Suggested by:	iedowse
2002-06-06 20:51:04 +00:00
Chad David
ca18d53eae s/!SIGNOTEMPY/SIGISEMPTY/
Reviewed by: marcel, jhb, alfred
2002-06-06 19:12:41 +00:00
John Baldwin
8dcb900b62 Handle "dead" witnesses better in the situation of several short term locks
being created and destroyed without a single long-term one around to ensure
the witness associated with that group of locks stays alive.  The pipe
mutexes are an example of this group.  For a dead witness we no longer
clear the witness name.  Instead, when looking up the witness for a lock,
if a dead witness' (a witness with a refcount of 0) w_name pointer is
identical to the witness name of the lock then we revive that witness
instead of using a new witness for the lock.  This results in far fewer
dead witness objects and also better preserves locking orders over the long
term resulting in more correct lock order checking.  Note that we can't
ever derefence w_name of a dead witness since we don't know if the string
it is pointing to has been free()'d or kldunload()'d out from under us.
2002-06-06 19:04:38 +00:00
Dag-Erling Smørgrav
edad3af28d Move some sysctls from the debug tree to the vfs tree. 2002-06-06 15:50:22 +00:00
Dag-Erling Smørgrav
4a357a32e0 Gratuitous whitespace cleanup. 2002-06-06 15:46:38 +00:00
Poul-Henning Kamp
53e645d90e Use "bwrbg" as description when we sleep for background writing,
"biord" was misleading in every possible way.
2002-06-06 08:56:10 +00:00
Bruce Evans
6438c894da Fixed overflow in the bounds checking in dscheck(). It assumed that
daadr_t is no larger than a long, and some other relatively harmless
things (*blush*).  Overflow for subtracting a daddr_t from a u_long
caused "truncation" of the i/o for attempts to access blocks beyond
the end of the actually cause expansion of the i/o to a preposterous
size.
2002-06-06 00:35:07 +00:00
John Baldwin
6a95e08f2f Replace thread_runnable() with thread_running() as the latter is more
accurate.

Suggested by:	julian
2002-06-04 22:36:24 +00:00
John Baldwin
7fcca6096f Optimize the adaptive mutex spin a bit. Use a simple while loop with
simple reads (and on IA32, a "pause" instruction for each interation of the
loop) to spin until either the mutex owner field changes, or the lock owner
stops executing.

Suggested by:	tanimura
Tested on:	i386
2002-06-04 21:53:48 +00:00
John Baldwin
5853d37d3b Add a private thread_runnable() macro to make the code more readable and
make the KSE diff easier to maintain.
2002-06-04 21:50:02 +00:00
Dag-Erling Smørgrav
f16a5176ad ANSIfy the one remaining K&R function. 2002-06-02 21:57:28 +00:00
Dag-Erling Smørgrav
e89efc02e0 Whitespace nits. 2002-06-02 21:55:58 +00:00
Dag-Erling Smørgrav
3b1f7e7de0 Add support for 'j' flag. Simplify the size modifier code and reduce code
duplication.  Also add support for 'n' specifier.

Reviewed by:	bde
2002-06-02 21:54:55 +00:00
Jens Schweikhardt
21dc7d4f57 Fix typo in the BSD copyright: s/withough/without/
Spotted and suggested by:	des
MFC after:	3 weeks
2002-06-02 20:05:59 +00:00
Mike Barcroft
6ee093fb8f Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag
(P_CONTINUED) is set when a stopped process receives a SIGCONT and
cleared after it has notified a parent process that has requested
notification via waitpid(2) with WCONTINUED specified in its options
operand.  The status value can be checked with the new WIFCONTINUED()
macro.

Reviewed by:	jake
2002-06-01 18:37:46 +00:00
Archie Cobbs
48d183faca Fix a bug in m_split(): the "m->m_ext.ext_size" field of an mbuf was being
set to zero. This field indicates the total space in the external buffer
and therefore should not be modified after the external buffer is added.

Add a comment warning that the mbufs returned by m_split() might be read-only.

Fix M_TRAILINGSPACE() to return zero if !M_WRITABLE(m).

Reviewed by:	freebsd-net
Obtained from:	Vernier Networks, Inc.
MFC after:	1 week
2002-05-31 22:09:57 +00:00
Dag-Erling Smørgrav
7aa57dca57 Nit: kern.ttys is of type S,xtty, not S,tty. 2002-05-31 16:11:49 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Robert Drehmel
280759e75e - Replace the bandaid introduced in revision 1.110 with
a better solution.
 - Add braces for a ``for'' statement containing a single
   multi-line statement.
2002-05-31 09:41:09 +00:00
Poul-Henning Kamp
eef633a71f Mistyped and lost a '&' in previous commit. 2002-05-30 16:26:39 +00:00
Poul-Henning Kamp
fe71224650 Don't forget to factor in the boottime when we calculate PPS timestamps.
Submitted by:	Akira Watanabe <akira@myaw.ei.meisei-u.ac.jp>
2002-05-30 10:34:01 +00:00
Jeff Roberson
7181624aaa Record the file, line, and pid of the last successful shared lock holder. This
is useful as a last effort in debugging file system deadlocks.  This is enabled
via 'options DEBUG_LOCKS'
2002-05-30 05:55:22 +00:00
Julian Elischer
628855e758 CURSIG() is not a macro so rename it cursig().
Obtained from:	KSE tree
2002-05-29 23:44:32 +00:00
Julian Elischer
2d0231f5da diff reduction from KSE to keep WW-III from happenning on -current 2002-05-29 20:40:50 +00:00
Dag-Erling Smørgrav
6b658142fd Add some checks to prevent NULL dereferences.
Submitted by:	jhay
2002-05-28 14:29:56 +00:00
Maxime Henrion
8eb0098f4c Remove a duplicated vfs_freeopts() that I introduced in last
revision.
2002-05-28 13:27:55 +00:00
Dag-Erling Smørgrav
6c533ac713 Add NAI copyright. 2002-05-28 06:53:41 +00:00
Marcel Moolenaar
52183d0145 Add uuidgen(2) and uuidgen(1).
The uuidgen command, by means of the uuidgen syscall, generates one
or more Universally Unique Identifiers compatible with OSF/DCE 1.1
version 1 UUIDs.

From the Perforce logs (change 11995):

Round of cleanups:
o  Give uuidgen() the correct prototype in syscalls.master
o  Define struct uuid according to DCE 1.1 in sys/uuid.h
o  Use struct uuid instead of uuid_t. The latter is defined
   in sys/uuid.h but should not be used in kernel land.
o  Add snprintf_uuid(), printf_uuid() and sbuf_printf_uuid()
   to kern_uuid.c for use in the kernel (currently geom_gpt.c).
o  Rename the non-standard struct uuid in kern/kern_uuid.c
   to struct uuid_private and give it a slightly better definition
   for better byte-order handling. See below.
o  In sys/gpt.h, fix the broken uuid definitions to match the now
   compliant struct uuid definition. See below.
o  In usr.bin/uuidgen/uuidgen.c catch up with struct uuid change.

A note about byte-order:
        The standard failed to provide a non-conflicting and
unambiguous definition for the binary representation. My initial
implementation always wrote the timestamp as a 64-bit little-endian
(2s-complement) integral. The clock sequence was always written
as a 16-bit big-endian (2s-complement) integral. After a good
nights sleep and couple of Pan Galactic Gargle Blasters (not
necessarily in that order :-) I reread the spec and came to the
conclusion that the time fields are always written in the native
by order, provided the the low, mid and hi chopping still occurs.
The spec mentions that you "might need to swap bytes if you talk
to a machine that has a different byte-order". The clock sequence
is always written in big-endian order (as is the IEEE 802 address)
because its division is resulting in bytes, making the ordering
unambiguous.
2002-05-28 06:16:08 +00:00
Marcel Moolenaar
494eefd86b Add syscall uuidgen() for generating Univerally Unique Identifiers
(UUIDs). On ia64 UUIDs, aka GUIDs, are used by EFI and the firmware
among others. To create GUID Partition Tables (GPTs), we need to
be able to generate UUIDs.
2002-05-28 05:58:06 +00:00
Dag-Erling Smørgrav
1a149fcd67 Introduce struct xtty, used when exporting tty information to userland.
Make kern.ttys export a struct xtty rather than struct tty.  Since struct
tty is no longer exposed to userland, remove the dev_t / udev_t hack.

Sponsored by:	DARPA, NAI Labs
2002-05-28 05:40:53 +00:00
Alan Cox
a739e09c6e o Remove some unnecessary casting from and add some necessary casting to
aio_suspend() and lio_listio().

Submitted by:	bde
2002-05-25 18:39:42 +00:00
Dag-Erling Smørgrav
4b4c18f861 ANSIfy (significant portions were already partly ANSIfied) 2002-05-25 15:52:53 +00:00
Dag-Erling Smørgrav
b7457aabf6 Remove register. 2002-05-25 15:44:38 +00:00
Dag-Erling Smørgrav
dedf14f521 Automated whitespace cleanup. 2002-05-25 15:43:06 +00:00
Jake Burkholder
d2ac231616 Make the run queue parameters machine dependent. Optimize 64 bit
architectures by using a 64 bit word for the bit array which keeps
track of non-empty queues.

Reviewed by:	peter
2002-05-25 01:12:23 +00:00
Peter Wemm
34e3110c70 Fix warnings. Also, removed an unused variable that I found that was just
initialized and never used afterwards.
2002-05-24 06:06:18 +00:00
Maxime Henrion
2274ec995c Style nit, no functional changes. 2002-05-23 23:22:22 +00:00
Maxime Henrion
9ee6bf717f Slightly change the way we pass mount options to the filesystem
VFS_NMOUNT operations.

Reviewed by:	phk
2002-05-23 23:02:19 +00:00
Hajimu UMEMOTO
4b562eede1 In m_aux_delete, no need to chase beyond victim.
Submitted by:	archie
Obtained from:	KAME
2002-05-23 15:59:48 +00:00
John Baldwin
cc5d39f81e Minor nit: get p pointer in msleep() from td->td_proc (where
td == curthread) rather than from curproc.
2002-05-23 04:14:18 +00:00
John Baldwin
a79c98fa98 Whitespace: trim a trailing tab. 2002-05-23 04:12:28 +00:00
Dag-Erling Smørgrav
db586c8b7c Make the counters uintmax_ts, and use %ju rather than %llu. 2002-05-23 03:08:42 +00:00
John Baldwin
6b8c698908 Rename pause() to ia32_pause() so it doesn't conflict with the pause()
function defined in <unistd.h>.  I didn't #ifdef _KERNEL it because the
mutex implementation in libpthread will probably need this.
2002-05-22 20:32:39 +00:00
John Baldwin
0228ea4e0b Rename cpu_pause() to pause(). Originally I was going to make this an
MI API with empty cpu_pause() functions on other arch's, but this
functionality is definitely unique to IA-32, so I decided to leave it
as i386-only and wrap it in #ifdef's.  I should have dropped the cpu_
prefix when I made that decision.

Requested by:	bde
2002-05-22 13:19:22 +00:00
John Baldwin
703fc290fb Add appropriate IA32 "pause" instructions to improve performanec on
Pentium 4's and newer IA32 processors.  The "pause" instruction has been
verified by Intel to be a NOP on all currently existing IA32 processors
prior to the Pentium 4.
2002-05-21 22:26:35 +00:00
Andrew R. Reiter
ec41816009 - td will never be NULL, so the call to soalloc() in socreate() will always
be passed a 1; we can, however, use M_NOWAIT to indicate this.
- Check so against NULL since it's a pointer to a structure.
2002-05-21 21:30:44 +00:00
John Baldwin
0e54ddadd9 Fix an old cut 'n' paste bug inherited from BSD/OS: don't increment 'i'
twice once we are in the long wait stage of spinning on a spin mutex.
2002-05-21 21:27:05 +00:00
Andrew R. Reiter
1515cd22e1 - OR the flag variable with M_ZERO so that the uma_zalloc() handles the
zero'ing out of the allocated memory.  Also removed the logical bzero
  that followed.
2002-05-21 21:18:41 +00:00
John Baldwin
e6302957fe Whitespace fixup, properly indent the body of an else clause. 2002-05-21 21:13:27 +00:00
John Baldwin
2498cf8c42 Add code to make default mutexes adaptive if the ADAPTIVE_MUTEXES kernel
option is used (not on by default).

- In the case of trying to lock a mutex, if the MTX_CONTESTED flag is set,
  then we can safely read the thread pointer from the mtx_lock member while
  holding sched_lock.  We then examine the thread to see if it is currently
  executing on another CPU.  If it is, then we keep looping instead of
  blocking.
- In the case of trying to unlock a mutex, it is now possible for a mutex
  to have MTX_CONTESTED set in mtx_lock but to not have any threads
  actually blocked on it, so we need to handle that case.  In that case,
  we just release the lock as if MTX_CONTESTED was not set and return.
- We do not adaptively spin on Giant as Giant is held for long times and
  it slows SMP systems down to a crawl (it was taking several minutes,
  like 5-10 or so for my test alpha and sparc64 SMP boxes to boot up when
  they adaptively spinned on Giant).
- We only compile in the code to do this for SMP kernels, it doesn't make
  sense for UP kernels.

Tested on:	i386, alpha, sparc64
2002-05-21 20:47:11 +00:00
John Baldwin
e8fdcfb57a Optimize spin mutexes for UP kernels without debugging to just enter and
exit critical sections.  We only contest on a spin mutex on an SMP kernel
running on an SMP machine.
2002-05-21 20:34:28 +00:00
John Baldwin
525c135972 In witness_unlock(), when updating a lock list entry bucket, decrement the
count of lock list entries after we fixup the bucket of lock list entries.
In theory we can remove the intr_disable/intr_restore() calls now.
2002-05-20 19:16:22 +00:00
Jake Burkholder
45eefe7176 Add a bandaid so that sysctl kern.malloc works on sparc64. 2002-05-20 18:29:37 +00:00
John Baldwin
bbd296aba6 - Allow witness_sleep() to be called when witness hasn't been initialized
yet.  We just return without performing any checks.
- Don't explicitly enter and exit critical sections when walking lock
  lists.  We don't need a critical section to walk the list of sleep
  locks for a thread.  We check to see if a spin lock list is empty
  before we walk it.  If the list is empty we don't need to walk it.  If
  it isn't then we already hold at least one spin lock and are already in
  a critical section and thus don't need our own explicit critical
  section.
2002-05-20 17:49:46 +00:00
John Baldwin
42e498655d Fix the td_intr_nesting_level check to work ok if a flag like M_ZERO is
passed in with M_WAITOK to malloc().
2002-05-20 17:46:57 +00:00
Mike Silbersack
184fec1a09 Subtle fix to the accept filter LRU code. In some cases, a newly
initialized socket with no qlimit was being passed in.  In order
to handle this case properly, we must not use >= when comparing
queue sizes to qlimit.  As a result of this improper handling,
a panic could result in certain cases.

PR:		38325
MFC after:	3 days
2002-05-20 17:34:31 +00:00
Maxime Henrion
e9e705b0df Change two vput() that should have been vrele().
Submitted by:	iedowse
2002-05-20 14:59:43 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
Marcel Moolenaar
a9b4acea06 All signals can be sent to the inferior process when it's restarted,
not just the legacy ones.

PR: 33299
Submitted by: Alexander N. Kabaev <ak03@gte.com>
2002-05-19 01:37:43 +00:00
John Baldwin
f44d9e24fb Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked.  We now use the thread ucred reference
for the credential checks in p_can*() as a result.  p_canfoo() should now
no longer need Giant.
2002-05-19 00:14:50 +00:00
John Baldwin
bdc9a8d01b Now that daddr_t has grown up, use %lld to printf it and cast it to long
long.
2002-05-18 23:46:04 +00:00
Poul-Henning Kamp
e96d018d92 Use btodb() macro.
Sponsored by: DARPA & NAI Labs.
2002-05-18 09:34:09 +00:00
Eric Melville
096a727e41 Separate "seperate" from kernel source. 2002-05-16 22:43:20 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Maxime Henrion
34e53231d0 o Fix vfs_copyopt(), the first argument to bcopy() is the source,
not the destination.
o Remove some code from vfs_getopt() which was making the interface
  more complicated to use for a very slight gain.
2002-05-16 17:09:41 +00:00
Robert Watson
661016419c p_cansignal() returns an errno value; at some point, the check for
inter-process signalling ceased to preserve and return that value,
instead always returning EPERM.  This meant that it was possible
to "probe" the pid space for processes that were not otherwise
visible.  This change reverts that reversion.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-05-14 23:07:15 +00:00
Jeff Roberson
0e2d6cc899 Disable the shared locking namei() code for now. It breaks several stacking
filesystems.  This is on hold until the rest of VFS Locking is reviewed and
deemed safe.  It can be enabled with 'options LOOKUP_SHARED'.
2002-05-14 21:59:49 +00:00
Dag-Erling Smørgrav
733c328439 Remove a printf(3) argument with no corresponding format specifier. 2002-05-14 18:28:06 +00:00
Poul-Henning Kamp
98b0c78978 Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 11:09:43 +00:00
Poul-Henning Kamp
77068a7fe2 Retire the bogus uses of the disklabel field d_sbsize and begin to
initialize it to zero so we don't have to have everbody and their
aunt including FFS specific header files.

Sponsored by: DARPA & NAI Labs.
2002-05-12 20:49:41 +00:00
Marcel Moolenaar
882c6b1e5a Fix alpha build. The alpha has dumpsys implemented.
While here, revert the condition to list the machines
for which dumpsys has not been implemented.

Reported by: wilko
2002-05-12 18:27:28 +00:00
Mike Silbersack
a9caffba47 Change the mbuf exhaustion warning message to match the message
in -stable.
2002-05-09 20:21:07 +00:00
Jonathan Mini
d8f4f6a404 Remove trace_req().
Reviewed by:	alfred, jhb, peter
2002-05-09 04:13:41 +00:00
Alan Cox
82641acd17 o Correct an error made in revision 1.65: In readv(), if uap->iovcnt is
out-of-range, drop the file reference before returning.  (This error
   also exists in the RELENG_4 branch.)
 o Eliminate the acquisition and release of Giant in readv()
   now that malloc() and free() are callable without Giant.
2002-05-09 02:30:41 +00:00
Alfred Perlstein
8b43b53530 expand_name fixes:
.) don't use MAXPATHLEN + 1, fix logic to compensate.
.) style(9) function parameters.
.) fix line wrapping.
.) remove duplicated error and string handling code.
.) don't NUL terminate already NUL terminated string.
.) all string length variables changed from int to size_t.
.) constify variables.
.) catch when corename would be truncated.
.) cast pid_t and uid_t args for format string.
.) add parens around return arguments.

Help and suggestions from: bde
2002-05-08 09:06:47 +00:00
Jake Burkholder
0cce52f8eb Remove runq_findproc. This never worked right in the first place and can
be prohibitively expensive.
2002-05-08 04:39:49 +00:00
Alfred Perlstein
b2bc3101a8 M_ZERO the temp buffer in expand_name() otherwise if an error occurs
while logging we may pass a non NUL terminated string to log(9) for a
%s format arg.
2002-05-07 23:37:07 +00:00
Peter Wemm
0d93809e04 Re-remove kern_random.c and svr4_signal.c. Somehow dillon managed to keep
on committing to these while they were in the Attic after they had been
removed.  I think this was because he had the file checked out and already
'modified' while markm cvs rm'ed them, and cvs screws up when trying to
"merge" the modifications with the "rm".  And after that the client
state was sufficiently hosed to keep it messed up.  Yay CVS!  (CVS is
very fragile for adding and removing files remotely)

The existence of these files was pointed out by: ru
2002-05-07 21:54:47 +00:00
Seigo Tanimura
9d0fc9636e Do not forget to increase the number of completely connected sockets in
soisconnected_locked().

Forgotten by:	tanimura
2002-05-07 16:17:44 +00:00
Jeff Roberson
f0d73b3e5f Switch from just holding the interlock to holding the standard lock throughout
getnewvnode().  This is safer.  In the future, we should investigate requiring
only the interlock to get the vnode object.
2002-05-07 02:44:06 +00:00
Alfred Perlstein
e649887b1e Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp.  We need
to check the process for P_WEXIT to see if it's exiting.  Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.
2002-05-06 19:31:28 +00:00
John Baldwin
e746d950ab When checking to see if the init process calls exit1(), compare p to the
initproc proc pointer instead of checking to see if the pid is 1.

Submitted by:	bde
2002-05-06 17:07:10 +00:00
John Baldwin
276c516984 Style fixes in local variable declarations.
Submitted by:	bde
2002-05-06 17:04:29 +00:00
John Baldwin
7a6b989bfa - Style fixes in some comments.
- Whitespace nit.
- Sort some includes.

Submitted by:	bde (mostly)
2002-05-06 15:46:29 +00:00
Jeff Roberson
6953f5da1a Hold the currently selected vnode's lock across the call to VOP_GETVOBJECT.
Don't try to create a vm object before the file system has a chance to finish
initializing it.  This is incorrect for a number of reasons.  Firstly, that
VOP requires a lock which the file system may not have initialized yet. Also,
open and others will create a vm object if it is necessary later.
2002-05-06 04:47:43 +00:00
Maxime Henrion
9d997d8be8 Add the lchflags(2) syscall.
Reviewed by:	rwatson
2002-05-05 23:47:41 +00:00
Maxime Henrion
8d9b781fb5 Add an entry for the lchflags(2) syscall. It's useful to prevent
a symlink deletion.

Reviewed by:	rwatson
2002-05-05 23:37:44 +00:00
Jeff Roberson
576365ba36 Move a KASSERT() in open() prior to unlocking the vnode. It's not safe to
call VOP_GETVOBJECT without a lock.
2002-05-05 23:17:13 +00:00
Alan Cox
c50fe92b8d o Condition the compilation of uiomoveco() and vm_uiomove()
on ENABLE_VFS_IOOPT.
 o Add a comment to the effect that this code is experimental
   support for zero-copy I/O.
2002-05-05 22:42:40 +00:00
Poul-Henning Kamp
81e017430a Expand the one-line function pbreassignbuf() the only place it is or could
be used.
2002-05-05 20:37:08 +00:00
Bruce Evans
f5216b9a19 Return the correct error code (ENOSYS, not EINVAL) from nosys(). Getting
killed by SIGSYS for unimlemented syscalls is bad enough.

Obtained from:	Lite2 branch

The Lite2 branch has some other interesting unmerged (?) bits in this
file.  They are well hidden among cosmetic regressions.
2002-05-05 04:50:47 +00:00
Bruce Evans
a9a0f15a69 Fixed breakage of binary compatibility of the kern.clockrate sysctl in
sys/time.h rev.1.53, etc.  Zero out the entire struct clkinfo and not
just the new spare part of it so that there is no possibility of leaking
kernel stack context to userland.
2002-05-05 04:33:09 +00:00
Maxime Henrion
afd458b0fa Fix a typo.
Submitted by:	dwmalone
2002-05-04 19:50:09 +00:00
Poul-Henning Kamp
e31c615c60 Remove a six year old undocumented #ifdef : NO_B_MALLOC. 2002-05-04 19:24:55 +00:00
Matthew Dillon
9f9435545b Remove obsolete code (that was already #if 0'd out).
Requested by: Hiten Pandya <hitmaster2k@yahoo.com>
2002-05-04 17:10:15 +00:00
Alfred Perlstein
698f85d3e3 style(9): 'if' and 'while' need a space after them. 2002-05-04 07:40:49 +00:00
Poul-Henning Kamp
48e5da550a Initialize time_second to 1 instead of zero to pacify slightly bogus arp code.
Various minor style fixes from BDE.
2002-05-03 08:46:03 +00:00
Seigo Tanimura
6041fa0a60 As malloc(9) and free(9) are now Giant-free, remove the Giant lock
across malloc(9) and free(9) of a pgrp or a session.
2002-05-03 07:46:59 +00:00
Seigo Tanimura
c8d8a686e4 Fix the lock order reversal between the sigio lock and a process/pgrp lock in
funsetownlst() by locking the sigio lock across funsetownlst().
2002-05-03 05:32:25 +00:00
Peter Wemm
85f79d52e9 Retire makeobjops.pl - replaced by ../tools/makeobjops.awk. 2002-05-02 22:21:59 +00:00
Poul-Henning Kamp
0b5d880d39 As promised make the hack for sizeof(struct disklabel) on alpha annoying.
Run make world (or recompile whatever program whines) to get rid of warning.

Compat bits will be removed entirely in about two weeks.
2002-05-02 21:53:39 +00:00
Maxime Henrion
6dbde1fe23 Convert devfs to nmount.
Reviewed by:	phk
2002-05-02 20:27:42 +00:00
John Baldwin
3fc755c118 - Protect randompid and nprocs with the allproc_lock.
- Reorder fork1() to do malloc() and other blocking operations prior to
  acquiring the needed process locks.
- The new process inherit's the credentials of curthread, not the
  credentials of the old process.
- Document a really weird race that will come up with KSE allows multiple
  kernel threads per process.
2002-05-02 15:13:45 +00:00
John Baldwin
d7aadbf9ce - Reorder a few things so that when we lock the process at the end of
exit1() we don't have to release it until we acquire schd_lock to
  call cpu_throw().
- Since we can switch at any time due to preemption or a lock release
  prior to acquiring sched_lock, don't update switchtime and switchticks
  until the very end of exit1() after we have acquired sched_lock.
- Interlock the proctree_lock and proc lock in wait1() and exit1() to
  avoid lost wakeups when a parent blocks waiting for a child to exit at
  the bottom of wait1().  In exit1() the proc lock interlocked with
  proctree_lock (and released after acquiring sched_lock) is that of
  the parent process.
- In wait1() use an exclusive lock of proctree lock while we are
  looking for a process to harvest.  This allows us to completely
  remove all references to the process once we've found one (i.e.,
  disconnect it from pgrp's, session's, zombproc list, and it's parent's
  children list) "atomically" without needing to worry about a lock
  upgrade.
- We don't need sched_lock to test if p_stat is SZOMB or SSTOP when holding
  the proc lock since the proc lock is always held with p_stat is set to
  SZOMB or SSTOP.
- Protect nprocs with an xlock of the allproc_lock.
2002-05-02 15:09:58 +00:00
John Baldwin
9b3b1c5fdf - Reorder execve() so that it performs blocking operations before it
locks the process.
- Defer other blocking operations such as vrele()'s until after we
  release locks.
- execsigs() now requires the proc lock to be held when it is called
  rather than locking the process internally.
2002-05-02 15:00:14 +00:00
Jeff Roberson
8f70816cf2 Hide a pointer to the malloc_type bucket at the end of the freed memory. If
this memory is modified after it has been freed we can now report it's
previous owner.
2002-05-02 09:07:04 +00:00
Jeff Roberson
5a34a9f089 malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the
mallochash.  Mallochash is going to go away as soon as I introduce the
kfree/kmalloc api and partially overhaul the malloc wrapper.  This can't happen
until all users of the malloc api that expect memory to be aligned on the size
of the allocation are fixed.
2002-05-02 07:22:19 +00:00
Jeff Roberson
639c9550fb Remove the temporary alignment check in free().
Implement the following checks on freed memory in the bucket path:
	- Slab membership
	- Alignment
	- Duplicate free

This previously was only done if we skipped the buckets.  This code will slow
down INVARIANTS a bit, but it is smp safe.  The checks were moved out of the
normal path and into hooks supplied in uma_dbg.
2002-05-02 02:08:48 +00:00
Alfred Perlstein
f132072368 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
Peter Wemm
6692ac6644 Cosmetic tweaks. Try and keep the style more consistent, catch some stray
whitespace and update a comment.
2002-05-01 02:51:50 +00:00
Peter Wemm
aed0556447 kern_tc.c doesn't use <machine/psl.h>, and having this #include breaks
other platforms.
2002-05-01 01:31:26 +00:00
David E. O'Brien
2244cf6bba Remove this Perl script. There have been zero bug reports against
vnode_if.awk.
2002-05-01 00:40:44 +00:00
Jeff Roberson
289f207c81 Convert longs to u_longs in stats. This will hold off wrap arounds for a
while longer.
2002-04-30 22:39:32 +00:00
Alan Cox
ea0f50bcf0 o Convert the vm_page buckets mutex to a spin lock. (This resolves
an issue on the Alpha platform found by jeff@.)
 o Simplify vm_page_lookup().

Reviewed by:	jhb
2002-04-30 21:24:47 +00:00
Poul-Henning Kamp
39acc78a1e Brucifixion ? Yes, out that door, row on the left, one patch each.
Many thanks to:	bde
2002-04-30 20:42:06 +00:00
Matthew Dillon
e6728403d4 These are Alexander Kabaev's VFSops fixes (see the thread 'Found: module
loading breakage').  The patch fixes serious issues with the VFS
operations vector array which results in a crash when a filesystem module
adding a new VOP is loaded into the kernel.  Basically what was happening
before was that the old operations vector was being freed and a new one
allocated.  The original MALLOC code tended to reuse the same address
for the case and so the bug did not rear its ugly head until the new memory
subsystem was emplaced.

This patch replaces the temporary workaround Dave O'Brien comitted in 1.58.

The patch is clean enough that I intend to MFC it to stable at some point.

Submitted by:	Alexander Kabaev <ak03@gte.com>
MFC after:	1 week
2002-04-30 18:44:32 +00:00
Jeff Roberson
8efc4eff00 Add a new UMA debugging facility. This will overwrite freed memory with
0xdeadc0de and then check for it just before memory is handed off as part
of a new request.  This will catch any post free/pre alloc modification of
memory, as well as introduce errors for anything that tries to dereference
it as a pointer.

This code takes the form of special init, fini, ctor and dtor routines that
are specificly used by malloc.  It is in a seperate file because additional
debugging aids will want to live here as well.
2002-04-30 07:54:25 +00:00
Jeff Roberson
2cc35ff9c6 Move the implementation of M_ZERO into UMA so that it can be passed to
uma_zalloc and friends.  Remove this functionality from the malloc wrapper.

Document this change in uma.h and adjust variable names in uma_core.
2002-04-30 04:26:34 +00:00
Seigo Tanimura
960ed29c4b Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
Robert Watson
43a7c4e919 Re-add the 16384 bucket also.
Submitted by:	green
2002-04-29 17:53:23 +00:00
Robert Watson
bd796eb25f Revert a portion of kern_malloc.c:1.99, which (in addition to adding
malloc profiling) also modified the set of pre-defined buckets for the
memory allocator.  For reasons unknown to me, this resulted in extensive
memory corruption in the kernel, in particular on SMP boxes, so I'm
committing this work-around until Jeff gets a chance to debug it
properly.  David Wolfskill pointed me at this commit as the one that
might be a problem; I've been running this code on two dual-processor
burn-in boxes for about 12 hours now, and the rate of panics due to
memory corruption has dropped to zero (from one every five minutes).

Hopefully not treading on the toes of:	jeff
2002-04-29 17:12:02 +00:00
David Malone
dbe620d321 Add a sysctl which disables the logging of console output.
Approved by:	phk
MFC after:	2 weeks
2002-04-29 09:15:38 +00:00
Jeroen Ruigrok van der Werven
1cf1a725ff Fix indention which I did wrong in a previous commit.
Submitted by:	bde
2002-04-29 08:18:06 +00:00
Poul-Henning Kamp
6b00cf46ec Stylistic sweep through the timecounter code.
Renovate comments.
2002-04-28 18:24:21 +00:00
Poul-Henning Kamp
d25917e856 Don't screw up our uptime with historical dates. 2002-04-28 16:51:36 +00:00
Ian Dowse
ba1551ca81 Avoid the user-visible effect of setting SA_NOCLDWAIT when the
SIGCHLD handler is SIG_IGN. This is a reimplementation of the
problematic revision 1.131 of kern_exit.c. To avoid accessing process
UPAGES, we set a new procsig flag when the SIGCHLD handler is SIG_IGN
and use that instead.
2002-04-27 22:41:41 +00:00
Peter Wemm
4f033348f4 Finish fixing hints. Remember the use_kenv state for the next run.
Otherwise we fall back to using the static hints the next time around.
We still have the leftover fallback code there which meant that we skipped
the use_hints checking on the second and subsequent calls.  Also, be a bit
more careful about walking off the end of the envp array.

I've extracted this from a larger diff.  I hope I didn't miss anything...
2002-04-27 22:32:57 +00:00
Peter Wemm
fc1218bb71 Partial fix for hints
Obtained from:  mux
2002-04-27 22:25:13 +00:00
Ian Dowse
3eee035c5b Remove a stale comment saying that the vnode lock must be the first
element in the structure pointed to by vp->v_data; the vnode lock
is now within the vnode structure itself.
2002-04-27 22:20:33 +00:00
Seigo Tanimura
acbbcc5f1d Fix the code fragment clobbered in my last commit. 2002-04-27 09:33:49 +00:00
Seigo Tanimura
d48d4b2501 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
Poul-Henning Kamp
f5d157fb51 Explain magic number.
Add magic date no explanation.

Add a delta which was lost in transit yesterday which prevented
other timecounters from actually being used.
2002-04-27 07:28:54 +00:00
Poul-Henning Kamp
f175569ac2 Make the dummy timecounter actually tick or we will never get anyhere. 2002-04-27 07:06:52 +00:00
John Baldwin
e64b74e35b Whitespace bogon. 2002-04-27 04:48:36 +00:00
Marcel Moolenaar
9ae9d0ff86 Insert a semi-colon between label 'skip:' and the closing brace
of the FOREACH loop to silence GCC 3.
2002-04-27 02:58:18 +00:00
Mike Barcroft
a30d4b3270 Move the new byte order function prototypes from <sys/param.h> to
<sys/endian.h>.  This puts us in line with NetBSD and OpenBSD.
2002-04-26 22:48:23 +00:00
Poul-Henning Kamp
62efba6a0c Now that the private parts of timecounters are no longer being fingered
by other bits of code, split struct timecounter into two.

struct timecounter contains just the bits which pertains to the hardware
counter and the reading of it.

struct timehands (as in "the hands on a clock") contains all the ugly bit
fidling stuff.  Statically compile ten timehands.

This commit is the functional part.  A later cosmetic patch will rename
various variables and fieldnames.
2002-04-26 21:51:08 +00:00
Poul-Henning Kamp
b4a1d0deb1 Hide the private parts of timecounter from a couple of places that don't
really need to know the gory details.
2002-04-26 21:31:44 +00:00
Poul-Henning Kamp
7bf758bff0 Simplify the RFC2783 and PPS_SYNC timestamp collection API. 2002-04-26 20:24:28 +00:00
Poul-Henning Kamp
9e1b5510c3 Move the winding of timecounters out of hardclock and into a normal
timeout loop.

Limit the rate at which we wind the timecounters to approx 1000 Hz.

This limits the precision of the get{bin,nano,micro}[up]time(9)
functions to roughly a millisecond.
2002-04-26 12:37:36 +00:00
Poul-Henning Kamp
056abcabb7 Various cleanup and sorting of clock reading functions. Add the two
functions missing in the complete 12 function complement.
2002-04-26 10:19:29 +00:00
Poul-Henning Kamp
656d3e04d1 Rename tco_setscales() and tco_delta() to use the same tc_ prefix as
the rest of this file.
2002-04-26 10:11:02 +00:00
Poul-Henning Kamp
7e2d76ff05 Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes.  If better adjustment is needed
the NTP PLL should be used.
2002-04-26 10:06:26 +00:00
Brian Somers
b94c4e9a93 Test if rootvnode is NULL rather than if rootdev is NODEV when determining
if there's a filesystem present.

rootdev can be NODEV in the NFS-mounted root scenario.

Discussed with: Harti Brandt <brandt@fokus.gmd.de>, iedowse
2002-04-26 09:52:54 +00:00
Mike Silbersack
e1f1827f98 Make sure that sockets undergoing accept filtering are aborted in a
LRU fashion when the listen queue fills up.  Previously, there was
no mechanism to kick out old sockets, leading to an easy DoS of
daemons using accept filtering.

Reviewed by:	alfred
MFC after:	3 days
2002-04-26 02:07:46 +00:00
Dag-Erling Smørgrav
521eb014c8 Add the mutex profiling lock to the witness list. This hopefully unbreaks
the MUTEX_PROFILING + WITNESS + !WITNESS_SKIPSPIN case.

Submitted by:	Hiten Pandya <hiten@uk.FreeBSD.org>
2002-04-25 22:48:40 +00:00
Bruce Evans
2c900f6451 Fixed some longstanding bugs in _getenv_static():
- malformed environment strings (ones without an '=') were not rejected.
  There shouldn't be any of these, but when the static environment is
  empty it always begins with one of these; this one should be considered
  as the terminator after the end of the environment, but it isn't.
- the comparison of the name being looked up with the name in the
  environment was fuzzy -- only the characters up to the length of the
  latter were compared, so _getenv_static("foobar") matched "foo=..."
  in the environment and everything matched "" in the empty environment.

MFC after:	3 days
2002-04-25 20:25:15 +00:00
Bruce Evans
ff557fa1a9 Break the following implementation of panic(3):
#!bin/sh

	# Original version of this by Michael Reifenberger
	# <root@nihil.plaut.de>.

	mdconfig -d -u 11 >/dev/null 2>&1
	dd if=/dev/zero of=zz bs=1m count=1

	while :
	do
		mdconfig -a -t vnode -f zz -u 11
		fdisk -f - -iv /dev/md11 <<EOF1
		g c1 h64 s32
		p 1 165 0 2048
		a 1
	EOF1
		mdconfig -d -u 11
	done

Garbage pointers in __si_u were not cleared by destroy_dev().  Not
clearing si_disk made the above fatal because the disk layer uses
si_disk as a flag to indicate that the dev_t has been completely
initialized.  disk_destroy() clears si_disk for the parent dev_t
but doesn't get called for children.

Not fixed:
- setting the undocumented sysctl debug.free_devt should cause more
  complete destruction of the dev_t including clearing of __si_u, but
  actually causes the above to panic a little earlier.
- the loop leaks 10 memory allocations per iteration (4 DEVFS, 2 devbuf
  and 4 dev_t).

Reviewed by:	timeout by MAINTAINER after 3 months
2002-04-25 13:17:33 +00:00
Marcel Moolenaar
d297ad160e Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.
2002-04-25 01:22:16 +00:00
Seigo Tanimura
ce00aebe22 Free(9) should be Giant-free.
Suggested by:	jhb
2002-04-24 09:59:18 +00:00
Mike Silbersack
c473d3e406 Remove sodropablereq - this function hasn't been used since the
syncache went in.

MFC after:	3 days
2002-04-24 04:11:08 +00:00
Jeffrey Hsu
4bc37205bc The cold and panicstr variables do not need to be protected by sched_lock.
Submitted by:	Jennifer Yang (yangjihui@yahoo.com)
Reviewed by:	jake & jhb in principle
2002-04-23 19:50:22 +00:00
Poul-Henning Kamp
708da94ef2 Add a basic sanity check on pointers passed to free(9).
Should be improved by:	jeff
2002-04-23 18:50:25 +00:00
Poul-Henning Kamp
00d70dec4e Don't call malloc(9) to allocate zero bytes softc data for devices. 2002-04-23 15:48:23 +00:00
Robert Watson
7a0776e477 Slightly restructure extattr_get_vp() so that there's only one entry point
to VOP_GETEXTATTR().  This simplifies code flow when inserting MAC hooks.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-23 01:27:38 +00:00
Alfred Perlstein
ea5b39d029 Don't FILEDESC_LOCK around calls to falloc(). 2002-04-22 20:09:11 +00:00
Dag-Erling Smørgrav
d397408818 Usage style sweep: spell "usage" with a small 'u'.
Also change one case of blatant __progname abuse (several more remain)
This commit does not touch anything in src/{contrib,crypto,gnu}/.
2002-04-22 13:44:47 +00:00
Poul-Henning Kamp
29f88f470e Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *".  Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
2002-04-22 06:53:20 +00:00
Marcel Moolenaar
8420105927 Add function link_elf_get_gp(), specific to ia64 for now, to get
the DT_PLTGOT value. On ia64 this is the value of GP. We need this
to construct function descriptors, but the elf file structure is
not exported to MD code.

Note that the name of the function is based on the meaning that
DT_PLTGOT has on ia64. This may differ on other architectures. As
such, link_elf_get_gp() has a high level of MD to it. Renaming the
function to describe what DT_* value is returned makes it generic,
but also makes the MD code less clear and if we only need this on
ia64, then a general name for a specific function doesn't help.

In short: I don't know what is "right" at this time, so I'll go
with what I have.
2002-04-21 21:08:30 +00:00
Mark Murray
bd41864183 Use protected names (_foo) to cutdown on boatloads of lint warnings. 2002-04-21 11:16:10 +00:00
Marcel Moolenaar
9daa5b147a GCC 3.x WARNS: Add a break to the default case. 2002-04-20 21:56:42 +00:00
Seigo Tanimura
1c2451c24d Push down Giant for setpgid(), setsid() and aio_daemon(). Giant protects only
malloc(9) and free(9).
2002-04-20 12:02:52 +00:00
Robert Watson
0510317039 Improve style consistency of vfs_syscalls.c by converting the style used
in various extattr_*() calls to match the rest of the file.  Originally,
these bits at the end looked more like style(9).  This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through.  Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-20 01:37:08 +00:00
Robert Watson
89e9e6e7c5 In sendfile(), use the vn_rdwr() helper function, rather than manually
constructing a struct aio and invoking VOP_READ() directly.  This cleans
up the code a little, but also has the advantage of making sure almost
all vnode read/write access in the kernel goes through the helper
function, meaning that instrumentation of that helper function can impact
almost all relevant read/write operations.  In this case, it permits us
to put MAC hooks into vn_rdwr() and not modify uipc_syscalls.c (yet).

In general, if helper vn_*() functions exist, they should be used in
preference to direct VOP's in system call service code.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-19 13:46:24 +00:00
Robert Watson
5a06cb0ca6 Divorce proc0 and proc1 credentials earlier; while this isn't technically
needed in the current code, in the MAC tree, create_init() relies on the
ability to modify the credentials present for initproc, and should not
perform that modification on a shared credential.  Pro-active diff
reduction against MAC changes that are in the queue; also facilitates
other work, including the capabilities implementation.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-19 13:35:53 +00:00
Poul-Henning Kamp
3bdd2d061a suser is Giant safe, so optimize a pointless case. 2002-04-19 09:20:13 +00:00
SUZUKI Shinsuke
88ff5695c1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
Jacques Vidrine
e983a3762b When exec'ing a set[ug]id program, make sure that the stdio file descriptors
(0, 1, 2) are allocated by opening /dev/null for any which are not already
open.

Reviewed by:	alfred, phk
MFC after:	2 days
2002-04-19 00:45:29 +00:00
Maxime Henrion
b48a4280fc Avoid calling malloc() or free() while holding the
kenv lock.

Reviewed by:	jake
2002-04-17 17:51:10 +00:00
Maxime Henrion
d786139c76 Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up.  The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv().  freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by:	peter
2002-04-17 13:06:36 +00:00
Maxime Henrion
fd448168b7 Add an entry for the kenv(2) syscall (code to follow).
Reviewed by: peter
2002-04-17 13:05:13 +00:00
Ian Dowse
df99ca52f1 The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by:	simokawa
MFC after:	1 week
2002-04-17 01:07:29 +00:00
John Baldwin
ba626c1db2 Lock proctree_lock instead of pgrpsess_lock. 2002-04-16 17:11:34 +00:00
John Baldwin
596325f154 - Lock proctree_lock instead of pgrpsess_lock.
- Use temporary variables to hold a pointer to a pgrp while we dink with it
  while not holding either the associated proc lock or proctree_lock.  It
  is in theory possible that p->p_pgrp could change out from under us.
2002-04-16 17:09:22 +00:00
John Baldwin
c8b1829d8e - Lock proctree_lock instead of pgrpsess_lock.
- Simplify return logic of setsid() and setpgid().
2002-04-16 17:06:11 +00:00
John Baldwin
ea97757a54 - Lock proctree_lock instead of pgrpsess_lock.
- Exclusively lock proctree_lock while calling leavepgrp().
2002-04-16 17:04:21 +00:00
John Baldwin
f089b57070 - Merge the pgrpsess_lock and proctree_lock sx locks into one proctree_lock
sx lock.  Trying to get the lock order between these locks was getting
  too complicated as the locking in wait1() was being fixed.
- leavepgrp() now requires an exclusive lock of proctree_lock to be held
  when it is called.
- fixjobc() no longer gets a shared lock of proctree_lock now that it
  requires an xlock be held by the caller.
- Locking notes in sys/proc.h are adjusted to note that everything that
  used to be protected by the pgrpsess_lock is now protected by the
  proctree_lock.
2002-04-16 17:03:05 +00:00
Poul-Henning Kamp
fe4dc7a6ee Remove two debug printfs which should never have been committed. 2002-04-15 21:08:51 +00:00
John Baldwin
38e0823392 You have to cast int64_t's to long long if you printf them with %lld.
This now compiles on alpha without a warning.

Pointy-hat to:	phk
2002-04-15 21:04:32 +00:00
Poul-Henning Kamp
e1d970f181 Improve the implementation of adjtime(2).
Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitraryly huge
amounts of time in either direction.

In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset deducting
the amount slewed from the remainder.  If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.

The old implementation stepped the clock a number of microseconds
every HZ to acheive the same effect, using the same rates of change.

Eliminate the global variables tickadj, tickdelta and timedelta and
their various use and initializations.

This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.
2002-04-15 12:23:11 +00:00
Poul-Henning Kamp
b35c8f287d Take the "tickadj" element out of struct clockinfo. Our adjtime(2)
implementation is being changed and the very concept of tickadj will
no longer be meaningful.
2002-04-15 12:11:06 +00:00
Poul-Henning Kamp
b9c6e8bdbd In the ntp_adjtime(2) syscall, return our actual estimate of unapplied
offset correction instead of the most recent offset applied.
2002-04-15 08:58:24 +00:00
Jeff Roberson
5e914b96b9 Finish adding support code for sysctl kern.mprof. This dumps some malloc
information related to bucket size effeciency.  Three things are printed on
each row:

Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.

At the end the total memory used and total waste is displayed.  Currently my
system displays about 33% wasted memory.

The intent of this code is to gather statistics for tuning the malloc bucket
sizes.  It is not intended to be run with INVARIANTS and it is not entirely
mp safe.  It can be enabled via 'options MALLOC_PROFILE' which was commited
earlier.
2002-04-15 05:24:01 +00:00
Jeff Roberson
6f2671750e Remove malloc_type's ks_limit.
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.

Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data.  This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.

Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
2002-04-15 04:05:53 +00:00
Alfred Perlstein
46e12b42fe Don't allow one to trace an ancestor when already traced.
PR: kern/29741
Submitted by: Dave Zarzycki <zarzycki@FreeBSD.org>
Fix from: Tim J. Robbins <tim@robbins.dropbear.id.au>
MFC After: 2 weeks
2002-04-14 17:12:55 +00:00
Jeff Roberson
79a3e97054 Use VOP_GETVOBJECT instead of accessing the member directly. This fixed
an issue with nullfs and NAMEI shared.

Submitted by:	Alexander Kabaev
2002-04-14 10:18:48 +00:00
Alan Cox
24ab015f79 Regen 2002-04-14 05:33:58 +00:00
Alan Cox
b0d97980f6 Remove the requirement that Giant be held around sigreturn(). 2002-04-14 05:31:47 +00:00
Alan Cox
00e731601d o Use aiocblist::fd_file in the AIO threads rather than recomputing
the file * from the calling process's descriptor table.
 o Eliminate sharing of the calling process's descriptor table
   with the AIO threads.
2002-04-14 03:04:19 +00:00
John Baldwin
9c1ab3e04a - Change killpg1()'s first argument to be a thread instead of a process so
we can use td_ucred.
- In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB
  or not.  We don't need sched_lock.
- Close some races in psignal().  In psignal() there is a big switch
  statement based on p_stat.  All the different cases are assuming that
  the process (or thread) isn't going to change state out from under it.
  To ensure this is true, just lock sched_lock for the entire switch.  We
  practically held it the entire time already anyways.  This also
  simplifies the locking somewhat and actually results in fewer lock
  operations.
- Allow signotify() to be called with the sched_lock held since psignal()
  now does that.
- Use td_ucred in a couple of places.
2002-04-13 23:33:36 +00:00
John Baldwin
bad56603ba - Change donice() to take a thread as the first argument instead of a
process so it can use td_ucred.
- Require the target process of donice() to be locked when donice() is
  called.
- Use td_ucred.
- Lock the target process of p_cansee() and while reading the credentials
  of a process.
- Change the logic of rtprio() slightly so it does it's copyin() if needed
  prior to locking the target process.
- rtprio() no longer needs Giant.  In theory with full KSE it would still
  need Giant to protect p_ucred of curproc for the p_canfoo() functions
  but p_canfoo() will be changing to using td_ucred of curthread before
  full KSE hits the tree.
2002-04-13 23:28:23 +00:00
John Baldwin
07f3485d5e - Change the algorithms of the syscalls to modify process credentials to
allocate a blank cred first, lock the process, perform checks on the
  old process credential, copy the old process credential into the new
  blank credential, modify the new credential, update the process
  credential pointer, unlock the process, and cleanup rather than trying
  to allocate a new credential after performing the checks on the old
  credential.
- Cleanup _setugid() a little bit.
- setlogin() doesn't need Giant thanks to pgrp/session locking and
  td_ucred.
2002-04-13 23:07:05 +00:00
John Baldwin
a7ff744350 - Change the first argument of ktrcanset(), ktrsetchildren(), and ktrops()
to a thread pointer so that ktrcanset() can use td_ucred.
- Add some proc locking to partially protect p_tracep and p_traceflag.
2002-04-13 22:54:18 +00:00
Thomas Moestl
8db523989f Use pmap_extract() instead of pmap_kextract() to retrieve the physical
address associated with a user virtual address in
pipe_build_write_buffer().

Reviewed by:	alc
2002-04-13 20:09:06 +00:00
Jeroen Ruigrok van der Werven
bcbf4411d6 Use the correct macros for F_SETFD/F_GETFD instead of magic numbers.
Reflect that fact in the manual page.

PR:		12723
Submitted by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
Approved by:	bde
MFC after:	2 weeks
2002-04-13 10:16:53 +00:00
Thomas Moestl
de67a4bd91 Back out the last revision - it does not work correctly when one of
the pages in question is not in the top-level vm object, but in
one of the shadow ones.

Pointed out by: alc
Pointy hat to:	tmm
2002-04-13 00:03:07 +00:00