Commit Graph

3195 Commits

Author SHA1 Message Date
Alfred Perlstein
7f05b0353a More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. 2002-06-29 01:50:25 +00:00
Alfred Perlstein
69a3693f3e catch up with mextadd callback taking a void argument instead of a caddr_t. 2002-06-29 01:49:22 +00:00
Alfred Perlstein
802082390b More caddr_t removal.
Change struct knote's kn_hook from caddr_t to void *.
2002-06-29 00:29:12 +00:00
Alfred Perlstein
64f0b9d749 remove or replace caddr_t with void.
make the mbuf external free function take a void * rather than caddr_t.
2002-06-28 23:48:23 +00:00
Alfred Perlstein
02a32cd207 change struct socket -> so_pcb from caddr_t to void *. 2002-06-28 23:17:08 +00:00
Alfred Perlstein
b555662c63 change f_data field in struct file from caddr_t to void *. 2002-06-28 23:00:32 +00:00
Jeff Roberson
90769c9ed0 Improve the VOP locking asserts
- Add vfs_badlock_print to control whether or not we print lock violations
 - Add vfs_badlock_panic to control whether we panic on lock violations

Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
2002-06-28 20:58:14 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Matthew Dillon
070f64fe6f Part I of RLIMIT_VMEM implementation. Implement core functionality for
a new resource limit that covers a process's entire VM space, including
mmap()'d space.

(Part II will be additional code to check RLIMIT_VMEM during exec() but it
needs more fleshing out).

PR:		kern/18209
Submitted by:	Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net>
MFC after:	7 days
2002-06-26 00:29:28 +00:00
Mark Murray
f607758849 Fix a GCCism.
int foo[0];  // dodgy
int foo[];   // means the same, works the same and survives lint.

Tested by:	3 months of use on my laptop
2002-06-24 16:44:38 +00:00
Jake Burkholder
8ba3d077ff Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended.  This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly.  This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
2002-06-24 15:48:02 +00:00
Maxime Henrion
097b91c0ad Oops, this should have been part of my previous commit.
Somehow, it hasn't.

Reviewed by:	phk
2002-06-24 14:18:39 +00:00
Bruce Evans
c894109f44 Include <sys/types.h> in the !_KERNEL case so that this file is
self-sufficient in that case (it needs dev_t).  This is normal pollution
for most headers that define ioctl numbers.
2002-06-24 11:45:45 +00:00
Bruce Evans
22f6ca8d0d Fixed some style bugs (mainly excessive indentation).
Not completely unapproved by:	julian
2002-06-24 11:37:56 +00:00
Jonathan Mini
01ad8a53db Remove unused diagnostic function cread_free_thread().
Approved by:	alfred
2002-06-24 06:22:00 +00:00
Dag-Erling Smørgrav
520c140b46 This commit was generated by cvs2svn to compensate for changes in r98679,
which included commits to RCS files with non-trunk default branches.
2002-06-23 14:38:51 +00:00
Dag-Erling Smørgrav
9296418d51 Import OpenBSD's <sys/tree.h>, needed by OpenSSH.
Obtained from:	OpenBSD
2002-06-23 14:38:51 +00:00
Luigi Rizzo
2cc213c443 Remove some extra spaces hidden between tabs
Spotted-by: diff against the version in RELENG_4
2002-06-23 12:06:40 +00:00
Jake Burkholder
e020a3a8be KTR_CT* had one too many trailing zeroes, making KTR_CT5-8 too large for
ktr_mask.
2002-06-23 00:38:04 +00:00
Kirk McKusick
6524dddcd5 This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:	DARPA & NAI Labs.
2002-06-22 22:01:13 +00:00
Maxime Henrion
cacd1c9b49 o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
  nmount(2) since we now malloc() them individually and
  corrupted iov's could make the kernel crash in malloc()
  with "kmem_map too small".

Reviewed by:	phk
2002-06-22 18:07:05 +00:00
Luigi Rizzo
dcb9465082 Define an mbuf type, MT_TAG, used for volatile annotations
prepended to mbuf chains in the network stack.
Reuse a previoulsy unused value to avoid changes in other
data structures.
2002-06-22 11:29:08 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Luigi Rizzo
4ad01e18e3 Add some #define's for mbuf annotations.
As the comment in the code says, eventually there will be a proper
data structure (e.g NetBSD's struct m_tag) to store chains of
annotations, and mbuf-handling procedures will handle these chains
in the correct way.

Right now, these chains do not exist, and we just use the constants
defined here to implement simple ad-hoc solutions to remove some global
variables used so far to pass around informations about packets
being processed.

Global variables are not only ugly and make the code unreadable, they
also prevent from using parallelism in network stack processing.

(the 3-days MFC only refers to this commit, i.e. the PACKET_TAG_*
constants; the full mechanism will be committed and MFC'ed on a
longer timescale).

MFC after: 3 days
2002-06-20 21:29:55 +00:00
Maxime Henrion
7d2d440991 Change the way we internally store the mount options to
a linked list.  This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by:	phk
2002-06-20 20:03:42 +00:00
Alfred Perlstein
c33c825169 Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin
2002-06-20 18:52:54 +00:00
Bruce Evans
c7143344b2 Quick fix for the type of the bitmap in sigset_t. It was an array of
4 u_ints but needs to be an array of 4 uint32_t's to work, at least
if unsigned ints have less than 32 bits.  It should be a non-array of
1 uint128_t on 128-bit machines, especially if u_int has 128 bits.
The headers that declare uint32_t (actually __uint32_t) are intentionally
not included here since this header should only be included by other
headers.

Fixed some style bugs (space instead of tab after #ifndef and #endif).
2002-06-20 09:04:33 +00:00
Peter Wemm
e8aef1d3b5 Use suword16/fuword16 instead of susword/fusword - this has two different
definitions so far.. 16 bit on x86 and appears to be 32 bit on sparc64.
Be explicit to avoid suprises.
2002-06-20 07:23:08 +00:00
Peter Wemm
b23619e02a Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).
2002-06-20 07:13:35 +00:00
Peter Wemm
9d04103b7c Remove UIO_USERISPACE - we do not support any split instruction/data
address space machines (eg: pdp-11) and are not likely to ever do so.
Nothing in our kernel sets this.
2002-06-20 07:08:43 +00:00
Mike Barcroft
0c49c1f970 Change spelling of u_char' to unsigned char' to avoid requiring
<sys/types.h> as a prerequisite.
2002-06-19 19:05:41 +00:00
Poul-Henning Kamp
c4bacc1871 Remove the compat bits for the mis-aligned struct disklabel on alpha,
people got three times longer than I promised.

Sponsored by: DARPA & NAI Labs.
2002-06-19 08:37:02 +00:00
Alfred Perlstein
1419eacb86 Squish the "could sleep with process lock" messages caused by calling
uifind() with a proc lock held.

change_ruid() and change_euid() have been modified to take a uidinfo
structure which will be pre-allocated by callers, they will then
call uihold() on the uidinfo structure so that the caller's logic
is simplified.

This allows one to call uifind() before locking the proc struct and
thereby avoid a potential blocking allocation with the proc lock
held.

This may need revisiting, perhaps keeping a spare uidinfo allocated
per process to handle this situation or re-examining if the proc
lock needs to be held over the entire operation of changing real
or effective user id.

Submitted by: Don Lewis <dl-freebsd@catspoiler.org>
2002-06-19 06:39:25 +00:00
Bill Fumerola
4471d80f69 fix whitespace botch in previous commit. 2002-06-19 01:23:54 +00:00
Seigo Tanimura
03e4918190 Remove so*_locked(), which were backed out by mistake. 2002-06-18 07:42:02 +00:00
Jeff Roberson
18aa2de5a7 - Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items.  It will not go ask the vm
  for pages.  This differs from M_NOWAIT in that it not only doesn't block, it
  doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag.  This
  tells uma that it should only allocate buckets out of the bucket cache, and
  not from the VM.  It does this by using the M_NOVM option to zalloc when
  getting a new bucket.  This is so that the VM doesn't recursively enter
  itself while trying to allocate buckets for vm_map_entry zones.  If there
  are already allocated buckets when we get here we'll still use them but
  otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.
2002-06-17 22:02:41 +00:00
Alfred Perlstein
4ff964978c remove bogus comment, select/poll do NOT need to fhold as they hold the
filedesc lock.
style(9) fixes, add blank line at start of functions with no local variables.
2002-06-17 09:39:30 +00:00
Garrett Wollman
6149fe8392 Fix two syntax errors and add declarations of statvfs() and fstatvfs(). 2002-06-16 21:02:08 +00:00
Garrett Wollman
bf121deddc Now that we have a separate header file for sigset_t, use it and avoid
the full pollution of <signal.h>.
2002-06-16 18:40:16 +00:00
Garrett Wollman
e055075e35 Use <sys/_sigset.h> to get declaration of sigset_t, which has been moved
to a separate header to facilitate its declaration in more than one
place.  Namespace issues not fixed.
2002-06-16 18:35:24 +00:00
Garrett Wollman
ff15712090 Delete everything except the sigset_t definitions (subsequent to
repo-copy).
2002-06-16 18:33:59 +00:00
Garrett Wollman
15d5bbf2b6 Add some (but not all) of the things POSIX expects to be declared in
<sys/select.h>.
2002-06-15 23:39:10 +00:00
Garrett Wollman
0a3d161049 Fix visibility macros. Declare fsblkcnt_t and fsfilcnt_t (for statvfs())
per POSIX.
2002-06-15 23:38:43 +00:00
Garrett Wollman
58db51b2c2 Fix visibility issues; use <sys/timespec.h>. 2002-06-15 23:37:33 +00:00
Maxime Henrion
fe93750656 Change vfs_copyopt() so that the length argument passed to it
must be the exact same size as the mount option.  This makes
vfs_copyopt() much more useful.
2002-06-14 20:04:21 +00:00
Garrett Wollman
250917547b Implement the <sys/statvfs.h> header. Related changes to <sys/types.h>
are still awaiting a worldstone.  Functions and their declarations to
come later.
2002-06-14 19:37:06 +00:00
Robert Watson
8973ba1b6e Reserve two constants for managing socket MAC labels via socket options. 2002-06-14 08:49:04 +00:00
Robert Watson
9d697bebc8 Whitespaec consistency. 2002-06-14 08:46:07 +00:00
Robert Watson
6480dc743e Regen. 2002-06-13 23:44:50 +00:00
Robert Watson
820a52632e No POSIX.1e capabilities in the main tree yet. 2002-06-13 23:40:13 +00:00
Kelly Yancey
9ae6d334da Make nselcol, the number of select collisions since boot, unsigned as
negative collisions simply doesn't make sense.

PR:		(one small part of) 19720
Approved by:	alfred
2002-06-12 02:08:18 +00:00
Garrett Wollman
b837f53a62 SO_PRIVSTATE has been commented out for long enough now.... 2002-06-11 18:23:11 +00:00
Kelly Yancey
3316a80bd1 Convert hit and miss counters to unsigned values. Surely negative values
for either does not make sense.

PR:		(one small part of) 19720
2002-06-10 22:40:26 +00:00
Bruce Evans
cc966ea4bc Renamed the idempotency identifier to match the file name. Cleaned up
indentation and comments.
2002-06-09 02:52:40 +00:00
Bruce Evans
1fe7722cb5 Renamed the idempotency identifier to match the file name. 2002-06-07 14:37:09 +00:00
John Baldwin
ea3fc8e4cd Overhaul the ktrace subsystem a bit. For the most part, the actual vnode
operations to dump a ktrace event out to an output file are now handled
asychronously by a ktrace worker thread.  This enables most ktrace events
to not need Giant once p_tracep and p_traceflag are suitably protected by
the new ktrace_lock.

There is a single todo list of pending ktrace requests.  The various
ktrace tracepoints allocate a ktrace request object and tack it onto the
end of the queue.  The ktrace kernel thread grabs requests off the head of
the queue and processes them using the trace vnode and credentials of the
thread triggering the event.

Since we cannot assume that the user memory referenced when doing a
ktrgenio() will be valid and since we can't access it from the ktrace
worker thread without a bit of hassle anyways, ktrgenio() requests are
still handled synchronously.  However, in order to ensure that the requests
from a given thread still maintain relative order to one another, when a
synchronous ktrace event (such as a genio event) is triggered, we still put
the request object on the todo list to synchronize with the worker thread.
The original thread blocks atomically with putting the item on the queue.
When the worker thread comes across an asynchronous request, it wakes up
the original thread and then blocks to ensure it doesn't manage to write a
later event before the original thread has a chance to write out the
synchronous event.  When the original thread wakes up, it writes out the
synchronous using its own context and then finally wakes the worker thread
back up.  Yuck.  The sychronous events aren't pretty but they do work.

Since ktrace events can be triggered in fairly low-level areas (msleep()
and cv_wait() for example) the ktrace code is designed to use very few
locks when posting an event (currently just the ktrace_mtx lock and the
vnode interlock to bump the refcoun on the trace vnode).  This also means
that we can't allocate a ktrace request object when an event is triggered.
Instead, ktrace request objects are allocated from a pre-allocated pool
and returned to the pool after a request is serviced.

The size of this pool defaults to 100 objects, which is about 13k on an
i386 kernel.  The size of the pool can be adjusted at compile time via the
KTRACE_REQUEST_POOL kernel option, at boot time via the
kern.ktrace_request_pool loader tunable, or at runtime via the
kern.ktrace_request_pool sysctl.

If the pool of request objects is exhausted, then a warning message is
printed to the console.  The message is rate-limited in that it is only
printed once until the size of the pool is adjusted via the sysctl.

I have tested all kernel traces but have not tested user traces submitted
by utrace(2), though they should work fine in theory.

Since a ktrace request has several properties (content of event, trace
vnode, details of originating process, credentials for I/O, etc.), I chose
to drop the first argument to the various ktrfoo() functions.  Currently
the functions just assume the event is posted from curthread.  If there is
a great desire to do so, I suppose I could instead put back the first
argument but this time make it a thread pointer instead of a vnode pointer.

Also, KTRPOINT() now takes a thread as its first argument instead of a
process.  This is because the check for a recursive ktrace event is now
per-thread instead of process-wide.

Tested on:	i386
Compiles on:	sparc64, alpha
2002-06-07 05:32:59 +00:00
John Baldwin
609d46568c Add a new SYSINIT subsystem for KTRACE. 2002-06-07 05:11:39 +00:00
John Baldwin
c5dce53f5d - Add a per-thread member 'td_inktrace' to be used by ktrace to detect
when a thread is in the ktrace subsystem to avoid ktrace'ing internal
  ktrace events.
- Update the locking notes for p_traceflag and p_tracep taking into account
  the new ktrace_lock mutex.
2002-06-07 05:11:08 +00:00
John Baldwin
48849938e8 Change the all locks list from a STAILQ to a TAILQ. This bloats struct
lock_object by another pointer (though all of lock_object should be
conditional on LOCK_DEBUG anyways) in exchange for an O(1) TAILQ_REMOVE()
in witness_destroy() (called for every mtx_destroy() and sx_destroy())
instead of an O(n) STAILQ_REMOVE.  Since WITNESS is so dog slow as it is,
the speed-up is worth the space cost.

Suggested by:	iedowse
2002-06-06 20:51:04 +00:00
Mike Barcroft
8375a2c466 Remove the deprecated 4.2/4.3BSD wait union. 2002-06-05 02:21:01 +00:00
Juli Mallett
22ed0c9ade NODEV is defined the same in _KERNEL and !_KERNEL case, so move it out from
the preprocessor conditional, and remove the now-empty #else.

Reviewed by:	asmodai
2002-06-04 05:48:38 +00:00
Jens Schweikhardt
21dc7d4f57 Fix typo in the BSD copyright: s/withough/without/
Spotted and suggested by:	des
MFC after:	3 weeks
2002-06-02 20:05:59 +00:00
Alfred Perlstein
6e330f3e36 bde noticed that SOMAXCONN breaks pretty badly as an option for LINT.
so back it out.
2002-06-02 04:32:52 +00:00
Mike Barcroft
8ad4a5a605 Be more strict about namespaces.
Submitted by:	wollman (mostly)
2002-06-01 21:07:10 +00:00
Mike Barcroft
0658e085a7 Fix some, but not all style bugs. 2002-06-01 18:58:02 +00:00
Mike Barcroft
6ee093fb8f Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag
(P_CONTINUED) is set when a stopped process receives a SIGCONT and
cleared after it has notified a parent process that has requested
notification via waitpid(2) with WCONTINUED specified in its options
operand.  The status value can be checked with the new WIFCONTINUED()
macro.

Reviewed by:	jake
2002-06-01 18:37:46 +00:00
Brian S. Dean
181b15f9f8 Make a structure definition slightly more style(9) compliant (makes
the structure definition easier to find using grep).
2002-06-01 03:55:16 +00:00
Archie Cobbs
48d183faca Fix a bug in m_split(): the "m->m_ext.ext_size" field of an mbuf was being
set to zero. This field indicates the total space in the external buffer
and therefore should not be modified after the external buffer is added.

Add a comment warning that the mbufs returned by m_split() might be read-only.

Fix M_TRAILINGSPACE() to return zero if !M_WRITABLE(m).

Reviewed by:	freebsd-net
Obtained from:	Vernier Networks, Inc.
MFC after:	1 week
2002-05-31 22:09:57 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Marcel Moolenaar
78dff92b6b Don't use an incomplete array type to mark the start of the padding
because the padding should be inserted before the array and not after
it, as is done by GCC 3.1. Instead use an explicit uint32_t field
to get what was intended and on top of that make the size of the
padding explicit. This also doesn't depend on a C99 feature.
While here, expand the comment. Just to make a point.

Pointed out by: fanf
2002-05-31 01:07:13 +00:00
Doug Rabson
99bd783419 Move the definition of ElfN_Hashelt to common headers. The only platform
which has a different definition for this is alpha.
2002-05-30 08:32:18 +00:00
Jeff Roberson
7181624aaa Record the file, line, and pid of the last successful shared lock holder. This
is useful as a last effort in debugging file system deadlocks.  This is enabled
via 'options DEBUG_LOCKS'
2002-05-30 05:55:22 +00:00
Marcel Moolenaar
2e1cdcf311 o Remove GCC specific attribute packed.
o  Add incomplete array padding.
2002-05-30 05:44:23 +00:00
Julian Elischer
628855e758 CURSIG() is not a macro so rename it cursig().
Obtained from:	KSE tree
2002-05-29 23:44:32 +00:00
Garrett Wollman
b3ec920c0e Version bump for addition of dlfunc(3). 2002-05-29 21:04:25 +00:00
Poul-Henning Kamp
f4258597dc Add one copy of crc32() and crc32_tab[] in libkern, and remove it two other
places.

Comment out crc32 related definitions in zlib.h, we don't seem to have the
corresponding code in our kernel.
2002-05-29 20:24:09 +00:00
Marcel Moolenaar
cd84983468 Add attribute packed to struct gpt_hdr to avoid unwanted padding at
the end of the struct to make it an integral number of "longs" on
64-bit architectures. The size of the struct must be 92, not 96.
2002-05-29 02:58:41 +00:00
Bruce Evans
97be9f99d2 Fixed some style bugs in recent commits. 2002-05-28 15:24:13 +00:00
Marcel Moolenaar
bcd46c600a Add support to GEOM for GUID Partition Tables (GPTs). The support
is currently conditional on both the GEOM and GEOM_GPT options to
avoid getting GPT by default and having the MBR and GPT classes
clash.
The correct behaviour of the MBR class would be to back-off (reject)
a MBR if it's a Protective MBR (a MBR with a single partition of type
0xEE that spans the whole disk (as far as the MBR is concerned).
The correct behaviour if the GPT class would be to back-off (reject)
a GPT if there's a MBR that's not a Protective MBR.

At this stage it's inconvenient to destroy a good MBR when working
with GPTs that it's more convenient to have the MBR class back-off
when it detects the GPT signature on disk and have the GPT class
ignore the MBR.

In sys/gpt.h UUIDs (GUIDs) for the following FreeBSD partitions
have been defined:

GPT_ENT_TYPE_FREEBSD
	FreeBSD slice with disklabel. This is the equivalent of
	the well-known FreeBSD MBR partition type.
GPT_ENT_TYPE_FREEBSD_{SWAP|UFS|UFS2|VINUM}
	FreeBSD partitions in the context of disklabel. This is
	speculating on the idea to use the GPT to hold partitions
	instead if slices and removing the fixed (and low) limits
	we have on the number of partitions.

This commit lacks a GPT image for the regression suite.
2002-05-28 09:04:48 +00:00
Dag-Erling Smørgrav
6c533ac713 Add NAI copyright. 2002-05-28 06:53:41 +00:00
Dag-Erling Smørgrav
b0405a2ad3 Back out part of previous commit; the dev_t union trick is still useful in
the kvm case.
2002-05-28 06:34:28 +00:00
Marcel Moolenaar
52183d0145 Add uuidgen(2) and uuidgen(1).
The uuidgen command, by means of the uuidgen syscall, generates one
or more Universally Unique Identifiers compatible with OSF/DCE 1.1
version 1 UUIDs.

From the Perforce logs (change 11995):

Round of cleanups:
o  Give uuidgen() the correct prototype in syscalls.master
o  Define struct uuid according to DCE 1.1 in sys/uuid.h
o  Use struct uuid instead of uuid_t. The latter is defined
   in sys/uuid.h but should not be used in kernel land.
o  Add snprintf_uuid(), printf_uuid() and sbuf_printf_uuid()
   to kern_uuid.c for use in the kernel (currently geom_gpt.c).
o  Rename the non-standard struct uuid in kern/kern_uuid.c
   to struct uuid_private and give it a slightly better definition
   for better byte-order handling. See below.
o  In sys/gpt.h, fix the broken uuid definitions to match the now
   compliant struct uuid definition. See below.
o  In usr.bin/uuidgen/uuidgen.c catch up with struct uuid change.

A note about byte-order:
        The standard failed to provide a non-conflicting and
unambiguous definition for the binary representation. My initial
implementation always wrote the timestamp as a 64-bit little-endian
(2s-complement) integral. The clock sequence was always written
as a 16-bit big-endian (2s-complement) integral. After a good
nights sleep and couple of Pan Galactic Gargle Blasters (not
necessarily in that order :-) I reread the spec and came to the
conclusion that the time fields are always written in the native
by order, provided the the low, mid and hi chopping still occurs.
The spec mentions that you "might need to swap bytes if you talk
to a machine that has a different byte-order". The clock sequence
is always written in big-endian order (as is the IEEE 802 address)
because its division is resulting in bytes, making the ordering
unambiguous.
2002-05-28 06:16:08 +00:00
Dag-Erling Smørgrav
1a149fcd67 Introduce struct xtty, used when exporting tty information to userland.
Make kern.ttys export a struct xtty rather than struct tty.  Since struct
tty is no longer exposed to userland, remove the dev_t / udev_t hack.

Sponsored by:	DARPA, NAI Labs
2002-05-28 05:40:53 +00:00
Mike Barcroft
aa37be50ad Use underscored variant of BYTE_ORDER and friends to allow this to
work in a !__BSD_VISIBLE environment.
2002-05-27 00:55:17 +00:00
Doug Rabson
396a429cfd Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.
2002-05-26 16:03:13 +00:00
Jake Burkholder
d2ac231616 Make the run queue parameters machine dependent. Optimize 64 bit
architectures by using a 64 bit word for the bit array which keeps
track of non-empty queues.

Reviewed by:	peter
2002-05-25 01:12:23 +00:00
Alfred Perlstein
fa09b4015d Backout 1.54 (restore definition for printf0 to actually do something). 2002-05-24 19:16:08 +00:00
Mark Murray
1422d23663 The previous ANSIfication did not take into account earlier,
non-compliant compilers. Revert to the compatible form to allow
upgrade-builds.
2002-05-24 09:40:51 +00:00
Mark Murray
f1f239b30d The previous ANSIfication did not take into account upgrade-builds
uing an earlier, non-compliant compiler. Revert to the compatible
form.
2002-05-24 09:37:10 +00:00
Maxime Henrion
cdb5638a27 Update comments to better match reality. 2002-05-23 23:18:25 +00:00
Mark Murray
b1fc278484 ANSIfy variable-argument macros. 2002-05-23 18:26:23 +00:00
Mark Murray
5e05b84b02 Whitespace only; fix indentation. 2002-05-23 12:09:14 +00:00
John Baldwin
e8fdcfb57a Optimize spin mutexes for UP kernels without debugging to just enter and
exit critical sections.  We only contest on a spin mutex on an SMP kernel
running on an SMP machine.
2002-05-21 20:34:28 +00:00
Jake Burkholder
0f33dc7b6f Forward declare struct thread. 2002-05-20 16:11:38 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
John Baldwin
f44d9e24fb Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked.  We now use the thread ucred reference
for the credential checks in p_can*() as a result.  p_canfoo() should now
no longer need Giant.
2002-05-19 00:14:50 +00:00
Poul-Henning Kamp
4ecbca5e4f Try again: Make daddr_t 64 bits.
Sponsored by:	DARPA & NAI Labs.
2002-05-18 09:48:28 +00:00
Poul-Henning Kamp
f07d4b256b Move the hideously misnamed type "u_daddr_t" to <sys/blist.h> where it
belongs.

Sponsored by: DARPA & NAI Labs.
2002-05-18 09:38:20 +00:00
David E. O'Brien
96c8341645 Bump __FreeBSD_version to note that Perl is not in /usr/src any more. 2002-05-17 03:13:08 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Maxim Sobolev
5b24bd6150 Rename struct scr_size into struct _scr_size and struct scrmap into
struct _scrmap, so that it doesn't break C++ programs (name of element of
the structure is the same as the name of the scructure itself).

MFC after:	5 days
2002-05-16 10:57:10 +00:00
Poul-Henning Kamp
4aafadc8b9 Revert daddr_t to 32 bits while we research the reported problems. 2002-05-15 17:52:03 +00:00
Poul-Henning Kamp
6380601f64 Move MI stuff out of MD param.h files.
It can all still be overridden in the MD files should need suddenly arise.
2002-05-14 20:35:29 +00:00
Robert Watson
4a7edf6974 Strategic diff reduction against TrustedBSD MAC branch: introduce an
additional system boot ordering entry, SI_SUB_MAC_LATE, which occurs
after all MAC policies have been initialized, permitting the MAC
subsystem to take action once all "early loaded" modules are in place.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-05-14 14:36:09 +00:00
Poul-Henning Kamp
98b0c78978 Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 11:09:43 +00:00
Poul-Henning Kamp
085559c4fc Roll the LOG2 macro up again, I don't belive unrolling this for 64bits
make sense.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 08:01:34 +00:00
Poul-Henning Kamp
e74e01d5e3 Make the mtio data structures explicitly sized.
A couple of the fields should probably be 64bits in the future.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 07:30:13 +00:00
Poul-Henning Kamp
22bd43ccda Move a few ancient minor-number definitions for tapedrives to the
only driver which uses them.  Remove the rest.
2002-05-14 06:57:02 +00:00
David E. O'Brien
ef372b6540 Bump for GCC 3.1. 2002-05-13 07:14:17 +00:00
Poul-Henning Kamp
7110af7577 ARGH! SBLOCK is not unused. Try to get this right.
BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant).

Define SBLOCK again, using the right math.

Sponsored by: DARPA & NAI Labs.
2002-05-12 20:21:40 +00:00
Andrew Gallatin
338a21a47a Restore the ability to take crashdumps on alpha. This was cut and pasted
nearly in its entirety from i386, so it retains the phk/nati copyright.

Savecore likes the results, but I have no way to test it as gdb is
still broken.
2002-05-11 21:53:46 +00:00
Alfred Perlstein
b71b449d27 As a temporary bandaid disable '__printf0like' unconditionally, it
doesn't seem to work under gcc 3.1 yet.

We are now 'WERROR' safe again.
2002-05-11 03:58:24 +00:00
John Baldwin
2c6c9ea2bd p_leader is only set at fork1() time, so update its locking note
appropriately.
2002-05-10 14:28:05 +00:00
David E. O'Brien
39ebfeb05b Add a hack (ported from NetBSD) to support Sun disk labels.
This code works by converting the Sun label to a struct disklabel, which
is probably even the right thing for reading a label. The original
checksum is taken over, so that the label source can be distinguished.

The NetBSD code to wrap a BSD-style disklabel into the Sun disklabel has been
deleted for now - don't know whether that is really desirable, after all Sun
disklabels could just be used always (BSD disklabels are going to have
problems with PROM compatability).  The dsinit() call in diskopen() has been
#ifdef'ed out for now, this will be changed to use the minimal slice struct
in case of dsinit() failure.

Submitted by:	tmm
Obtained from:	NetBSD
2002-05-09 20:22:59 +00:00
Jonathan Mini
d8f4f6a404 Remove trace_req().
Reviewed by:	alfred, jhb, peter
2002-05-09 04:13:41 +00:00
Alfred Perlstein
e649887b1e Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp.  We need
to check the process for P_WEXIT to see if it's exiting.  Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.
2002-05-06 19:31:28 +00:00
Alan Cox
b3a882e936 o Header files shouldn't depend on options: Provide prototypes
for uiomoveco(), uioread(), and vm_uiomove() regardless
   of whether ENABLE_VFS_IOOPT is defined or not.

Submitted by:	bde
2002-05-06 06:20:04 +00:00
Bruce Evans
607aa34e88 Include <sys/queue.h> so that this file provides its own namespace
pollution which is required for its includes of <sys/_lock.h> and
<sys/_mutex.h> to work.
2002-05-06 03:13:08 +00:00
Maxime Henrion
9d997d8be8 Add the lchflags(2) syscall.
Reviewed by:	rwatson
2002-05-05 23:47:41 +00:00
Alan Cox
c50fe92b8d o Condition the compilation of uiomoveco() and vm_uiomove()
on ENABLE_VFS_IOOPT.
 o Add a comment to the effect that this code is experimental
   support for zero-copy I/O.
2002-05-05 22:42:40 +00:00
Poul-Henning Kamp
81e017430a Expand the one-line function pbreassignbuf() the only place it is or could
be used.
2002-05-05 20:37:08 +00:00
Poul-Henning Kamp
d08961bec3 Move some UFS related stuff home where it belongs. 2002-05-05 20:04:33 +00:00
Maxime Henrion
4e039937d9 Add a KERNELDUMPMAGIC_CLEARED macro to unbreak savecore. Since
it is a "magic" value, what it expands to is not really important.
I set it to "Cleared Kernel Dump", but that can be changed later
if someone thinks it's not good enough.

Pointy hat to:	fenner
2002-05-05 13:47:21 +00:00
Bruce Evans
a9a0f15a69 Fixed breakage of binary compatibility of the kern.clockrate sysctl in
sys/time.h rev.1.53, etc.  Zero out the entire struct clkinfo and not
just the new spare part of it so that there is no possibility of leaking
kernel stack context to userland.
2002-05-05 04:33:09 +00:00
Poul-Henning Kamp
60a084052b Shake unused stuff out of the flags in struct buf->b_flags. 2002-05-04 19:40:34 +00:00
Poul-Henning Kamp
2a5bcfdef6 The struct buf->b_act was not used anywere. 2002-05-04 19:06:32 +00:00
Poul-Henning Kamp
48e5da550a Initialize time_second to 1 instead of zero to pacify slightly bogus arp code.
Various minor style fixes from BDE.
2002-05-03 08:46:03 +00:00
Marcel Moolenaar
cb5e1f4f73 Adjust KINFO_PROC_SIZE due to segsz_t being changed from a 32-bit to
a 64-bit integral.
2002-05-03 01:41:37 +00:00
Alfred Perlstein
90535973d5 Cleanup, quote:
This leaves some vestiges of the old locking, including style
  bugs in it.  I've only noticed anachronisms in socketvar.h so far
  (I've merged net* but not kern or all of sys).  The patch also
  has old fixes for style bugs in accf stuff and namespace pollution
  in uma...  The largest style bugs are line continued backslashes
  in column 80 and (these are fixed), and starting the do-while
  code for the new macros in column 40, which is quite unlike the
  usual indentation (see sys/queue.h) and not even like the indentation
  for the old macros (column 32) (this is not fixed).

Submitted by: bde
2002-05-02 22:03:19 +00:00
Jeff Roberson
5a34a9f089 malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the
mallochash.  Mallochash is going to go away as soon as I introduce the
kfree/kmalloc api and partially overhaul the malloc wrapper.  This can't happen
until all users of the malloc api that expect memory to be aligned on the size
of the allocation are fixed.
2002-05-02 07:22:19 +00:00
Alfred Perlstein
f132072368 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
John Baldwin
6af7484d0f Axe unused SESS_UNLOCK_NOSWITCH() and PGRP_UNLOCK_NOSWITCH() macros. The
MTX_NOSWITCH flag was deprecated a while ago.
2002-05-01 18:11:16 +00:00
Matthew N. Dodd
47bbd753d9 Document the location (in the source tree) of the "Porter's Handbook". 2002-04-30 23:55:16 +00:00
Matthew N. Dodd
a8f6daaeca Bump __FreeBSD_version for mtx_init() change.
Document same.

Forgotten by:	 jhb
2002-04-30 23:54:03 +00:00
Jeff Roberson
289f207c81 Convert longs to u_longs in stats. This will hold off wrap arounds for a
while longer.
2002-04-30 22:39:32 +00:00
Poul-Henning Kamp
9c30ce571e Brucifixion ? Yes, out that door, row on the left, one patch each. 2002-04-30 19:48:45 +00:00
Seigo Tanimura
960ed29c4b Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
Robert Watson
84f9ed84d7 Since devfs now uses vnode locks, add devfs back to IS_LOCKING_VFS. 2002-04-29 20:29:08 +00:00
Mike Barcroft
d75cd4deb9 Make this header self-reliant with regard to the types it uses. 2002-04-29 16:58:54 +00:00
Poul-Henning Kamp
6b00cf46ec Stylistic sweep through the timecounter code.
Renovate comments.
2002-04-28 18:24:21 +00:00
Bruce Evans
4cfce7a7a6 Removed unused forward struct declaration. 2002-04-28 09:51:45 +00:00
Ian Dowse
ba1551ca81 Avoid the user-visible effect of setting SA_NOCLDWAIT when the
SIGCHLD handler is SIG_IGN. This is a reimplementation of the
problematic revision 1.131 of kern_exit.c. To avoid accessing process
UPAGES, we set a new procsig flag when the SIGCHLD handler is SIG_IGN
and use that instead.
2002-04-27 22:41:41 +00:00
Seigo Tanimura
d48d4b2501 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
Mike Barcroft
a30d4b3270 Move the new byte order function prototypes from <sys/param.h> to
<sys/endian.h>.  This puts us in line with NetBSD and OpenBSD.
2002-04-26 22:48:23 +00:00
Poul-Henning Kamp
62efba6a0c Now that the private parts of timecounters are no longer being fingered
by other bits of code, split struct timecounter into two.

struct timecounter contains just the bits which pertains to the hardware
counter and the reading of it.

struct timehands (as in "the hands on a clock") contains all the ugly bit
fidling stuff.  Statically compile ten timehands.

This commit is the functional part.  A later cosmetic patch will rename
various variables and fieldnames.
2002-04-26 21:51:08 +00:00
Poul-Henning Kamp
b4a1d0deb1 Hide the private parts of timecounter from a couple of places that don't
really need to know the gory details.
2002-04-26 21:31:44 +00:00
Poul-Henning Kamp
7bf758bff0 Simplify the RFC2783 and PPS_SYNC timestamp collection API. 2002-04-26 20:24:28 +00:00
Poul-Henning Kamp
9e1b5510c3 Move the winding of timecounters out of hardclock and into a normal
timeout loop.

Limit the rate at which we wind the timecounters to approx 1000 Hz.

This limits the precision of the get{bin,nano,micro}[up]time(9)
functions to roughly a millisecond.
2002-04-26 12:37:36 +00:00
Poul-Henning Kamp
056abcabb7 Various cleanup and sorting of clock reading functions. Add the two
functions missing in the complete 12 function complement.
2002-04-26 10:19:29 +00:00
Poul-Henning Kamp
7e2d76ff05 Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes.  If better adjustment is needed
the NTP PLL should be used.
2002-04-26 10:06:26 +00:00