Commit Graph

3053 Commits

Author SHA1 Message Date
ken
0d3a835f3f At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
imp
41dbee355d Partially back out the "make all interfaces standard" commit. There's
a small chance that it might have broken loading the miibus, so err on
the side of caution until I can figure out what is going on.  This
backs out all but the PCI, PCIB and ISA bus interfaces being
"standard," which have been well tested...
2002-06-24 01:53:26 +00:00
imp
013a98dc5f plxcard for OLDCARD almost certainly isn't going to happen. 2002-06-23 07:31:29 +00:00
imp
d1b74a473f As disclosed to arch@, make more interfaces standard. This allows for
easier loading of modules that might refer to these interfaces.  None
of the code that implements them is standard, just the glue.  This
bloats the kernel a whopping 8k.

Silence on: arch@
2002-06-23 07:27:24 +00:00
rwatson
1bbba0d1b8 Remove CAPABILITIES from NOTES 2002-06-21 19:53:04 +00:00
julian
2ddb82d191 A node that creates a device entry in /dev (yay devfs)
so that /dev/mumble can be the entrypoint to some networking graph,
e.g. a tunnel or a remote tape drive or whatever...

Not fully tested (by me) yet.

Submitted by:	Mark Santcroos <marks@ripe.net>
MFC after:	3 weeks
2002-06-18 21:32:33 +00:00
n_hibma
90312e83bd Make the speed used by gdb over serial settable in the kernel configuration.
This facilitates the use in circumstances where you are using a serial
console as well. GDB doesn't support anything higher than 9600 baud (19k2
if you are lucky), but the console does.
2002-06-18 21:30:37 +00:00
obrien
b9e927f911 Allow one to configure `sio'. 2002-06-18 01:14:54 +00:00
n_hibma
f739d87ace Use OBJDIR instead of CURDIR. This unbreaks loading modules through
'make load' if an object dir was, like it is used in /sys/modules. I.e.

	cd /sys/modules/umass
	make obj
	make
	make load

works again without having to install the module.

If no objdir was used the module in the current directory is used.
2002-06-17 20:01:06 +00:00
jhay
8301b3a03b sppp needs slcompress.c nowadays.
PR:		39369
2002-06-17 05:40:49 +00:00
mux
fd67d75e4d Removed a duplicate -ffreestanding. It's already set in bsd.kern.mk.
Approved by:	bde
2002-06-16 10:42:05 +00:00
rwatson
b5791a138c kern_cap.c no longer needed. 2002-06-13 23:19:34 +00:00
rwatson
f96d2575ed POSIX.1e capabilities aren't here yet, don't put an option for it
in the options file.
2002-06-13 22:41:23 +00:00
brooks
d16dfb140a Remote pci.h/NPCI usage from i4b code.
Approved by:	hm
2002-06-13 06:04:28 +00:00
phk
c112e2f321 Put geom_gpt.c under the GEOM option instead of having a special GEOM_GPT
option for it.
2002-06-10 18:49:41 +00:00
jake
56400fc901 Remove code from trap which is handled in userland now. 2002-06-08 07:17:19 +00:00
jhb
8ad95afcab According to Bruce, this file shouldn't have comments to describe what
options do.  Comments should be in NOTES and having the comments in two
places usually means that one place will just bitrot.  Thus, remove the
comment for KTRACE_REQUEST_POOL from the previous revision.

Requested by:	bde
2002-06-07 14:33:23 +00:00
jhb
ab80d12ef1 Overhaul the ktrace subsystem a bit. For the most part, the actual vnode
operations to dump a ktrace event out to an output file are now handled
asychronously by a ktrace worker thread.  This enables most ktrace events
to not need Giant once p_tracep and p_traceflag are suitably protected by
the new ktrace_lock.

There is a single todo list of pending ktrace requests.  The various
ktrace tracepoints allocate a ktrace request object and tack it onto the
end of the queue.  The ktrace kernel thread grabs requests off the head of
the queue and processes them using the trace vnode and credentials of the
thread triggering the event.

Since we cannot assume that the user memory referenced when doing a
ktrgenio() will be valid and since we can't access it from the ktrace
worker thread without a bit of hassle anyways, ktrgenio() requests are
still handled synchronously.  However, in order to ensure that the requests
from a given thread still maintain relative order to one another, when a
synchronous ktrace event (such as a genio event) is triggered, we still put
the request object on the todo list to synchronize with the worker thread.
The original thread blocks atomically with putting the item on the queue.
When the worker thread comes across an asynchronous request, it wakes up
the original thread and then blocks to ensure it doesn't manage to write a
later event before the original thread has a chance to write out the
synchronous event.  When the original thread wakes up, it writes out the
synchronous using its own context and then finally wakes the worker thread
back up.  Yuck.  The sychronous events aren't pretty but they do work.

Since ktrace events can be triggered in fairly low-level areas (msleep()
and cv_wait() for example) the ktrace code is designed to use very few
locks when posting an event (currently just the ktrace_mtx lock and the
vnode interlock to bump the refcoun on the trace vnode).  This also means
that we can't allocate a ktrace request object when an event is triggered.
Instead, ktrace request objects are allocated from a pre-allocated pool
and returned to the pool after a request is serviced.

The size of this pool defaults to 100 objects, which is about 13k on an
i386 kernel.  The size of the pool can be adjusted at compile time via the
KTRACE_REQUEST_POOL kernel option, at boot time via the
kern.ktrace_request_pool loader tunable, or at runtime via the
kern.ktrace_request_pool sysctl.

If the pool of request objects is exhausted, then a warning message is
printed to the console.  The message is rate-limited in that it is only
printed once until the size of the pool is adjusted via the sysctl.

I have tested all kernel traces but have not tested user traces submitted
by utrace(2), though they should work fine in theory.

Since a ktrace request has several properties (content of event, trace
vnode, details of originating process, credentials for I/O, etc.), I chose
to drop the first argument to the various ktrfoo() functions.  Currently
the functions just assume the event is posted from curthread.  If there is
a great desire to do so, I suppose I could instead put back the first
argument but this time make it a thread pointer instead of a vnode pointer.

Also, KTRPOINT() now takes a thread as its first argument instead of a
process.  This is because the check for a recursive ktrace event is now
per-thread instead of process-wide.

Tested on:	i386
Compiles on:	sparc64, alpha
2002-06-07 05:32:59 +00:00
mdodd
be7ae7357c 'device hea' is no longer broken.
Add 'nowerror' to a few 'hea' files to ignore warnings on volatiles.
2002-06-07 02:04:09 +00:00
gibbs
282bbce194 Hook up the ahd driver. 2002-06-06 16:35:58 +00:00
pdeuskar
6d55ec63aa Added support for 82545EM and 82546EB based adapters.
Added Vlan support.

MFC after:	1 week
2002-06-03 22:30:51 +00:00
mdodd
50437cfa68 Add new 'hea' driver files. 2002-06-03 09:14:12 +00:00
alfred
3ff28a5819 bde noticed that SOMAXCONN breaks pretty badly as an option for LINT.
so back it out.
2002-06-02 04:32:52 +00:00
brooks
bef0aeddb0 The loop back device hasn't been a count device for a while so remove
the number of interfaces.
2002-05-31 06:28:13 +00:00
takawata
d19ac116d6 Make oldcard and newcard kernel module work. 2002-05-30 17:38:00 +00:00
obrien
4c50817e02 PHK claims there is a crc32.c now. 2002-05-29 21:58:56 +00:00
obrien
4654593fd8 Back out revision 1.639. PHK filed to commit the libkern file. 2002-05-29 21:57:27 +00:00
phk
4383144a9a Add one copy of crc32() and crc32_tab[] in libkern, and remove it two other
places.

Comment out crc32 related definitions in zlib.h, we don't seem to have the
corresponding code in our kernel.
2002-05-29 20:24:09 +00:00
jake
580d1a81b5 Merge the code in pv.c into pmap.c directly. Place all page mappings onto
the pv lists in the vm_page, even unmanaged kernel mappings.  This is so
that the virtual cachability of these mappings can be tracked when a page
is mapped to more than one virtual address.  All virtually cachable
mappings of a physical page must have the same virtual colour, or illegal
alises can be created in the data cache.  This is a bit tricky because we
still have to recognize managed and unmanaged mappings, even though they
are all on the pv lists.
2002-05-29 06:08:45 +00:00
marcel
5fe0fdb432 Add support to GEOM for GUID Partition Tables (GPTs). The support
is currently conditional on both the GEOM and GEOM_GPT options to
avoid getting GPT by default and having the MBR and GPT classes
clash.
The correct behaviour of the MBR class would be to back-off (reject)
a MBR if it's a Protective MBR (a MBR with a single partition of type
0xEE that spans the whole disk (as far as the MBR is concerned).
The correct behaviour if the GPT class would be to back-off (reject)
a GPT if there's a MBR that's not a Protective MBR.

At this stage it's inconvenient to destroy a good MBR when working
with GPTs that it's more convenient to have the MBR class back-off
when it detects the GPT signature on disk and have the GPT class
ignore the MBR.

In sys/gpt.h UUIDs (GUIDs) for the following FreeBSD partitions
have been defined:

GPT_ENT_TYPE_FREEBSD
	FreeBSD slice with disklabel. This is the equivalent of
	the well-known FreeBSD MBR partition type.
GPT_ENT_TYPE_FREEBSD_{SWAP|UFS|UFS2|VINUM}
	FreeBSD partitions in the context of disklabel. This is
	speculating on the idea to use the GPT to hold partitions
	instead if slices and removing the fixed (and low) limits
	we have on the number of partitions.

This commit lacks a GPT image for the regression suite.
2002-05-28 09:04:48 +00:00
marcel
58435e6cb7 Add uuidgen(2) and uuidgen(1).
The uuidgen command, by means of the uuidgen syscall, generates one
or more Universally Unique Identifiers compatible with OSF/DCE 1.1
version 1 UUIDs.

From the Perforce logs (change 11995):

Round of cleanups:
o  Give uuidgen() the correct prototype in syscalls.master
o  Define struct uuid according to DCE 1.1 in sys/uuid.h
o  Use struct uuid instead of uuid_t. The latter is defined
   in sys/uuid.h but should not be used in kernel land.
o  Add snprintf_uuid(), printf_uuid() and sbuf_printf_uuid()
   to kern_uuid.c for use in the kernel (currently geom_gpt.c).
o  Rename the non-standard struct uuid in kern/kern_uuid.c
   to struct uuid_private and give it a slightly better definition
   for better byte-order handling. See below.
o  In sys/gpt.h, fix the broken uuid definitions to match the now
   compliant struct uuid definition. See below.
o  In usr.bin/uuidgen/uuidgen.c catch up with struct uuid change.

A note about byte-order:
        The standard failed to provide a non-conflicting and
unambiguous definition for the binary representation. My initial
implementation always wrote the timestamp as a 64-bit little-endian
(2s-complement) integral. The clock sequence was always written
as a 16-bit big-endian (2s-complement) integral. After a good
nights sleep and couple of Pan Galactic Gargle Blasters (not
necessarily in that order :-) I reread the spec and came to the
conclusion that the time fields are always written in the native
by order, provided the the low, mid and hi chopping still occurs.
The spec mentions that you "might need to swap bytes if you talk
to a machine that has a different byte-order". The clock sequence
is always written in big-endian order (as is the IEEE 802 address)
because its division is resulting in bytes, making the ordering
unambiguous.
2002-05-28 06:16:08 +00:00
phk
d41562720e Add a proof-of-concept encryption class.
"The only hard problem in cryptography is key-management."

All sectors are encrypted with AES in CBC mode using a constant key,
currently compiled in and all zero.

To activate this module, write the magic header on the partition:

	echo "<<FreeBSD-GEOM-AES>>" | dd conv=sync of=/dev/md98

The encrypted device will be one sector shorter and have ".aes"
appended to its name.

Sponsored by: DARPA & NAI Labs.
2002-05-26 18:14:38 +00:00
jake
82d31a36d4 Remove a hack for using an external compiler if cross compiling. 2002-05-26 15:55:28 +00:00
peter
a984a1d718 For now, make the .ifdef GCC3 case default. We should change -Wno-format
back to -fformat-extensions (or whatever) when we have the functionality.
We are gaining warnings again that should be fixed but the are being hidden
by NO_WERROR and all the -Wformat noise.
2002-05-24 01:02:45 +00:00
ru
c2119d6433 Fixed broken ``make -jX install''.
Spotted by:	make release TARGET_ARCH=ia64
2002-05-23 07:25:01 +00:00
jhb
d3398f2f58 Add code to make default mutexes adaptive if the ADAPTIVE_MUTEXES kernel
option is used (not on by default).

- In the case of trying to lock a mutex, if the MTX_CONTESTED flag is set,
  then we can safely read the thread pointer from the mtx_lock member while
  holding sched_lock.  We then examine the thread to see if it is currently
  executing on another CPU.  If it is, then we keep looping instead of
  blocking.
- In the case of trying to unlock a mutex, it is now possible for a mutex
  to have MTX_CONTESTED set in mtx_lock but to not have any threads
  actually blocked on it, so we need to handle that case.  In that case,
  we just release the lock as if MTX_CONTESTED was not set and return.
- We do not adaptively spin on Giant as Giant is held for long times and
  it slows SMP systems down to a crawl (it was taking several minutes,
  like 5-10 or so for my test alpha and sparc64 SMP boxes to boot up when
  they adaptively spinned on Giant).
- We only compile in the code to do this for SMP kernels, it doesn't make
  sense for UP kernels.

Tested on:	i386, alpha, sparc64
2002-05-21 20:47:11 +00:00
non
741fbfb787 MFi386: 1.398-1.399 (${MACHINE_ARCH}_dump.c -> dump_machdep.c) 2002-05-21 04:13:08 +00:00
jake
1166262e26 De-inline the tlb demap functions. These were so big that gcc3.1 refused
to inline them anyway.  ;)
2002-05-20 16:10:17 +00:00
nyan
284f241aff MFi386: revision 1.400. 2002-05-19 13:20:05 +00:00
nyan
a4a176f024 Remove unneeded entries. 2002-05-19 13:18:10 +00:00
marcel
0042ddf6d7 Remove CWARNFLAGS and add GCC3. We handle GCC3.x specific flags
centrally now that we have GCC3 in the tree. The GCC3 variable
is a helper during the switch.
2002-05-19 03:41:48 +00:00
marcel
42058b50b4 Hook up the new linux_ptrace implementation.
PR: 33299
Submitted by: Alexander N. Kabaev <ak03@gte.com>
2002-05-19 01:27:14 +00:00
rwatson
930f7599ed Remove IFS from 5.0-CURRENT. This facilitates introducing UFS2 as
IFS had its fingers deep in the belly of the UFS/FFS split.  IFS
will be reimplemented by the maintainer at a later date.

Requested by:	adrian (maintainer)
2002-05-19 00:11:08 +00:00
trhodes
28d42899b7 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
iedowse
b459f6c726 The ufs/ffs files are no longer required by ext2fs. 2002-05-16 20:54:44 +00:00
iedowse
4009aa43de Complete the separation of ext2fs from ufs by copying the remaining
shared code and converting all ufs references. Originally it may
have made sense to share common features between the two filesystems,
but recently it has only caused problems, the UFS2 work being the
final straw.

All UFS_* indirect calls are now direct calls to ext2_* functions,
and ext2fs-specific mount and inode structures have been introduced.
2002-05-16 19:08:03 +00:00
jeff
ba85b0e087 Disable the shared locking namei() code for now. It breaks several stacking
filesystems.  This is on hold until the rest of VFS Locking is reviewed and
deemed safe.  It can be enabled with 'options LOOKUP_SHARED'.
2002-05-14 21:59:49 +00:00
ru
9cf99ff7cf Check that kldxref(8) exists before running it. 2002-05-14 07:49:12 +00:00
benno
796d01e2a8 Build the fpu support routines. 2002-05-13 07:53:22 +00:00
jake
856c4cf891 ${MACHINE_ARCH}dump.c -> dump_machdep.c. 2002-05-13 02:40:21 +00:00