At long last, commit the zero copy sockets code.

MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure; it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object_allocate() does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structure.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page-based COW function prototypes and a variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
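
The ip_output.c entry above comes down to a little arithmetic. The
following is a minimal sketch, not the code from this commit:
frag_payload_len() is a hypothetical helper, and the 4096-byte page size
is an assumption (i386). It rounds the payload space of each fragment
down to a whole number of pages when at least one page fits, so that a
receiver doing header splitting could flip whole pages.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size (i386); not from the commit */

/*
 * Hypothetical helper: given the interface MTU and the IP header length,
 * return how many payload bytes to place in each fragment.  When at least
 * one full page fits, round down to a multiple of the page size; otherwise
 * fall back to the usual multiple-of-8 rule for IP fragments.
 */
static size_t
frag_payload_len(size_t mtu, size_t hlen)
{
	size_t room;

	assert(mtu > hlen);
	room = mtu - hlen;
	if (room >= SKETCH_PAGE_SIZE)
		return (room - (room % SKETCH_PAGE_SIZE));
	return (room & ~(size_t)7);
}
```

With a 9000-byte MTU and a 20-byte IP header this yields 8192 bytes of
data per fragment; with a 1500-byte MTU no page fits and the ordinary
1480-byte fragment payload is used.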
Kenneth D. Merry 2002-06-26 03:37:47 +00:00
parent a69ac1740f
commit 98cb733c67
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=98849
35 changed files with 14898 additions and 10527 deletions


@ -339,6 +339,7 @@ all)
sh $0 i4brbch0 i4brbch1 # ISDN
sh $0 agpgart # AGP
sh $0 nsmb0 # SMB/CIFS
sh $0 ti0 ti1 # ti(4)
;;
# a much restricted set of the above, to save precious i-nodes on the
@ -1526,6 +1527,12 @@ card*)
chmod 644 card$unit
;;
ti*) unit=`expr $i : 'ti\(.*\)'`
chr=153
mknod ti$unit c $chr `unit2minor $unit`
chmod 600 ti$unit
;;
ttyx?|ttyy?|ttyz?)
case $i in
*0) unit=0;; *1) unit=1;; *2) unit=2;; *3) unit=3;;


@ -38,6 +38,8 @@
.Nd "Alteon Networks Tigon I and Tigon II gigabit ethernet driver"
.Sh SYNOPSIS
.Cd "device ti"
.Cd "options TI_PRIVATE_JUMBOS"
.Cd "options TI_JUMBO_HDRSPLIT"
.Sh DESCRIPTION
The
.Nm
@ -109,6 +111,24 @@ utility configures the adapter to receive and transmit jumbo frames.
Using jumbo frames can greatly improve performance for certain tasks,
such as file transfers and data streaming.
.Pp
Header splitting support for Tigon 2 boards (this option has no effect for
the Tigon 1) can be turned on with the
.Dv TI_JUMBO_HDRSPLIT
option. See
.Xr zero_copy 9
for more discussion on zero copy receive and header splitting.
.Pp
The
.Nm
driver normally uses jumbo receive buffers allocated by the
.Xr jumbo 9
buffer allocator, but can be configured to use its own private pool of
jumbo buffers that are contiguous instead of buffers from the jumbo
allocator, which are made up of multiple page sized chunks. To turn on
private jumbos, use the
.Dv TI_PRIVATE_JUMBOS
option.
.Pp
Support for vlans is also available using the
.Xr vlan 4
mechanism.
@ -165,6 +185,64 @@ Force half duplex operation.
.Pp
For more information on configuring this device, see
.Xr ifconfig 8 .
.Sh IOCTLS
In addition to the standard
.Xr socket 2
.Xr ioctl 2
calls implemented by most network drivers, the
.Nm
driver also includes a character device interface that can be used for
additional diagnostics, configuration and debugging. With this character
device interface, and a specially patched version of gdb, the user can
debug firmware running on the Tigon board.
.Pp
These ioctls and their arguments are defined in the
.Aq Pa sys/tiio.h
header file.
.Bl -tag -width ALT_WRITE_TG_MEM
.It Dv TIIOCGETSTATS
Return card statistics DMAed from the card into kernel memory approximately
every 2 seconds. (That time interval can be changed via the
.Dv TIIOCSETPARAMS
ioctl.) The argument is
.Va struct ti_stats .
.It Dv TIIOCGETPARAMS
Get various performance-related firmware parameters that largely affect how
interrupts are coalesced. The argument is
.Va struct ti_params .
.It Dv TIIOCSETPARAMS
Set various performance-related firmware parameters that largely affect how
interrupts are coalesced. The argument is
.Va struct ti_params .
.It Dv TIIOCSETTRACE
Tell the NIC to trace the requested types of information.
The argument is
.Va ti_trace_type .
.It Dv TIIOCGETTRACE
Dump the trace buffer from the card. The argument is
.Va struct ti_trace_buf .
.It Dv ALT_ATTACH
This ioctl is used for compatibility with Alteon's Solaris driver. They
apparently only have one character interface for debugging, so they have
to tell it which Tigon instance they want to debug. This ioctl is a noop
for FreeBSD.
.It Dv ALT_READ_TG_MEM
Read the requested memory region from the Tigon board. The argument is
.Va struct tg_mem .
.It Dv ALT_WRITE_TG_MEM
Write to the requested memory region on the Tigon board. The argument is
.Va struct tg_mem .
.It Dv ALT_READ_TG_REG
Read the requested register on the Tigon board. The argument is
.Va struct tg_reg .
.It Dv ALT_WRITE_TG_REG
Write to the requested register on the Tigon board. The argument is
.Va struct tg_reg .
.El
.Sh FILES
.Bl -tag -width /dev/ti[0-255] -compact
.It Pa /dev/ti[0-255]
Tigon driver character interface.
.Sh DIAGNOSTICS
.Bl -diag
.It "ti%d: couldn't map memory"
@ -206,7 +284,9 @@ the network connection (cable).
.Xr netintro 4 ,
.Xr ng_ether 4 ,
.Xr vlan 4 ,
.Xr ifconfig 8
.Xr ifconfig 8 ,
.Xr zero_copy 9 ,
.Xr jumbo 9
.Rs
.%T Alteon Gigabit Ethernet/PCI NIC manuals
.%O http://sanjose.alteon.com/open.shtml
@ -221,3 +301,8 @@ The
.Nm
driver was written by
.An Bill Paul Aq wpaul@bsdi.com .
The header splitting firmware modifications, character ioctl interface and
debugging support were written by
.An Kenneth Merry Aq ken@FreeBSD.org .
Initial zero copy support was written by
.An Andrew Gallatin Aq gallatin@FreeBSD.org .
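
The driver-only form of header splitting that zero_copy(9) describes
(a fixed-size first receive buffer followed by page-sized buffers) can be
modelled with a single subtraction. This is a sketch under stated
assumptions, not ti(4) driver code: hdrsplit_skew() is a hypothetical
name, and the 66-byte figure is the example header size used in
zero_copy(9).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: the first first_buf bytes of a packet land in a
 * small buffer and the remainder starts at the beginning of a page-sized
 * buffer.  The return value is how far the payload start is from that
 * page boundary: 0 means the payload is page aligned, a positive value
 * means that many header bytes spilled into the page buffer, and a
 * negative value means that many payload bytes stayed in the small buffer.
 */
static long
hdrsplit_skew(size_t hdr_len, size_t first_buf)
{
	assert(hdr_len > 0 && first_buf > 0);
	return ((long)hdr_len - (long)first_buf);
}
```

Only packets whose headers are exactly as long as the first buffer give
a skew of 0, which is why this approach needs the driver to be tuned for
the most common header size.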


@ -45,6 +45,7 @@ MAN= BUF_LOCK.9 BUF_LOCKFREE.9 BUF_LOCKINIT.9 BUF_REFCNT.9 \
getnewvnode.9 \
groupmember.9 \
ifnet.9 inittodr.9 intro.9 ithread.9 \
jumbo.9 \
kernacc.9 kobj.9 kthread.9 ktr.9 \
lock.9 \
make_dev.9 malloc.9 mbchain.9 mbuf.9 mdchain.9 \
@ -77,7 +78,7 @@ MAN= BUF_LOCK.9 BUF_LOCKFREE.9 BUF_LOCKINIT.9 BUF_REFCNT.9 \
vm_page_wire.9 vm_page_zero_fill.9 vm_set_page_size.9 \
vn_isdisk.9 vnode.9 vput.9 vref.9 vrele.9 \
vslock.9 \
zone.9
zero_copy.9 zone.9
MLINKS+=DRIVER_MODULE.9 MULTI_DRIVER_MODULE.9
MLINKS+=MD5.9 MD5Init.9 MD5.9 MD5Transform.9
@ -148,6 +149,9 @@ MLINKS+=ifnet.9 if_data.9 ifnet.9 ifaddr.9 ifnet.9 ifqueue.9
MLINKS+=ithread.9 ithread_add_handler.9 ithread.9 ithread_create.9
MLINKS+=ithread.9 ithread_destroy.9 ithread.9 ithread_priority.9
MLINKS+=ithread.9 ithread_remove_handler.9 ithread.9 ithread_schedule.9
MLINKS+=jumbo.9 jumbo_vm_init.9 jumbo.9 jumbo_pg_alloc.9
MLINKS+=jumbo.9 jumbo_pg_free.9 jumbo.9 jumbo_freem.9
MLINKS+=jumbo.9 jumbo_pg_steal.9 jumbo.9 jumbo_phys_to_kva.9
MLINKS+=kernacc.9 useracc.9
MLINKS+=kthread.9 kproc_start.9 kthread.9 kproc_shutdown.9
MLINKS+=kthread.9 kthread_create.9 kthread.9 kthread_exit.9
@ -250,6 +254,7 @@ MLINKS+=vm_page_wakeup.9 vm_page_flash.9
MLINKS+=vm_page_wire.9 vm_page_unwire.9
MLINKS+=vref.9 VREF.9
MLINKS+=vslock.9 vsunlock.9
MLINKS+=zero_copy.9 zero_copy_sockets.9
MLINKS+=device_add_child.9 device_add_child_ordered.9
MLINKS+=device_enable.9 device_disable.9

share/man/man9/jumbo.9 Normal file

@ -0,0 +1,153 @@
.\"
.\" Copyright (c) 2002 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions, and the following disclaimer,
.\" without modification, immediately at the beginning of the file.
.\" 2. The name of the author may not be used to endorse or promote products
.\" derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd June 23, 2002
.Dt JUMBO 9
.Os
.Sh NAME
.Nm jumbo ,
.Nm jumbo_vm_init ,
.Nm jumbo_pg_alloc ,
.Nm jumbo_pg_free ,
.Nm jumbo_freem ,
.Nm jumbo_pg_steal ,
.Nm jumbo_phys_to_kva
.Nd kernel interface for allocating and freeing page-sized disposable buffers
.Sh SYNOPSIS
.In sys/jumbo.h
.Ft int
.Fo jumbo_vm_init
.Fa "void"
.Fc
.Ft vm_page_t
.Fo jumbo_pg_alloc
.Fa "void"
.Fc
.Ft void
.Fo jumbo_pg_free
.Fa "vm_offset_t addr"
.Fc
.Ft void
.Fo jumbo_freem
.Fa "caddr_t addr"
.Fa "void *args"
.Fc
.Ft void
.Fo jumbo_pg_steal
.Fa "vm_page_t pg"
.Fc
.Ft caddr_t
.Fo jumbo_phys_to_kva
.Fa "vm_offset_t pa"
.Fc
.Sh DESCRIPTION
The
.Nm
buffer facility is designed for allocating disposable page-sized
buffers. Buffers allocated via this facility can either be returned or
not. This facility is primarily intended for use with network adapters
that have MTUs of a page or greater size. The buffers will normally be
disposed of by the
.Xr zero_copy 9
receive code.
.Pp
.Fn jumbo_vm_init
initializes the pool of KVA the
.Nm
code needs to operate and does some
other initialization to prepare the subsystem for operation. This routine
only needs to be called once. Calling it multiple times will have no
effect. It is recommended that this initialization routine be called in a
device driver attach routine, so that resources are not allocated if the
.Nm
subsystem won't end up being used.
.Fn jumbo_vm_init
returns 1 upon successful completion, and 0 upon failure.
.Pp
.Fn jumbo_pg_alloc
allocates a physical page and assigns a piece of KVA from the
.Nm
KVA pool. It returns the allocated page if successful, and NULL in the case of
failure.
.Pp
.Fn jumbo_pg_free
frees a page allocated by
.Fn jumbo_pg_alloc .
It takes the address of the memory in question as an argument. This
routine will normally be used in cases where the allocated
.Nm jumbo
page cannot be used for some reason. The normal free path is via
.Fn jumbo_freem .
.Pp
.Fn jumbo_freem
is the routine that should be given as the external free routine when an
external mbuf is allocated using pages from the
.Nm
allocator. It takes the virtual address of the page in question, and
ignores the second argument.
.Pp
.Fn jumbo_pg_steal
"steals" a page and recycles its KVA space.
.Pp
.Fn jumbo_phys_to_kva
translates the physical address of a
.Nm
allocated page to the proper kernel virtual address.
.Sh SEE ALSO
.Xr ti 4 ,
.Xr zero_copy 9
.Sh HISTORY
The
.Nm
allocator is primarily based on a page allocator system originally written
by
.An Andrew Gallatin Aq gallatin@FreeBSD.org
as part of a set of zero copy patches for the
.Xr ti 4
driver. The allocator was taken out of the
.Xr ti 4
driver, cleaned up and ported to the new mutex interface by
.An Kenneth Merry Aq ken@FreeBSD.org .
.Pp
The
.Nm
allocator first appeared in
.Fx 5.0 ,
and has existed in patch form since at least 1999.
.Sh AUTHORS
.An Andrew Gallatin Aq gallatin@FreeBSD.org
and
.An Kenneth Merry Aq ken@FreeBSD.org .
.Sh BUGS
There is currently a static number of KVA pages allocated by the
.Nm
allocator, with no real provision for increasing the number of pages
allocated should demand exceed supply.
.Pp
The
.Fn jumbo_pg_steal
function isn't currently used anywhere.

share/man/man9/zero_copy.9 Normal file

@ -0,0 +1,142 @@
.\"
.\" Copyright (c) 2002 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions, and the following disclaimer,
.\" without modification, immediately at the beginning of the file.
.\" 2. The name of the author may not be used to endorse or promote products
.\" derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd June 23, 2002
.Dt ZERO_COPY 9
.Os
.Sh NAME
.Nm zero_copy ,
.Nm zero_copy_sockets
.Sh SYNOPSIS
.Cd options ZERO_COPY_SOCKETS
.Sh DESCRIPTION
The FreeBSD kernel includes a facility for eliminating data copies on
socket reads and writes.
.Pp
This code is collectively known as the zero copy sockets code, because during
normal network I/O, data will not be copied by the CPU at all. Rather it
will be DMAed from the user's buffer to the NIC (for sends), or DMAed from
the NIC to a buffer that will then be given to the user (receives).
.Pp
The zero copy sockets code uses the standard socket read and write
semantics, and therefore has some limitations and restrictions that
programmers should be aware of when trying to take advantage of this
functionality.
.Pp
For sending data, there are no special requirements or capabilities that
the sending NIC must have. The data written to the socket, though, must be
at least a page in size and page aligned in order to be mapped into the
kernel. If it doesn't meet the page size and alignment constraints, it
will be copied into the kernel, as is normally the case with socket I/O.
.Pp
The user should be careful not to overwrite buffers that have been written
to the socket before the data has been freed by the kernel, and the
copy-on-write mapping cleared. If a buffer is overwritten before it has
been given up by the kernel, the data will be copied, and no savings in CPU
utilization and memory bandwidth utilization will be realized.
.Pp
The
.Xr socket 2
API doesn't really give the user any indication of when his data has
actually been sent over the wire, or when the data has been freed from
kernel buffers. For protocols like TCP, the data will be kept around in
the kernel until it has been acknowledged by the other side; it must be
kept until the acknowledgement is received in case retransmission is required.
.Pp
From an application standpoint, the best way to guarantee that the data has
been sent out over the wire and freed by the kernel (for TCP-based sockets)
is to set a socket buffer size (see the SO_SNDBUF socket option in the
.Xr setsockopt 2
man page) appropriate for the application and network environment and then
make sure you have sent out twice as much data as the socket buffer size
before reusing a buffer. For TCP, the send and receive socket buffer sizes
generally directly correspond to the TCP window size.
.Pp
For receiving data, in order to take advantage of the zero copy receive
code, the user must have a NIC that is configured for an MTU greater than
the architecture page size (e.g., 8KB on alpha, 4KB on i386).
Additionally, in order for zero copy receive to work,
packet payloads must be at least a page in size and page aligned.
.Pp
Achieving page aligned payloads requires a NIC that can split an incoming
packet into multiple buffers. It also generally requires some sort of
intelligence on the NIC to make sure that the payload starts in its own
buffer. This is called "header splitting". Currently the only NICs with
support for header splitting are Alteon Tigon 2 based boards running
slightly modified firmware. The FreeBSD
.Xr ti 4
driver includes modified firmware for Tigon 2 boards only. Header
splitting code can be written, however, for any NIC that allows putting
received packets into multiple buffers and that has enough programmability
to determine that the header should go into one buffer and the payload into
another.
.Pp
You can also do a form of header splitting that doesn't require any NIC
modifications if your NIC is at least capable of splitting packets into
multiple buffers. This requires that you optimize the NIC driver for your
most common packet header size. If that size (ethernet + IP + TCP headers)
is generally 66 bytes, for instance, you would set the first buffer in a
set for a particular packet to be 66 bytes long, and then subsequent
buffers would be a page in size. For packets that have headers that are
exactly 66 bytes long, your payload will be page aligned.
.Pp
The other requirement for zero copy receive to work is that the buffer that
is the destination for the data read from a socket must be at least a page
in size and page aligned.
.Pp
Obviously the requirements for receive side zero copy are impossible to
meet without NIC hardware that is programmable enough to do header
splitting of some sort. Since most NICs aren't that programmable, or their
manufacturers won't share the source code to their firmware, this approach
to zero copy receive isn't widely useful.
.Pp
There are other approaches, such as RDMA and TCP Offload, that may
potentially help alleviate the CPU overhead associated with copying data
out of the kernel. Most known techniques require some sort of support at
the NIC level to work, and describing such techniques is beyond the scope
of this manual page.
.Pp
The zero copy send and zero copy receive code can be individually turned
off via the
.Va kern.ipc.zero_copy.send
and
.Va kern.ipc.zero_copy.receive
.Nm sysctl
variables respectively.
.Sh SEE ALSO
.Xr socket 2 ,
.Xr sendfile 2 ,
.Xr ti 4 ,
.Xr jumbo 9
.Sh HISTORY
The zero copy sockets code first appeared in FreeBSD 5.0, although it has
been in existence in patch form since at least mid-1999.
.Sh AUTHORS
The zero copy sockets code was originally written by
.An Andrew Gallatin Aq gallatin@FreeBSD.org
and substantially modified and updated by
.An Kenneth Merry Aq ken@FreeBSD.org .
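
From the application side, the send-path rules in the manual page above
might be checked as follows. This is a userland sketch under stated
assumptions, not part of the commit: zc_alloc_pages(), zc_send_eligible()
and zc_reuse_safe() are hypothetical helper names, the eligibility test
encodes only the "at least a page in size and page aligned" wording (the
kernel's real test may be stricter), and the reuse rule assumes the
"twice the socket buffer size" count starts after the buffer was written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate npages page-aligned pages for a send buffer; NULL on failure. */
static void *
zc_alloc_pages(size_t npages)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	void *p = NULL;

	if (pgsz <= 0 || npages == 0)
		return (NULL);
	if (posix_memalign(&p, (size_t)pgsz, npages * (size_t)pgsz) != 0)
		return (NULL);
	return (p);
}

/* Nonzero if buf/len are page aligned and at least one page long. */
static int
zc_send_eligible(const void *buf, size_t len)
{
	long pgsz = sysconf(_SC_PAGESIZE);

	assert(pgsz > 0);
	return (((uintptr_t)buf % (uintptr_t)pgsz) == 0 && len >= (size_t)pgsz);
}

/*
 * Nonzero once at least twice the SO_SNDBUF size has been written to the
 * socket since the buffer in question was handed to the kernel.
 */
static int
zc_reuse_safe(size_t sent_since, size_t sndbuf)
{
	return (sent_since >= 2 * sndbuf);
}
```

An application would allocate its buffers with zc_alloc_pages(), write
them with the ordinary write(2) or send(2) calls, and avoid touching a
buffer again until zc_reuse_safe() holds for the SO_SNDBUF value it set.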


@ -533,6 +533,13 @@ options TCP_DROP_SYNFIN #drop TCP packets with SYN+FIN
options DUMMYNET
options BRIDGE
# Zero copy sockets support. This enables "zero copy" for sending and
# receiving data via a socket. The send side works for any type of NIC,
# the receive side only works for NICs that support MTUs greater than the
# page size of your architecture and that support header splitting. See
# zero_copy(9) for more details.
options ZERO_COPY_SOCKETS
#
# ATM (HARP version) options
#
@ -1670,6 +1677,13 @@ device sk
device ti
device fpa 1
# Use "private" jumbo buffers allocated exclusively for the ti(4) driver.
# This option is incompatible with the TI_JUMBO_HDRSPLIT option below.
#options TI_PRIVATE_JUMBOS
# Turn on the header splitting option for the ti(4) driver firmware. This
# only works for Tigon II chips, and has no effect for Tigon I chips.
options TI_JUMBO_HDRSPLIT
#
# ATM related options (Cranor version)
# (note: this driver cannot be used with the HARP ATM stack)
@ -2255,6 +2269,8 @@ options MSGTQL=41 # Max number of messages in system
options NBUF=512 # Number of buffer headers
options MSIZE=256 # mbuf size in bytes
options MCLSHIFT=12 # mbuf cluster shift in bits, 12 == 4KB
options NMBCLUSTERS=1024 # Number of mbuf clusters
options SCSI_NCR_DEBUG
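
The MCLSHIFT comment above ("12 == 4KB") follows from the cluster size
being two raised to the shift value. A minimal illustration, with
mclbytes_for_shift() as a hypothetical helper name rather than anything
defined in the kernel:

```c
#include <assert.h>

/* Cluster size in bytes for a given MCLSHIFT value (the shift is in bits). */
static unsigned long
mclbytes_for_shift(unsigned int mclshift)
{
	assert(mclshift < 8 * sizeof(unsigned long));
	return (1UL << mclshift);
}
```

So MCLSHIFT=12 gives 4096-byte clusters, and a shift of 11 would give
2048-byte clusters.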


@ -921,7 +921,9 @@ kern/tty_pty.c optional pty
kern/tty_subr.c standard
kern/tty_tty.c standard
kern/uipc_accf.c optional inet
kern/uipc_cow.c optional zero_copy_sockets
kern/uipc_domain.c standard
kern/uipc_jumbo.c standard
kern/uipc_mbuf.c standard
kern/uipc_mbuf2.c standard
kern/uipc_proto.c standard


@ -345,6 +345,11 @@ NETGRAPH_VJC opt_netgraph.h
DRM_LINUX opt_drm.h
DRM_DEBUG opt_drm.h
ZERO_COPY_SOCKETS opt_zero.h
TI_PRIVATE_JUMBOS opt_ti.h
TI_JUMBO_HDRSPLIT opt_ti.h
# ATM (HARP version)
ATM_CORE opt_atm.h
ATM_IP opt_atm.h
@ -405,6 +410,8 @@ INVARIANTS opt_global.h
REGRESSION opt_global.h
RESTARTABLE_PANICS opt_global.h
VFS_BIO_DEBUG opt_global.h
MSIZE opt_global.h
MCLSHIFT opt_global.h
# These are VM related options
VM_KMEM_SIZE opt_vm.h

File diff suppressed because it is too large.


@ -137,7 +137,7 @@
*/
#define TI_FIRMWARE_MAJOR 0xc
#define TI_FIRMWARE_MINOR 0x4
#define TI_FIRMWARE_FIX 0xd
#define TI_FIRMWARE_FIX 0xb
/*
* Miscellaneous Local Control register.
@ -348,6 +348,7 @@
#define TI_OPMODE_NO_TX_INTRS 0x00002000
#define TI_OPMODE_NO_RX_INTRS 0x00004000
#define TI_OPMODE_FATAL_ENB 0x40000000 /* not yet implemented */
#define TI_OPMODE_JUMBO_HDRSPLIT 0x00008000
/*
* DMA configuration thresholds.
@ -422,6 +423,53 @@
*/
#define TI_MEM_MAX 0x7FFFFF
/*
* Maximum register address on the Tigon.
*/
#define TI_REG_MAX 0x3fff
/*
* These values were taken from Alteon's tg.h.
*/
#define TI_BEG_SRAM 0x0 /* host thinks it's here */
#define TI_BEG_SCRATCH 0xc00000 /* beg of scratch pad area */
#define TI_END_SRAM_II 0x800000 /* end of SRAM, for 2 MB stuffed */
#define TI_END_SCRATCH_II 0xc04000 /* end of scratch pad CPU A (16KB) */
#define TI_END_SCRATCH_B 0xc02000 /* end of scratch pad CPU B (8KB) */
#define TI_BEG_SCRATCH_B_DEBUG 0xd00000 /* beg of scratch pad for ioctl */
#define TI_END_SCRATCH_B_DEBUG 0xd02000 /* end of scratch pad for ioctl */
#define TI_SCRATCH_DEBUG_OFF 0x100000 /* offset for ioctl usage */
#define TI_END_SRAM_I 0x200000 /* end of SRAM, for 2 MB stuffed */
#define TI_END_SCRATCH_I 0xc00800 /* end of scratch pad area (2KB) */
#define TI_BEG_PROM 0x40000000 /* beg of PROM, special access */
#define TI_BEG_FLASH 0x80000000 /* beg of EEPROM, special access */
#define TI_END_FLASH 0x80100000 /* end of EEPROM for 1 MB stuff */
#define TI_BEG_SER_EEPROM 0xa0000000 /* beg of Serial EEPROM (fake out) */
#define TI_END_SER_EEPROM 0xa0002000 /* end of Serial EEPROM (fake out) */
#define TI_BEG_REGS 0xc0000000 /* beg of register area */
#define TI_END_REGS 0xc0000400 /* end of register area */
#define TI_END_WRITE_REGS 0xc0000180 /* can't write GPRs currently */
#define TI_BEG_REGS2 0xc0000200 /* beg of second writeable reg area */
/* the EEPROM is byte addressable in a pretty odd way */
#define EEPROM_BYTE_LOC 0xff000000
/*
* From Alteon's tg.h.
*/
#define TI_PROCESSOR_A 0
#define TI_PROCESSOR_B 1
#define TI_CPU_A TG_PROCESSOR_A
#define TI_CPU_B TG_PROCESSOR_B
/*
* Following macro can be used to access to any of the CPU registers
* It will adjust the address appropriately.
* Parameters:
* reg - The register to access, e.g TI_CPU_CONTROL
* cpu - cpu, i.e PROCESSOR_A or PROCESSOR_B (or TI_CPU_A or TI_CPU_B)
*/
#define CPU_REG(reg, cpu) ((reg) + (cpu) * 0x100)
/*
* Even on the alpha, pci addresses are 32-bit quantities
*/
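
The CPU_REG() macro added in the hunk above selects a per-CPU copy of a
register by adding 0x100 for the second processor. The sketch below
copies the macro and the two processor constants from the diff;
SKETCH_CPU_CONTROL and its value 0x140 are assumptions made up for the
illustration and are not taken from if_tireg.h, and sketch_cpu_reg() is
a hypothetical wrapper.

```c
#include <assert.h>
#include <stdint.h>

#define TI_PROCESSOR_A	0
#define TI_PROCESSOR_B	1
#define CPU_REG(reg, cpu)	((reg) + (cpu) * 0x100)

/* Hypothetical register offset, for illustration only. */
#define SKETCH_CPU_CONTROL	0x140

/* Return the address of reg for the given Tigon CPU (A or B). */
static uint32_t
sketch_cpu_reg(uint32_t reg, int cpu)
{
	assert(cpu == TI_PROCESSOR_A || cpu == TI_PROCESSOR_B);
	return (CPU_REG(reg, (uint32_t)cpu));
}
```

For processor A the register address is returned unchanged; for
processor B it is offset by 0x100.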
@ -486,192 +534,6 @@ struct ti_producer {
u_int32_t ti_unused;
};
/*
* Tigon statistics counters.
*/
struct ti_stats {
/*
* MAC stats, taken from RFC 1643, ethernet-like MIB
*/
volatile u_int32_t dot3StatsAlignmentErrors; /* 0 */
volatile u_int32_t dot3StatsFCSErrors; /* 1 */
volatile u_int32_t dot3StatsSingleCollisionFrames; /* 2 */
volatile u_int32_t dot3StatsMultipleCollisionFrames; /* 3 */
volatile u_int32_t dot3StatsSQETestErrors; /* 4 */
volatile u_int32_t dot3StatsDeferredTransmissions; /* 5 */
volatile u_int32_t dot3StatsLateCollisions; /* 6 */
volatile u_int32_t dot3StatsExcessiveCollisions; /* 7 */
volatile u_int32_t dot3StatsInternalMacTransmitErrors; /* 8 */
volatile u_int32_t dot3StatsCarrierSenseErrors; /* 9 */
volatile u_int32_t dot3StatsFrameTooLongs; /* 10 */
volatile u_int32_t dot3StatsInternalMacReceiveErrors; /* 11 */
/*
* interface stats, taken from RFC 1213, MIB-II, interfaces group
*/
volatile u_int32_t ifIndex; /* 12 */
volatile u_int32_t ifType; /* 13 */
volatile u_int32_t ifMtu; /* 14 */
volatile u_int32_t ifSpeed; /* 15 */
volatile u_int32_t ifAdminStatus; /* 16 */
#define IF_ADMIN_STATUS_UP 1
#define IF_ADMIN_STATUS_DOWN 2
#define IF_ADMIN_STATUS_TESTING 3
volatile u_int32_t ifOperStatus; /* 17 */
#define IF_OPER_STATUS_UP 1
#define IF_OPER_STATUS_DOWN 2
#define IF_OPER_STATUS_TESTING 3
#define IF_OPER_STATUS_UNKNOWN 4
#define IF_OPER_STATUS_DORMANT 5
volatile u_int32_t ifLastChange; /* 18 */
volatile u_int32_t ifInDiscards; /* 19 */
volatile u_int32_t ifInErrors; /* 20 */
volatile u_int32_t ifInUnknownProtos; /* 21 */
volatile u_int32_t ifOutDiscards; /* 22 */
volatile u_int32_t ifOutErrors; /* 23 */
volatile u_int32_t ifOutQLen; /* deprecated */ /* 24 */
volatile u_int8_t ifPhysAddress[8]; /* 8 bytes */ /* 25 - 26 */
volatile u_int8_t ifDescr[32]; /* 27 - 34 */
u_int32_t alignIt; /* align to 64 bit for u_int64_ts following */
/*
* more interface stats, taken from RFC 1573, MIB-IIupdate,
* interfaces group
*/
volatile u_int64_t ifHCInOctets; /* 36 - 37 */
volatile u_int64_t ifHCInUcastPkts; /* 38 - 39 */
volatile u_int64_t ifHCInMulticastPkts; /* 40 - 41 */
volatile u_int64_t ifHCInBroadcastPkts; /* 42 - 43 */
volatile u_int64_t ifHCOutOctets; /* 44 - 45 */
volatile u_int64_t ifHCOutUcastPkts; /* 46 - 47 */
volatile u_int64_t ifHCOutMulticastPkts; /* 48 - 49 */
volatile u_int64_t ifHCOutBroadcastPkts; /* 50 - 51 */
volatile u_int32_t ifLinkUpDownTrapEnable; /* 52 */
volatile u_int32_t ifHighSpeed; /* 53 */
volatile u_int32_t ifPromiscuousMode; /* 54 */
volatile u_int32_t ifConnectorPresent; /* follow link state 55 */
/*
* Host Commands
*/
volatile u_int32_t nicCmdsHostState; /* 56 */
volatile u_int32_t nicCmdsFDRFiltering; /* 57 */
volatile u_int32_t nicCmdsSetRecvProdIndex; /* 58 */
volatile u_int32_t nicCmdsUpdateGencommStats; /* 59 */
volatile u_int32_t nicCmdsResetJumboRing; /* 60 */
volatile u_int32_t nicCmdsAddMCastAddr; /* 61 */
volatile u_int32_t nicCmdsDelMCastAddr; /* 62 */
volatile u_int32_t nicCmdsSetPromiscMode; /* 63 */
volatile u_int32_t nicCmdsLinkNegotiate; /* 64 */
volatile u_int32_t nicCmdsSetMACAddr; /* 65 */
volatile u_int32_t nicCmdsClearProfile; /* 66 */
volatile u_int32_t nicCmdsSetMulticastMode; /* 67 */
volatile u_int32_t nicCmdsClearStats; /* 68 */
volatile u_int32_t nicCmdsSetRecvJumboProdIndex; /* 69 */
volatile u_int32_t nicCmdsSetRecvMiniProdIndex; /* 70 */
volatile u_int32_t nicCmdsRefreshStats; /* 71 */
volatile u_int32_t nicCmdsUnknown; /* 72 */
/*
* NIC Events
*/
volatile u_int32_t nicEventsNICFirmwareOperational; /* 73 */
volatile u_int32_t nicEventsStatsUpdated; /* 74 */
volatile u_int32_t nicEventsLinkStateChanged; /* 75 */
volatile u_int32_t nicEventsError; /* 76 */
volatile u_int32_t nicEventsMCastListUpdated; /* 77 */
volatile u_int32_t nicEventsResetJumboRing; /* 78 */
/*
* Ring manipulation
*/
volatile u_int32_t nicRingSetSendProdIndex; /* 79 */
volatile u_int32_t nicRingSetSendConsIndex; /* 80 */
volatile u_int32_t nicRingSetRecvReturnProdIndex; /* 81 */
/*
* Interrupts
*/
volatile u_int32_t nicInterrupts; /* 82 */
volatile u_int32_t nicAvoidedInterrupts; /* 83 */
/*
* BD Coalescing Thresholds
*/
volatile u_int32_t nicEventThresholdHit; /* 84 */
volatile u_int32_t nicSendThresholdHit; /* 85 */
volatile u_int32_t nicRecvThresholdHit; /* 86 */
/*
* DMA Attentions
*/
volatile u_int32_t nicDmaRdOverrun; /* 87 */
volatile u_int32_t nicDmaRdUnderrun; /* 88 */
volatile u_int32_t nicDmaWrOverrun; /* 89 */
volatile u_int32_t nicDmaWrUnderrun; /* 90 */
volatile u_int32_t nicDmaWrMasterAborts; /* 91 */
volatile u_int32_t nicDmaRdMasterAborts; /* 92 */
/*
* NIC Resources
*/
volatile u_int32_t nicDmaWriteRingFull; /* 93 */
volatile u_int32_t nicDmaReadRingFull; /* 94 */
volatile u_int32_t nicEventRingFull; /* 95 */
volatile u_int32_t nicEventProducerRingFull; /* 96 */
volatile u_int32_t nicTxMacDescrRingFull; /* 97 */
volatile u_int32_t nicOutOfTxBufSpaceFrameRetry; /* 98 */
volatile u_int32_t nicNoMoreWrDMADescriptors; /* 99 */
volatile u_int32_t nicNoMoreRxBDs; /* 100 */
volatile u_int32_t nicNoSpaceInReturnRing; /* 101 */
volatile u_int32_t nicSendBDs; /* current count 102 */
volatile u_int32_t nicRecvBDs; /* current count 103 */
volatile u_int32_t nicJumboRecvBDs; /* current count 104 */
volatile u_int32_t nicMiniRecvBDs; /* current count 105 */
volatile u_int32_t nicTotalRecvBDs; /* current count 106 */
volatile u_int32_t nicTotalSendBDs; /* current count 107 */
volatile u_int32_t nicJumboSpillOver; /* 108 */
volatile u_int32_t nicSbusHangCleared; /* 109 */
volatile u_int32_t nicEnqEventDelayed; /* 110 */
/*
* Stats from MAC rx completion
*/
volatile u_int32_t nicMacRxLateColls; /* 111 */
volatile u_int32_t nicMacRxLinkLostDuringPkt; /* 112 */
volatile u_int32_t nicMacRxPhyDecodeErr; /* 113 */
volatile u_int32_t nicMacRxMacAbort; /* 114 */
volatile u_int32_t nicMacRxTruncNoResources; /* 115 */
/*
* Stats from the mac_stats area
*/
volatile u_int32_t nicMacRxDropUla; /* 116 */
volatile u_int32_t nicMacRxDropMcast; /* 117 */
volatile u_int32_t nicMacRxFlowControl; /* 118 */
volatile u_int32_t nicMacRxDropSpace; /* 119 */
volatile u_int32_t nicMacRxColls; /* 120 */
/*
* MAC RX Attentions
*/
volatile u_int32_t nicMacRxTotalAttns; /* 121 */
volatile u_int32_t nicMacRxLinkAttns; /* 122 */
volatile u_int32_t nicMacRxSyncAttns; /* 123 */
volatile u_int32_t nicMacRxConfigAttns; /* 124 */
volatile u_int32_t nicMacReset; /* 125 */
volatile u_int32_t nicMacRxBufDescrAttns; /* 126 */
volatile u_int32_t nicMacRxBufAttns; /* 127 */
volatile u_int32_t nicMacRxZeroFrameCleanup; /* 128 */
volatile u_int32_t nicMacRxOneFrameCleanup; /* 129 */
volatile u_int32_t nicMacRxMultipleFrameCleanup; /* 130 */
volatile u_int32_t nicMacRxTimerCleanup; /* 131 */
volatile u_int32_t nicMacRxDmaCleanup; /* 132 */
/*
* Stats from the mac_stats area
*/
volatile u_int32_t nicMacTxCollisionHistogram[15]; /* 133 */
/*
* MAC TX Attentions
*/
volatile u_int32_t nicMacTxTotalAttns; /* 134 */
/*
* NIC Profile
*/
volatile u_int32_t nicProfile[32]; /* 135 */
/*
* Pad to 1024 bytes.
*/
u_int32_t pad[75];
};
/*
* Tigon general information block. This resides in host memory
* and contains the status counters, ring control blocks and
@ -1057,7 +919,11 @@ struct ti_event_desc {
*/
struct ti_ring_data {
struct ti_rx_desc ti_rx_std_ring[TI_STD_RX_RING_CNT];
#ifdef PRIVATE_JUMBOS
struct ti_rx_desc ti_rx_jumbo_ring[TI_JUMBO_RX_RING_CNT];
#else
struct ti_rx_desc_ext ti_rx_jumbo_ring[TI_JUMBO_RX_RING_CNT];
#endif
struct ti_rx_desc ti_rx_mini_ring[TI_MINI_RX_RING_CNT];
struct ti_rx_desc ti_rx_return_ring[TI_RETURN_RING_CNT];
struct ti_event_desc ti_event_ring[TI_EVENT_RING_CNT];
@ -1113,7 +979,14 @@ struct ti_jpool_entry {
SLIST_ENTRY(ti_jpool_entry) jpool_entries;
};
typedef enum {
TI_FLAG_NONE = 0x00,
TI_FLAG_DEBUGING = 0x01,
TI_FLAG_WAIT_FOR_LINK = 0x02
} ti_flag_vals;
struct ti_softc {
STAILQ_ENTRY(ti_softc) ti_links;
struct arpcom arpcom; /* interface info */
bus_space_handle_t ti_bhandle;
vm_offset_t ti_vhandle;
@ -1126,6 +999,7 @@ struct ti_softc {
u_int8_t ti_hwrev; /* Tigon rev (1 or 2) */
u_int8_t ti_copper; /* 1000baseTX card */
u_int8_t ti_linkstat; /* Link state */
int ti_hdrsplit; /* enable header splitting */
struct ti_ring_data *ti_rdata; /* rings */
struct ti_chain_data ti_cdata; /* mbufs */
#define ti_ev_prodidx ti_rdata->ti_ev_prodidx_r
@ -1150,6 +1024,8 @@ struct ti_softc {
int ti_if_flags;
int ti_txcnt;
struct mtx ti_mtx;
ti_flag_vals ti_flags;
dev_t dev;
};
#define TI_LOCK(_sc) mtx_lock(&(_sc)->ti_mtx)

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -39,6 +39,8 @@
* $FreeBSD$
*/
#include "opt_zero.h"
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
@ -58,6 +60,82 @@
SYSCTL_INT(_kern, KERN_IOV_MAX, iov_max, CTLFLAG_RD, NULL, UIO_MAXIOV,
"Maximum number of elements in an I/O vector; sysconf(_SC_IOV_MAX)");
#ifdef ZERO_COPY_SOCKETS
#include <vm/vm.h>
#include <vm/vm_param.h>
#include <sys/lock.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_page.h>
#include <vm/vm_object.h>
#include <vm/vm_pager.h>
#include <vm/vm_kern.h>
#include <vm/vm_extern.h>
#include <vm/swap_pager.h>
#include <sys/mbuf.h>
#include <machine/cpu.h>
/* Declared in uipc_socket.c */
extern int so_zero_copy_receive;
static int vm_pgmoveco(vm_map_t mapa, vm_object_t srcobj, vm_offset_t kaddr,
vm_offset_t uaddr);
static int userspaceco(caddr_t cp, u_int cnt, struct uio *uio,
struct vm_object *obj, int disposable);
static int
vm_pgmoveco(mapa, srcobj, kaddr, uaddr)
vm_map_t mapa;
vm_object_t srcobj;
vm_offset_t kaddr, uaddr;
{
vm_map_t map = mapa;
vm_page_t kern_pg, user_pg;
vm_object_t uobject;
vm_map_entry_t entry;
vm_pindex_t upindex, kpindex;
vm_prot_t prot;
boolean_t wired;
/*
* First lookup the kernel page.
*/
kern_pg = PHYS_TO_VM_PAGE(vtophys(kaddr));
if ((vm_map_lookup(&map, uaddr,
VM_PROT_READ, &entry, &uobject,
&upindex, &prot, &wired)) != KERN_SUCCESS) {
return(EFAULT);
}
if ((user_pg = vm_page_lookup(uobject, upindex)) != NULL) {
vm_page_sleep_busy(user_pg, 1, "vm_pgmoveco");
pmap_remove(map->pmap, uaddr, uaddr+PAGE_SIZE);
vm_page_busy(user_pg);
vm_page_free(user_pg);
}
if (kern_pg->busy || ((kern_pg->queue - kern_pg->pc) == PQ_FREE) ||
(kern_pg->hold_count != 0)|| (kern_pg->flags & PG_BUSY)) {
printf("vm_pgmoveco: pindex(%lu), busy(%d), PG_BUSY(%d), "
"hold(%d) paddr(0x%lx)\n", (u_long)kern_pg->pindex,
kern_pg->busy, (kern_pg->flags & PG_BUSY) ? 1 : 0,
kern_pg->hold_count, (u_long)kern_pg->phys_addr);
if ((kern_pg->queue - kern_pg->pc) == PQ_FREE)
panic("vm_pgmoveco: renaming free page");
else
panic("vm_pgmoveco: renaming busy page");
}
kpindex = kern_pg->pindex;
vm_page_busy(kern_pg);
vm_page_rename(kern_pg, uobject, upindex);
vm_page_flag_clear(kern_pg, PG_BUSY);
kern_pg->valid = VM_PAGE_BITS_ALL;
vm_map_lookup_done(map, entry);
return(KERN_SUCCESS);
}
#endif /* ZERO_COPY_SOCKETS */
int
uiomove(cp, n, uio)
register caddr_t cp;
@ -133,16 +211,100 @@ uiomove(cp, n, uio)
return (error);
}
#ifdef ENABLE_VFS_IOOPT
#if defined(ENABLE_VFS_IOOPT) || defined(ZERO_COPY_SOCKETS)
/*
* Experimental support for zero-copy I/O
*/
static int
userspaceco(cp, cnt, uio, obj, disposable)
caddr_t cp;
u_int cnt;
struct uio *uio;
struct vm_object *obj;
int disposable;
{
struct iovec *iov;
int error;
iov = uio->uio_iov;
#ifdef ZERO_COPY_SOCKETS
if (uio->uio_rw == UIO_READ) {
if ((so_zero_copy_receive != 0)
&& (obj != NULL)
&& ((cnt & PAGE_MASK) == 0)
&& ((((intptr_t) iov->iov_base) & PAGE_MASK) == 0)
&& ((uio->uio_offset & PAGE_MASK) == 0)
&& ((((intptr_t) cp) & PAGE_MASK) == 0)
&& (obj->type == OBJT_DEFAULT)
&& (disposable != 0)) {
/* SOCKET: use page-trading */
/*
* We only want to call vm_pgmoveco() on
* disposable pages, since it gives the
* kernel page to the userland process.
*/
error = vm_pgmoveco(&curproc->p_vmspace->vm_map,
obj, (vm_offset_t)cp,
(vm_offset_t)iov->iov_base);
/*
* If we get an error back, attempt
* to use copyout() instead. The
* disposable page should be freed
* automatically if we weren't able to move
* it into userland.
*/
if (error != 0)
error = copyout(cp, iov->iov_base, cnt);
#ifdef ENABLE_VFS_IOOPT
} else if ((vfs_ioopt != 0)
&& ((cnt & PAGE_MASK) == 0)
&& ((((intptr_t) iov->iov_base) & PAGE_MASK) == 0)
&& ((uio->uio_offset & PAGE_MASK) == 0)
&& ((((intptr_t) cp) & PAGE_MASK) == 0)) {
error = vm_uiomove(&curproc->p_vmspace->vm_map, obj,
uio->uio_offset, cnt,
(vm_offset_t) iov->iov_base, NULL);
#endif /* ENABLE_VFS_IOOPT */
} else {
error = copyout(cp, iov->iov_base, cnt);
}
} else {
error = copyin(iov->iov_base, cp, cnt);
}
#else /* ZERO_COPY_SOCKETS */
if (uio->uio_rw == UIO_READ) {
#ifdef ENABLE_VFS_IOOPT
if ((vfs_ioopt != 0)
&& ((cnt & PAGE_MASK) == 0)
&& ((((intptr_t) iov->iov_base) & PAGE_MASK) == 0)
&& ((uio->uio_offset & PAGE_MASK) == 0)
&& ((((intptr_t) cp) & PAGE_MASK) == 0)) {
error = vm_uiomove(&curproc->p_vmspace->vm_map, obj,
uio->uio_offset, cnt,
(vm_offset_t) iov->iov_base, NULL);
} else
#endif /* ENABLE_VFS_IOOPT */
{
error = copyout(cp, iov->iov_base, cnt);
}
} else {
error = copyin(iov->iov_base, cp, cnt);
}
#endif /* ZERO_COPY_SOCKETS */
return (error);
}
int
uiomoveco(cp, n, uio, obj)
uiomoveco(cp, n, uio, obj, disposable)
caddr_t cp;
int n;
struct uio *uio;
struct vm_object *obj;
int disposable;
{
struct iovec *iov;
u_int cnt;
@ -169,23 +331,9 @@ uiomoveco(cp, n, uio, obj)
case UIO_USERSPACE:
if (ticks - PCPU_GET(switchticks) >= hogticks)
uio_yield();
if (uio->uio_rw == UIO_READ) {
#ifdef ENABLE_VFS_IOOPT
if (vfs_ioopt && ((cnt & PAGE_MASK) == 0) &&
((((intptr_t) iov->iov_base) & PAGE_MASK) == 0) &&
((uio->uio_offset & PAGE_MASK) == 0) &&
((((intptr_t) cp) & PAGE_MASK) == 0)) {
error = vm_uiomove(&curproc->p_vmspace->vm_map, obj,
uio->uio_offset, cnt,
(vm_offset_t) iov->iov_base, NULL);
} else
#endif
{
error = copyout(cp, iov->iov_base, cnt);
}
} else {
error = copyin(iov->iov_base, cp, cnt);
}
error = userspaceco(cp, cnt, uio, obj, disposable);
if (error)
return (error);
break;
@ -208,6 +356,9 @@ uiomoveco(cp, n, uio, obj)
}
return (0);
}
#endif /* ENABLE_VFS_IOOPT || ZERO_COPY_SOCKETS */
#ifdef ENABLE_VFS_IOOPT
/*
* Experimental support for zero-copy I/O
@ -277,7 +428,7 @@ uioread(n, uio, obj, nread)
}
return error;
}
#endif
#endif /* ENABLE_VFS_IOOPT */
/*
* Give next character to user as result of read.
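The page-trading branch in userspaceco() above is only taken when the transfer length, the user buffer address, the uio offset, and the kernel address are all page aligned (on top of the sysctl, object type, and disposable checks). A minimal userland sketch of just that alignment test follows. The helper name `page_trade_aligned` and the 4096-byte `SKETCH_PAGE_SIZE` are assumptions made for illustration; they are not part of the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; the kernel uses its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1u)

/*
 * Hypothetical helper mirroring only the alignment tests in
 * userspaceco(): every quantity must be a multiple of the page size.
 * Returns 1 when page trading could be considered, 0 otherwise.
 */
int
page_trade_aligned(size_t cnt, uintptr_t user_base, int64_t uio_offset,
    uintptr_t kern_addr)
{
	return ((cnt & SKETCH_PAGE_MASK) == 0 &&
	    (user_base & SKETCH_PAGE_MASK) == 0 &&
	    ((uint64_t)uio_offset & SKETCH_PAGE_MASK) == 0 &&
	    (kern_addr & SKETCH_PAGE_MASK) == 0);
}
```

A read of 4000 bytes, or one into a buffer that starts 16 bytes into a page, fails this test and would fall back to copyout().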

181
sys/kern/uipc_cow.c Normal file

@ -0,0 +1,181 @@
/*-
* Copyright (c) 1997, Duke University
* All rights reserved.
*
* Author:
* Andrew Gallatin <gallatin@cs.duke.edu>
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgements:
* This product includes software developed by Duke University
* 4. The name of Duke University may not be used to endorse or promote
* products derived from this software without specific prior written
* permission.
*
* THIS SOFTWARE IS PROVIDED BY DUKE UNIVERSITY ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL DUKE UNIVERSITY BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
* IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
/*
* This is a set of routines for enabling and disabling copy on write
* protection for data written into sockets.
*/
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/proc.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <sys/socketvar.h>
#include <sys/uio.h>
#include <vm/vm.h>
#include <vm/vm_param.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_page.h>
#include <vm/vm_object.h>
#if 0
#include <vm/vm_pager.h>
#include <vm/vm_kern.h>
#include <vm/vm_extern.h>
#include <vm/vm_zone.h>
#include <vm/swap_pager.h>
#endif
struct netsend_cow_stats {
int attempted;
int fail_not_mapped;
int fail_wired;
int fail_not_anon;
int fail_pmap_cow;
int fail_pg_error;
int fail_kva;
int free_post_exit;
int success;
int iodone;
int freed;
};
static struct netsend_cow_stats socow_stats = {0,0,0,0,0,0,0,0,0,0,0};
extern struct sf_buf *sf_bufs;
extern vm_offset_t sf_base;
#define dtosf(x) (&sf_bufs[((uintptr_t)(x) - (uintptr_t)sf_base) >> PAGE_SHIFT])
void sf_buf_free(caddr_t addr, void *args);
struct sf_buf *sf_buf_alloc(void);
static void socow_iodone(caddr_t addr, void *args);
static void
socow_iodone(caddr_t addr, void *args)
{
int s;
struct sf_buf *sf;
vm_offset_t paddr;
vm_page_t pp;
sf = dtosf(addr);
paddr = vtophys((vm_offset_t)addr);
pp = PHYS_TO_VM_PAGE(paddr);
s = splvm();
/* remove COW mapping */
vm_page_cowclear(pp);
vm_object_deallocate(pp->object);
splx(s);
/* note that sf_buf_free() unwires the page for us */
sf_buf_free(addr, NULL);
socow_stats.iodone++;
}
int
socow_setup(struct mbuf *m0, struct uio *uio)
{
struct sf_buf *sf;
vm_page_t pp;
vm_offset_t pa;
struct iovec *iov;
struct vmspace *vmspace;
struct vm_map *map;
vm_offset_t uva;
int s;
vmspace = curproc->p_vmspace;
map = &vmspace->vm_map;
uva = (vm_offset_t) uio->uio_iov->iov_base;
s = splvm();
/*
* verify page is mapped & not already wired for i/o
*/
socow_stats.attempted++;
pa=pmap_extract(map->pmap, uva);
if(!pa) {
socow_stats.fail_not_mapped++;
splx(s);
return(0);
}
pp = PHYS_TO_VM_PAGE(pa);
sf = sf_buf_alloc();
sf->m = pp;
pmap_qenter(sf->kva, &pp, 1);
/*
* set up COW
*/
vm_page_cowsetup(pp);
/*
* wire the page for I/O
*/
vm_page_wire(pp);
/*
* prevent the process from exiting on us.
*/
vm_object_reference(pp->object);
/*
* attach to mbuf
*/
m0->m_data = (caddr_t)sf->kva;
m0->m_len = PAGE_SIZE;
MEXTADD(m0, sf->kva, PAGE_SIZE, socow_iodone, NULL, 0, EXT_SFBUF);
socow_stats.success++;
iov = uio->uio_iov;
iov->iov_base += PAGE_SIZE;
iov->iov_len -= PAGE_SIZE;
uio->uio_resid -= PAGE_SIZE;
uio->uio_offset += PAGE_SIZE;
if (iov->iov_len == 0) {
uio->uio_iov++;
uio->uio_iovcnt--;
}
splx(s);
return(1);
}
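Once socow_setup() has attached the user's page to the mbuf, it advances the uio by exactly one page so that sosend() does not copy the same bytes again. The bookkeeping can be sketched in userland as below. `struct sketch_uio`, `sketch_uio_consume_page`, and the 4096-byte page size are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical cut-down stand-in for the kernel's struct uio. */
struct sketch_uio {
	struct iovec *uio_iov;	/* current scatter/gather element */
	int uio_iovcnt;		/* elements remaining */
	long uio_offset;	/* offset into the target */
	long uio_resid;		/* bytes remaining */
};

/* Consume one page from the front of the uio, as socow_setup() does. */
void
sketch_uio_consume_page(struct sketch_uio *uio)
{
	struct iovec *iov = uio->uio_iov;

	assert(iov->iov_len >= SKETCH_PAGE_SIZE);
	iov->iov_base = (char *)iov->iov_base + SKETCH_PAGE_SIZE;
	iov->iov_len -= SKETCH_PAGE_SIZE;
	uio->uio_resid -= SKETCH_PAGE_SIZE;
	uio->uio_offset += SKETCH_PAGE_SIZE;
	if (iov->iov_len == 0) {
		/* Element exhausted: step to the next one. */
		uio->uio_iov++;
		uio->uio_iovcnt--;
	}
}
```

Calling this twice on a single two-page iovec leaves `uio_resid` at 0 and `uio_iovcnt` at 0, which corresponds to the state the real code reaches after two successful COW mappings.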

252
sys/kern/uipc_jumbo.c Normal file
View File

@ -0,0 +1,252 @@
/*-
* Copyright (c) 1997, Duke University
* All rights reserved.
*
* Author:
* Andrew Gallatin <gallatin@cs.duke.edu>
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgements:
* This product includes software developed by Duke University
* 4. The name of Duke University may not be used to endorse or promote
* products derived from this software without specific prior written
* permission.
*
* THIS SOFTWARE IS PROVIDED BY DUKE UNIVERSITY ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL DUKE UNIVERSITY BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
* IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
/*
* This is a set of routines for allocating large-sized mbuf payload
* areas, and is primarily intended for use in receive side mbuf
* allocation.
*/
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/types.h>
#include <sys/sockio.h>
#include <sys/uio.h>
#include <sys/lock.h>
#include <sys/kernel.h>
#include <sys/mutex.h>
#include <sys/malloc.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_extern.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_map.h>
#include <vm/vm_param.h>
#include <vm/vm_pageout.h>
#include <sys/vmmeter.h>
#include <vm/vm_page.h>
#include <vm/vm_object.h>
#include <vm/vm_kern.h>
#include <sys/proc.h>
#include <sys/jumbo.h>
/*
* XXX this may be too high or too low.
*/
#define JUMBO_MAX_PAGES 3072
struct jumbo_kmap {
vm_offset_t kva;
SLIST_ENTRY(jumbo_kmap) entries; /* Singly-linked List. */
};
static SLIST_HEAD(jumbo_kmap_head, jumbo_kmap) jumbo_kmap_free,
jumbo_kmap_inuse;
static struct mtx jumbo_mutex;
MTX_SYSINIT(jumbo_lock, &jumbo_mutex, "jumbo mutex", MTX_DEF);
static struct vm_object *jumbo_vm_object;
static unsigned long jumbo_vmuiomove_pgs_freed = 0;
#if 0
static int jumbo_vm_wakeup_wanted = 0;
#endif
vm_offset_t jumbo_basekva;
int
jumbo_vm_init(void)
{
int i;
struct jumbo_kmap *entry;
mtx_lock(&jumbo_mutex);
if (jumbo_vm_object != NULL) {
mtx_unlock(&jumbo_mutex);
return (1);
}
/* allocate our object */
jumbo_vm_object = vm_object_allocate_wait(OBJT_DEFAULT, JUMBO_MAX_PAGES,
M_NOWAIT);
if (jumbo_vm_object == NULL) {
mtx_unlock(&jumbo_mutex);
return (0);
}
SLIST_INIT(&jumbo_kmap_free);
SLIST_INIT(&jumbo_kmap_inuse);
/* grab some kernel virtual address space */
jumbo_basekva = kmem_alloc_pageable(kernel_map,
PAGE_SIZE * JUMBO_MAX_PAGES);
if (jumbo_basekva == 0) {
vm_object_deallocate(jumbo_vm_object);
jumbo_vm_object = NULL;
mtx_unlock(&jumbo_mutex);
return 0;
}
for (i = 0; i < JUMBO_MAX_PAGES; i++) {
entry = malloc(sizeof(struct jumbo_kmap), M_TEMP, M_NOWAIT);
if (!entry && !i) {
mtx_unlock(&jumbo_mutex);
panic("jumbo_vm_init: unable to allocate kvas");
} else if (!entry) {
printf("warning: jumbo_vm_init allocated only %d kva\n",
i);
mtx_unlock(&jumbo_mutex);
return 1;
}
entry->kva = jumbo_basekva + (vm_offset_t)i * PAGE_SIZE;
SLIST_INSERT_HEAD(&jumbo_kmap_free, entry, entries);
}
mtx_unlock(&jumbo_mutex);
return 1;
}
void
jumbo_freem(caddr_t addr, void *args)
{
vm_page_t frame;
frame = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)addr));
/*
* Need giant for looking at the hold count below. Convert this
* to the vm mutex once the VM code has been moved out from under
* giant.
*/
GIANT_REQUIRED;
if (frame->hold_count == 0)
jumbo_pg_free((vm_offset_t)addr);
else
printf("jumbo_freem: hold count for %p is %d!!??\n",
frame, frame->hold_count);
}
void
jumbo_pg_steal(vm_page_t pg)
{
vm_offset_t addr;
struct jumbo_kmap *entry;
addr = ptoa(pg->pindex) + jumbo_basekva;
if (pg->object != jumbo_vm_object)
panic("stealing a non jumbo_vm_object page");
vm_page_remove(pg);
mtx_lock(&jumbo_mutex);
pmap_qremove(addr,1);
entry = SLIST_FIRST(&jumbo_kmap_inuse);
entry->kva = addr;
SLIST_REMOVE_HEAD(&jumbo_kmap_inuse, entries);
SLIST_INSERT_HEAD(&jumbo_kmap_free, entry, entries);
mtx_unlock(&jumbo_mutex);
#if 0
if (jumbo_vm_wakeup_wanted)
wakeup(jumbo_vm_object);
#endif
}
vm_page_t
jumbo_pg_alloc(void)
{
vm_page_t pg;
vm_pindex_t pindex;
struct jumbo_kmap *entry;
pg = NULL;
mtx_lock(&jumbo_mutex);
entry = SLIST_FIRST(&jumbo_kmap_free);
if (entry != NULL){
pindex = atop(entry->kva - jumbo_basekva);
pg = vm_page_alloc(jumbo_vm_object, pindex, VM_ALLOC_INTERRUPT);
if (pg != NULL) {
SLIST_REMOVE_HEAD(&jumbo_kmap_free, entries);
SLIST_INSERT_HEAD(&jumbo_kmap_inuse, entry, entries);
pmap_qenter(entry->kva, &pg, 1);
}
}
mtx_unlock(&jumbo_mutex);
return(pg);
}
void
jumbo_pg_free(vm_offset_t addr)
{
struct jumbo_kmap *entry;
vm_offset_t paddr;
vm_page_t pg;
paddr = pmap_kextract((vm_offset_t)addr);
pg = PHYS_TO_VM_PAGE(paddr);
if (pg->object != jumbo_vm_object) {
jumbo_vmuiomove_pgs_freed++;
/* if(vm_page_lookup(jumbo_vm_object, atop(addr - jumbo_basekva)))
panic("vm_page_rename didn't");
printf("freeing uiomoved pg:\t pindex = %d, padd = 0x%lx\n",
atop(addr - jumbo_basekva), paddr);
*/
} else {
vm_page_busy(pg); /* vm_page_free wants pages to be busy*/
vm_page_free(pg);
}
mtx_lock(&jumbo_mutex);
pmap_qremove(addr,1);
entry = SLIST_FIRST(&jumbo_kmap_inuse);
entry->kva = addr;
SLIST_REMOVE_HEAD(&jumbo_kmap_inuse, entries);
SLIST_INSERT_HEAD(&jumbo_kmap_free, entry, entries);
mtx_unlock(&jumbo_mutex);
#if 0
if (jumbo_vm_wakeup_wanted)
wakeup(jumbo_vm_object);
#endif
}
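The allocator above tracks kernel virtual address slots with two singly-linked lists. One detail is easy to miss: jumbo_pg_free() and jumbo_pg_steal() do not search the in-use list for the matching record. They take whichever record is at the head and overwrite its kva with the address being returned, so the records act as interchangeable containers. A minimal userland sketch of that bookkeeping is shown below. The `kva_pool_*` names and `SKETCH_PAGE_SIZE` are hypothetical, and the NULL assertion in `kva_pool_put` is an addition that the original code does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct kmap {
	uintptr_t kva;			/* address of one page-sized slot */
	SLIST_ENTRY(kmap) entries;
};
SLIST_HEAD(kmap_head, kmap);

struct kva_pool {
	struct kmap_head free_list;
	struct kmap_head inuse_list;
};

/* Fill the free list with npages slots; returns how many were created. */
int
kva_pool_init(struct kva_pool *p, uintptr_t base, int npages)
{
	int i;

	SLIST_INIT(&p->free_list);
	SLIST_INIT(&p->inuse_list);
	for (i = 0; i < npages; i++) {
		struct kmap *e = malloc(sizeof(*e));

		if (e == NULL)
			break;
		e->kva = base + (uintptr_t)i * SKETCH_PAGE_SIZE;
		SLIST_INSERT_HEAD(&p->free_list, e, entries);
	}
	return (i);
}

/* Take a slot; returns 0 when exhausted (so use a nonzero base). */
uintptr_t
kva_pool_get(struct kva_pool *p)
{
	struct kmap *e = SLIST_FIRST(&p->free_list);

	if (e == NULL)
		return (0);
	SLIST_REMOVE_HEAD(&p->free_list, entries);
	SLIST_INSERT_HEAD(&p->inuse_list, e, entries);
	return (e->kva);
}

/* Return addr; any in-use record is recycled to carry it. */
void
kva_pool_put(struct kva_pool *p, uintptr_t addr)
{
	struct kmap *e = SLIST_FIRST(&p->inuse_list);

	assert(e != NULL);
	e->kva = addr;
	SLIST_REMOVE_HEAD(&p->inuse_list, entries);
	SLIST_INSERT_HEAD(&p->free_list, e, entries);
}
```

Because the record is rewritten on release, the free list always holds exactly the addresses that were handed back, regardless of which record happened to be at the head of the in-use list.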

@ -35,6 +35,7 @@
*/
#include "opt_inet.h"
#include "opt_zero.h"
#include <sys/param.h>
#include <sys/systm.h>
@ -94,6 +95,17 @@ SYSCTL_INT(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLFLAG_RW,
static int numopensockets;
SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
&numopensockets, 0, "Number of open sockets");
#ifdef ZERO_COPY_SOCKETS
/* These aren't static because they're used in other files. */
int so_zero_copy_send = 1;
int so_zero_copy_receive = 1;
SYSCTL_NODE(_kern_ipc, OID_AUTO, zero_copy, CTLFLAG_RD, 0,
"Zero copy controls");
SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, receive, CTLFLAG_RW,
&so_zero_copy_receive, 0, "Enable zero copy receive");
SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, send, CTLFLAG_RW,
&so_zero_copy_send, 0, "Enable zero copy send");
#endif /* ZERO_COPY_SOCKETS */
/*
@ -471,6 +483,22 @@ sodisconnect(so)
* must check for short counts if EINTR/ERESTART are returned.
* Data and control buffers are freed on return.
*/
#ifdef ZERO_COPY_SOCKETS
struct so_zerocopy_stats{
int size_ok;
int align_ok;
int found_ifp;
};
struct so_zerocopy_stats so_zerocp_stats = {0,0,0};
#include <netinet/in.h>
#include <net/route.h>
#include <netinet/in_pcb.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <vm/vm_object.h>
#endif /*ZERO_COPY_SOCKETS*/
int
sosend(so, addr, uio, top, control, flags, td)
register struct socket *so;
@ -486,6 +514,9 @@ sosend(so, addr, uio, top, control, flags, td)
register long space, len, resid;
int clen = 0, error, s, dontroute, mlen;
int atomic = sosendallatonce(so) || top;
#ifdef ZERO_COPY_SOCKETS
int cow_send;
#endif /* ZERO_COPY_SOCKETS */
if (uio)
resid = uio->uio_resid;
@ -574,6 +605,9 @@ sosend(so, addr, uio, top, control, flags, td)
if (flags & MSG_EOR)
top->m_flags |= M_EOR;
} else do {
#ifdef ZERO_COPY_SOCKETS
cow_send = 0;
#endif /* ZERO_COPY_SOCKETS */
if (top == 0) {
MGETHDR(m, M_TRYWAIT, MT_DATA);
if (m == NULL) {
@ -592,12 +626,32 @@ sosend(so, addr, uio, top, control, flags, td)
mlen = MLEN;
}
if (resid >= MINCLSIZE) {
#ifdef ZERO_COPY_SOCKETS
if (so_zero_copy_send &&
resid>=PAGE_SIZE &&
space>=PAGE_SIZE &&
uio->uio_iov->iov_len>=PAGE_SIZE) {
so_zerocp_stats.size_ok++;
if (!((vm_offset_t)
uio->uio_iov->iov_base & PAGE_MASK)){
so_zerocp_stats.align_ok++;
cow_send = socow_setup(m, uio);
}
}
if (!cow_send){
#endif /* ZERO_COPY_SOCKETS */
MCLGET(m, M_TRYWAIT);
if ((m->m_flags & M_EXT) == 0)
goto nopages;
mlen = MCLBYTES;
len = min(min(mlen, resid), space);
} else {
#ifdef ZERO_COPY_SOCKETS
len = PAGE_SIZE;
}
} else {
#endif /* ZERO_COPY_SOCKETS */
nopages:
len = min(min(mlen, resid), space);
/*
@ -608,6 +662,11 @@ sosend(so, addr, uio, top, control, flags, td)
MH_ALIGN(m, len);
}
space -= len;
#ifdef ZERO_COPY_SOCKETS
if (cow_send)
error = 0;
else
#endif /* ZERO_COPY_SOCKETS */
error = uiomove(mtod(m, caddr_t), (int)len, uio);
resid = uio->uio_resid;
m->m_len = len;
@ -719,6 +778,27 @@ soreceive(so, psa, uio, mp0, controlp, flagsp)
if (error)
goto bad;
do {
#ifdef ZERO_COPY_SOCKETS
if (so_zero_copy_receive) {
vm_page_t pg;
int disposable;
if ((m->m_flags & M_EXT)
&& (m->m_ext.ext_type == EXT_DISPOSABLE))
disposable = 1;
else
disposable = 0;
pg = PHYS_TO_VM_PAGE(vtophys(mtod(m, caddr_t)));
if (uio->uio_offset == -1)
uio->uio_offset =IDX_TO_OFF(pg->pindex);
error = uiomoveco(mtod(m, caddr_t),
min(uio->uio_resid, m->m_len),
uio, pg->object,
disposable);
} else
#endif /* ZERO_COPY_SOCKETS */
error = uiomove(mtod(m, caddr_t),
(int) min(uio->uio_resid, m->m_len), uio);
m = m_free(m);
@ -874,6 +954,28 @@ soreceive(so, psa, uio, mp0, controlp, flagsp)
*/
if (mp == 0) {
splx(s);
#ifdef ZERO_COPY_SOCKETS
if (so_zero_copy_receive) {
vm_page_t pg;
int disposable;
if ((m->m_flags & M_EXT)
&& (m->m_ext.ext_type == EXT_DISPOSABLE))
disposable = 1;
else
disposable = 0;
pg = PHYS_TO_VM_PAGE(vtophys(mtod(m, caddr_t) +
moff));
if (uio->uio_offset == -1)
uio->uio_offset =IDX_TO_OFF(pg->pindex);
error = uiomoveco(mtod(m, caddr_t) + moff,
(int)len, uio,pg->object,
disposable);
} else
#endif /* ZERO_COPY_SOCKETS */
error = uiomove(mtod(m, caddr_t) + moff, (int)len, uio);
s = splnet();
if (error)
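On the send side, the sosend() change above only attempts socow_setup() when at least a page remains to be sent, the socket buffer has a page of space, the current iovec holds at least a page, and the iovec base is page aligned. An application that wants to hit this path therefore needs page-aligned buffers. The sketch below shows one way to obtain one and to test the same conditions in userland. `cow_send_eligible` and `alloc_page_aligned` are hypothetical helper names, not part of the commit.

```c
#define _POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical mirror of the size and alignment tests in sosend():
 * returns 1 when a COW mapping would be attempted, 0 otherwise.
 */
int
cow_send_eligible(const void *base, size_t iov_len, long resid, long space,
    long page_size)
{
	if (resid < page_size || space < page_size ||
	    iov_len < (size_t)page_size)
		return (0);
	return (((uintptr_t)base & (uintptr_t)(page_size - 1)) == 0);
}

/* Allocate npages page-aligned pages; returns NULL on failure. */
void *
alloc_page_aligned(size_t npages)
{
	void *p = NULL;
	long pgsz = sysconf(_SC_PAGESIZE);

	if (pgsz <= 0 ||
	    posix_memalign(&p, (size_t)pgsz, npages * (size_t)pgsz) != 0)
		return (NULL);
	return (p);
}
```

A buffer from plain malloc() is not guaranteed to start on a page boundary, so writes from it would typically fail the alignment test and take the ordinary copy path.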

@ -74,8 +74,8 @@
static void sf_buf_init(void *arg);
SYSINIT(sock_sf, SI_SUB_MBUF, SI_ORDER_ANY, sf_buf_init, NULL)
static struct sf_buf *sf_buf_alloc(void);
static void sf_buf_free(caddr_t addr, void *args);
struct sf_buf *sf_buf_alloc(void);
void sf_buf_free(caddr_t addr, void *args);
static int sendit(struct thread *td, int s, struct msghdr *mp, int flags);
static int recvit(struct thread *td, int s, struct msghdr *mp,
@ -96,9 +96,9 @@ static struct {
struct mtx sf_lock;
} sf_freelist;
static vm_offset_t sf_base;
static struct sf_buf *sf_bufs;
static u_int sf_buf_alloc_want;
vm_offset_t sf_base;
struct sf_buf *sf_bufs;
u_int sf_buf_alloc_want;
/*
* System call interface to the socket abstraction.
@ -1570,7 +1570,7 @@ sf_buf_init(void *arg)
/*
* Get an sf_buf from the freelist. Will block if none are available.
*/
static struct sf_buf *
struct sf_buf *
sf_buf_alloc()
{
struct sf_buf *sf;
@ -1600,7 +1600,7 @@ sf_buf_alloc()
/*
* Detach mapped page and release resources back to the system.
*/
static void
void
sf_buf_free(caddr_t addr, void *args)
{
struct sf_buf *sf;

@ -3,6 +3,7 @@
.PATH: ${.CURDIR}/../../pci
KMOD= if_ti
SRCS= if_ti.c opt_bdg.h device_if.h bus_if.h pci_if.h
SRCS= if_ti.c opt_bdg.h device_if.h bus_if.h pci_if.h opt_ti.h opt_zero.h \
vnode_if.h
.include <bsd.kmod.mk>

@ -303,8 +303,10 @@ ifmedia_ioctl(ifp, ifr, ifm, cmd)
if (ifmr->ifm_count != 0) {
kptr = (int *)malloc(ifmr->ifm_count * sizeof(int),
M_TEMP, M_WAITOK);
M_TEMP, M_NOWAIT);
if (kptr == NULL)
return (ENOMEM);
/*
* Get the media words from the interface's list.
*/

@ -917,8 +917,50 @@ ip_output(m0, opt, ro, flags, imo)
m->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA;
}
if (len > PAGE_SIZE) {
/*
* Fragment large datagrams such that each segment
* contains a multiple of PAGE_SIZE amount of data,
* plus headers. This enables a receiver to perform
* page-flipping zero-copy optimizations.
*/
int newlen;
struct mbuf *mtmp;
for (mtmp = m, off = 0;
mtmp && ((off + mtmp->m_len) <= ifp->if_mtu);
mtmp = mtmp->m_next) {
off += mtmp->m_len;
}
/*
* firstlen (off - hlen) must be aligned on an
* 8-byte boundary
*/
if (off < hlen)
goto smart_frag_failure;
off = ((off - hlen) & ~7) + hlen;
newlen = (~PAGE_MASK) & ifp->if_mtu;
if ((newlen + sizeof (struct ip)) > ifp->if_mtu) {
/* we failed, go back the default */
smart_frag_failure:
newlen = len;
off = hlen + len;
}
/* printf("ipfrag: len = %d, hlen = %d, mhlen = %d, newlen = %d, off = %d\n",
len, hlen, sizeof (struct ip), newlen, off);*/
len = newlen;
} else {
off = hlen + len;
}
{
int mhlen, firstlen = len;
int mhlen, firstlen = off - hlen;
struct mbuf **mnext = &m->m_nextpkt;
int nfrags = 1;
@ -928,7 +970,7 @@ ip_output(m0, opt, ro, flags, imo)
*/
m0 = m;
mhlen = sizeof (struct ip);
for (off = hlen + len; off < (u_short)ip->ip_len; off += len) {
for (; off < (u_short)ip->ip_len; off += len) {
MGETHDR(m, M_DONTWAIT, MT_HEADER);
if (m == 0) {
error = ENOBUFS;
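The fragment sizing added to ip_output() above rounds the per-fragment payload down to a multiple of the page size when the MTU is large enough, and otherwise keeps the default length. The arithmetic can be worked through with the sketch below. `frag_data_len` is a hypothetical helper. It assumes the default length is `(mtu - hlen) & ~7`, which is computed outside the hunk shown here, and it models only the choice of `len`, not the mbuf chain walk that sets `off`.

```c
#include <assert.h>

#define SKETCH_IP_HDR_LEN 20	/* sizeof(struct ip), no options */

/*
 * Hypothetical model of the per-fragment data length chosen by the
 * new code.  page_size must be a power of two.
 */
int
frag_data_len(int mtu, int hlen, int page_size)
{
	int len = (mtu - hlen) & ~7;	/* assumed default: 8-byte multiple */
	int newlen;

	if (len <= page_size)
		return (len);		/* too small to page-align */
	newlen = mtu & ~(page_size - 1);
	if (newlen + SKETCH_IP_HDR_LEN > mtu)
		return (len);		/* header would not fit: default */
	return (newlen);
}
```

With a 9000-byte jumbo MTU and 4096-byte pages, each fragment carries 8192 bytes of data, exactly two pages. With an MTU of 8200 the header no longer fits next to two full pages, so the default of 8176 bytes is kept. A standard 1500-byte MTU is unaffected at 1480 bytes.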

File diff suppressed because it is too large

@ -137,7 +137,7 @@
*/
#define TI_FIRMWARE_MAJOR 0xc
#define TI_FIRMWARE_MINOR 0x4
#define TI_FIRMWARE_FIX 0xd
#define TI_FIRMWARE_FIX 0xb
/*
* Miscellaneous Local Control register.
@ -348,6 +348,7 @@
#define TI_OPMODE_NO_TX_INTRS 0x00002000
#define TI_OPMODE_NO_RX_INTRS 0x00004000
#define TI_OPMODE_FATAL_ENB 0x40000000 /* not yet implemented */
#define TI_OPMODE_JUMBO_HDRSPLIT 0x00008000
/*
* DMA configuration thresholds.
@ -422,6 +423,53 @@
*/
#define TI_MEM_MAX 0x7FFFFF
/*
* Maximum register address on the Tigon.
*/
#define TI_REG_MAX 0x3fff
/*
* These values were taken from Alteon's tg.h.
*/
#define TI_BEG_SRAM 0x0 /* host thinks it's here */
#define TI_BEG_SCRATCH 0xc00000 /* beg of scratch pad area */
#define TI_END_SRAM_II 0x800000 /* end of SRAM, for 2 MB stuffed */
#define TI_END_SCRATCH_II 0xc04000 /* end of scratch pad CPU A (16KB) */
#define TI_END_SCRATCH_B 0xc02000 /* end of scratch pad CPU B (8KB) */
#define TI_BEG_SCRATCH_B_DEBUG 0xd00000 /* beg of scratch pad for ioctl */
#define TI_END_SCRATCH_B_DEBUG 0xd02000 /* end of scratch pad for ioctl */
#define TI_SCRATCH_DEBUG_OFF 0x100000 /* offset for ioctl usage */
#define TI_END_SRAM_I 0x200000 /* end of SRAM, for 2 MB stuffed */
#define TI_END_SCRATCH_I 0xc00800 /* end of scratch pad area (2KB) */
#define TI_BEG_PROM 0x40000000 /* beg of PROM, special access */
#define TI_BEG_FLASH 0x80000000 /* beg of EEPROM, special access */
#define TI_END_FLASH 0x80100000 /* end of EEPROM for 1 MB stuff */
#define TI_BEG_SER_EEPROM 0xa0000000 /* beg of Serial EEPROM (fake out) */
#define TI_END_SER_EEPROM 0xa0002000 /* end of Serial EEPROM (fake out) */
#define TI_BEG_REGS 0xc0000000 /* beg of register area */
#define TI_END_REGS 0xc0000400 /* end of register area */
#define TI_END_WRITE_REGS 0xc0000180 /* can't write GPRs currently */
#define TI_BEG_REGS2 0xc0000200 /* beg of second writeable reg area */
/* the EEPROM is byte addressable in a pretty odd way */
#define EEPROM_BYTE_LOC 0xff000000
/*
* From Alteon's tg.h.
*/
#define TI_PROCESSOR_A 0
#define TI_PROCESSOR_B 1
#define TI_CPU_A TI_PROCESSOR_A
#define TI_CPU_B TI_PROCESSOR_B
/*
* The following macro can be used to access any of the CPU registers.
* It will adjust the address appropriately.
* Parameters:
* reg - The register to access, e.g TI_CPU_CONTROL
* cpu - cpu, i.e PROCESSOR_A or PROCESSOR_B (or TI_CPU_A or TI_CPU_B)
*/
#define CPU_REG(reg, cpu) ((reg) + (cpu) * 0x100)
/*
* Even on the alpha, pci addresses are 32-bit quantities
*/
@ -486,192 +534,6 @@ struct ti_producer {
u_int32_t ti_unused;
};
/*
* Tigon statistics counters.
*/
struct ti_stats {
/*
* MAC stats, taken from RFC 1643, ethernet-like MIB
*/
volatile u_int32_t dot3StatsAlignmentErrors; /* 0 */
volatile u_int32_t dot3StatsFCSErrors; /* 1 */
volatile u_int32_t dot3StatsSingleCollisionFrames; /* 2 */
volatile u_int32_t dot3StatsMultipleCollisionFrames; /* 3 */
volatile u_int32_t dot3StatsSQETestErrors; /* 4 */
volatile u_int32_t dot3StatsDeferredTransmissions; /* 5 */
volatile u_int32_t dot3StatsLateCollisions; /* 6 */
volatile u_int32_t dot3StatsExcessiveCollisions; /* 7 */
volatile u_int32_t dot3StatsInternalMacTransmitErrors; /* 8 */
volatile u_int32_t dot3StatsCarrierSenseErrors; /* 9 */
volatile u_int32_t dot3StatsFrameTooLongs; /* 10 */
volatile u_int32_t dot3StatsInternalMacReceiveErrors; /* 11 */
/*
* interface stats, taken from RFC 1213, MIB-II, interfaces group
*/
volatile u_int32_t ifIndex; /* 12 */
volatile u_int32_t ifType; /* 13 */
volatile u_int32_t ifMtu; /* 14 */
volatile u_int32_t ifSpeed; /* 15 */
volatile u_int32_t ifAdminStatus; /* 16 */
#define IF_ADMIN_STATUS_UP 1
#define IF_ADMIN_STATUS_DOWN 2
#define IF_ADMIN_STATUS_TESTING 3
volatile u_int32_t ifOperStatus; /* 17 */
#define IF_OPER_STATUS_UP 1
#define IF_OPER_STATUS_DOWN 2
#define IF_OPER_STATUS_TESTING 3
#define IF_OPER_STATUS_UNKNOWN 4
#define IF_OPER_STATUS_DORMANT 5
volatile u_int32_t ifLastChange; /* 18 */
volatile u_int32_t ifInDiscards; /* 19 */
volatile u_int32_t ifInErrors; /* 20 */
volatile u_int32_t ifInUnknownProtos; /* 21 */
volatile u_int32_t ifOutDiscards; /* 22 */
volatile u_int32_t ifOutErrors; /* 23 */
volatile u_int32_t ifOutQLen; /* deprecated */ /* 24 */
volatile u_int8_t ifPhysAddress[8]; /* 8 bytes */ /* 25 - 26 */
volatile u_int8_t ifDescr[32]; /* 27 - 34 */
u_int32_t alignIt; /* align to 64 bit for u_int64_ts following */
/*
* more interface stats, taken from RFC 1573, MIB-II update,
* interfaces group
*/
volatile u_int64_t ifHCInOctets; /* 36 - 37 */
volatile u_int64_t ifHCInUcastPkts; /* 38 - 39 */
volatile u_int64_t ifHCInMulticastPkts; /* 40 - 41 */
volatile u_int64_t ifHCInBroadcastPkts; /* 42 - 43 */
volatile u_int64_t ifHCOutOctets; /* 44 - 45 */
volatile u_int64_t ifHCOutUcastPkts; /* 46 - 47 */
volatile u_int64_t ifHCOutMulticastPkts; /* 48 - 49 */
volatile u_int64_t ifHCOutBroadcastPkts; /* 50 - 51 */
volatile u_int32_t ifLinkUpDownTrapEnable; /* 52 */
volatile u_int32_t ifHighSpeed; /* 53 */
volatile u_int32_t ifPromiscuousMode; /* 54 */
volatile u_int32_t ifConnectorPresent; /* follow link state 55 */
/*
* Host Commands
*/
volatile u_int32_t nicCmdsHostState; /* 56 */
volatile u_int32_t nicCmdsFDRFiltering; /* 57 */
volatile u_int32_t nicCmdsSetRecvProdIndex; /* 58 */
volatile u_int32_t nicCmdsUpdateGencommStats; /* 59 */
volatile u_int32_t nicCmdsResetJumboRing; /* 60 */
volatile u_int32_t nicCmdsAddMCastAddr; /* 61 */
volatile u_int32_t nicCmdsDelMCastAddr; /* 62 */
volatile u_int32_t nicCmdsSetPromiscMode; /* 63 */
volatile u_int32_t nicCmdsLinkNegotiate; /* 64 */
volatile u_int32_t nicCmdsSetMACAddr; /* 65 */
volatile u_int32_t nicCmdsClearProfile; /* 66 */
volatile u_int32_t nicCmdsSetMulticastMode; /* 67 */
volatile u_int32_t nicCmdsClearStats; /* 68 */
volatile u_int32_t nicCmdsSetRecvJumboProdIndex; /* 69 */
volatile u_int32_t nicCmdsSetRecvMiniProdIndex; /* 70 */
volatile u_int32_t nicCmdsRefreshStats; /* 71 */
volatile u_int32_t nicCmdsUnknown; /* 72 */
/*
* NIC Events
*/
volatile u_int32_t nicEventsNICFirmwareOperational; /* 73 */
volatile u_int32_t nicEventsStatsUpdated; /* 74 */
volatile u_int32_t nicEventsLinkStateChanged; /* 75 */
volatile u_int32_t nicEventsError; /* 76 */
volatile u_int32_t nicEventsMCastListUpdated; /* 77 */
volatile u_int32_t nicEventsResetJumboRing; /* 78 */
/*
* Ring manipulation
*/
volatile u_int32_t nicRingSetSendProdIndex; /* 79 */
volatile u_int32_t nicRingSetSendConsIndex; /* 80 */
volatile u_int32_t nicRingSetRecvReturnProdIndex; /* 81 */
/*
* Interrupts
*/
volatile u_int32_t nicInterrupts; /* 82 */
volatile u_int32_t nicAvoidedInterrupts; /* 83 */
/*
* BD Coalescing Thresholds
*/
volatile u_int32_t nicEventThresholdHit; /* 84 */
volatile u_int32_t nicSendThresholdHit; /* 85 */
volatile u_int32_t nicRecvThresholdHit; /* 86 */
/*
* DMA Attentions
*/
volatile u_int32_t nicDmaRdOverrun; /* 87 */
volatile u_int32_t nicDmaRdUnderrun; /* 88 */
volatile u_int32_t nicDmaWrOverrun; /* 89 */
volatile u_int32_t nicDmaWrUnderrun; /* 90 */
volatile u_int32_t nicDmaWrMasterAborts; /* 91 */
volatile u_int32_t nicDmaRdMasterAborts; /* 92 */
/*
* NIC Resources
*/
volatile u_int32_t nicDmaWriteRingFull; /* 93 */
volatile u_int32_t nicDmaReadRingFull; /* 94 */
volatile u_int32_t nicEventRingFull; /* 95 */
volatile u_int32_t nicEventProducerRingFull; /* 96 */
volatile u_int32_t nicTxMacDescrRingFull; /* 97 */
volatile u_int32_t nicOutOfTxBufSpaceFrameRetry; /* 98 */
volatile u_int32_t nicNoMoreWrDMADescriptors; /* 99 */
volatile u_int32_t nicNoMoreRxBDs; /* 100 */
volatile u_int32_t nicNoSpaceInReturnRing; /* 101 */
volatile u_int32_t nicSendBDs; /* current count 102 */
volatile u_int32_t nicRecvBDs; /* current count 103 */
volatile u_int32_t nicJumboRecvBDs; /* current count 104 */
volatile u_int32_t nicMiniRecvBDs; /* current count 105 */
volatile u_int32_t nicTotalRecvBDs; /* current count 106 */
volatile u_int32_t nicTotalSendBDs; /* current count 107 */
volatile u_int32_t nicJumboSpillOver; /* 108 */
volatile u_int32_t nicSbusHangCleared; /* 109 */
volatile u_int32_t nicEnqEventDelayed; /* 110 */
/*
* Stats from MAC rx completion
*/
volatile u_int32_t nicMacRxLateColls; /* 111 */
volatile u_int32_t nicMacRxLinkLostDuringPkt; /* 112 */
volatile u_int32_t nicMacRxPhyDecodeErr; /* 113 */
volatile u_int32_t nicMacRxMacAbort; /* 114 */
volatile u_int32_t nicMacRxTruncNoResources; /* 115 */
/*
* Stats from the mac_stats area
*/
volatile u_int32_t nicMacRxDropUla; /* 116 */
volatile u_int32_t nicMacRxDropMcast; /* 117 */
volatile u_int32_t nicMacRxFlowControl; /* 118 */
volatile u_int32_t nicMacRxDropSpace; /* 119 */
volatile u_int32_t nicMacRxColls; /* 120 */
/*
* MAC RX Attentions
*/
volatile u_int32_t nicMacRxTotalAttns; /* 121 */
volatile u_int32_t nicMacRxLinkAttns; /* 122 */
volatile u_int32_t nicMacRxSyncAttns; /* 123 */
volatile u_int32_t nicMacRxConfigAttns; /* 124 */
volatile u_int32_t nicMacReset; /* 125 */
volatile u_int32_t nicMacRxBufDescrAttns; /* 126 */
volatile u_int32_t nicMacRxBufAttns; /* 127 */
volatile u_int32_t nicMacRxZeroFrameCleanup; /* 128 */
volatile u_int32_t nicMacRxOneFrameCleanup; /* 129 */
volatile u_int32_t nicMacRxMultipleFrameCleanup; /* 130 */
volatile u_int32_t nicMacRxTimerCleanup; /* 131 */
volatile u_int32_t nicMacRxDmaCleanup; /* 132 */
/*
* Stats from the mac_stats area
*/
volatile u_int32_t nicMacTxCollisionHistogram[15]; /* 133 */
/*
* MAC TX Attentions
*/
volatile u_int32_t nicMacTxTotalAttns; /* 134 */
/*
* NIC Profile
*/
volatile u_int32_t nicProfile[32]; /* 135 */
/*
* Pad to 1024 bytes.
*/
u_int32_t pad[75];
};
/*
* Tigon general information block. This resides in host memory
* and contains the status counters, ring control blocks and
@ -1057,7 +919,11 @@ struct ti_event_desc {
*/
struct ti_ring_data {
struct ti_rx_desc ti_rx_std_ring[TI_STD_RX_RING_CNT];
#ifdef PRIVATE_JUMBOS
struct ti_rx_desc ti_rx_jumbo_ring[TI_JUMBO_RX_RING_CNT];
#else
struct ti_rx_desc_ext ti_rx_jumbo_ring[TI_JUMBO_RX_RING_CNT];
#endif
struct ti_rx_desc ti_rx_mini_ring[TI_MINI_RX_RING_CNT];
struct ti_rx_desc ti_rx_return_ring[TI_RETURN_RING_CNT];
struct ti_event_desc ti_event_ring[TI_EVENT_RING_CNT];
@ -1113,7 +979,14 @@ struct ti_jpool_entry {
SLIST_ENTRY(ti_jpool_entry) jpool_entries;
};
typedef enum {
TI_FLAG_NONE = 0x00,
TI_FLAG_DEBUGING = 0x01,
TI_FLAG_WAIT_FOR_LINK = 0x02
} ti_flag_vals;
struct ti_softc {
STAILQ_ENTRY(ti_softc) ti_links;
struct arpcom arpcom; /* interface info */
bus_space_handle_t ti_bhandle;
vm_offset_t ti_vhandle;
@ -1126,6 +999,7 @@ struct ti_softc {
u_int8_t ti_hwrev; /* Tigon rev (1 or 2) */
u_int8_t ti_copper; /* 1000baseTX card */
u_int8_t ti_linkstat; /* Link state */
int ti_hdrsplit; /* enable header splitting */
struct ti_ring_data *ti_rdata; /* rings */
struct ti_chain_data ti_cdata; /* mbufs */
#define ti_ev_prodidx ti_rdata->ti_ev_prodidx_r
@ -1150,6 +1024,8 @@ struct ti_softc {
int ti_if_flags;
int ti_txcnt;
struct mtx ti_mtx;
ti_flag_vals ti_flags;
dev_t dev;
};
#define TI_LOCK(_sc) mtx_lock(&(_sc)->ti_mtx)

File diff suppressed because it is too large

File diff suppressed because it is too large

sys/sys/jumbo.h (new file, 62 lines)

@ -0,0 +1,62 @@
/*-
* Copyright (c) 1997, Duke University
* All rights reserved.
*
* Author:
* Andrew Gallatin <gallatin@cs.duke.edu>
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgements:
* This product includes software developed by Duke University
* 4. The name of Duke University may not be used to endorse or promote
* products derived from this software without specific prior written
* permission.
*
* THIS SOFTWARE IS PROVIDED BY DUKE UNIVERSITY ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL DUKE UNIVERSITY BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
* IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _SYS_JUMBO_H_
#define _SYS_JUMBO_H_
#ifdef _KERNEL
extern vm_offset_t jumbo_basekva;
static __inline caddr_t jumbo_phys_to_kva(vm_offset_t pa);
static __inline caddr_t
jumbo_phys_to_kva(vm_offset_t pa)
{
vm_page_t pg;
pg = PHYS_TO_VM_PAGE(pa);
pg->flags &= ~PG_BUSY;
return (caddr_t)(ptoa(pg->pindex) + jumbo_basekva);
}
int jumbo_vm_init(void);
void jumbo_freem(caddr_t addr, void *args);
vm_page_t jumbo_pg_alloc(void);
void jumbo_pg_free(vm_offset_t addr);
void jumbo_pg_steal(vm_page_t pg);
#endif /* _KERNEL */
#endif /* !_SYS_JUMBO_H_ */
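
The inline jumbo_phys_to_kva() above looks up the vm_page for a physical address, clears PG_BUSY, and returns ptoa(pg->pindex) + jumbo_basekva. The address arithmetic of that last step can be modeled in user space as a rough sketch. The names sketch_ptoa, sketch_jumbo_kva and SKETCH_PAGE_SHIFT are hypothetical, and the 4 KB page size is an assumption (the real value is machine-dependent).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages; the kernel's real page shift is per-architecture. */
#define SKETCH_PAGE_SHIFT 12

/* Stand-in for ptoa(): convert a page index into a byte offset. */
static uint64_t
sketch_ptoa(uint64_t pindex)
{
	return (pindex << SKETCH_PAGE_SHIFT);
}

/*
 * Model of the return expression in jumbo_phys_to_kva(): the page's index
 * within its object, scaled to bytes, added to the base of the jumbo
 * region's kernel virtual address range.
 */
static uint64_t
sketch_jumbo_kva(uint64_t basekva, uint64_t pindex)
{
	uint64_t kva = basekva + sketch_ptoa(pindex);

	assert(kva >= basekva);	/* guard against wraparound in this sketch */
	return (kva);
}
```

With a made-up base of 0x10000000, page index 3 maps to 0x10003000 under these assumptions. The sketch leaves out the PHYS_TO_VM_PAGE() lookup and the PG_BUSY side effect, which only make sense inside the kernel.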


@ -158,6 +158,7 @@ struct mbuf {
#define EXT_SFBUF 2 /* sendfile(2)'s sf_bufs */
#define EXT_NET_DRV 100 /* custom ext_buf provided by net driver(s) */
#define EXT_MOD_TYPE 200 /* custom module's ext_buf type */
#define EXT_DISPOSABLE 300 /* can throw this buffer away w/page flipping */
/*
* Flags copied when copying m_pkthdr.


@ -391,6 +391,7 @@ void socantsendmore(struct socket *so);
int soclose(struct socket *so);
int soconnect(struct socket *so, struct sockaddr *nam, struct thread *td);
int soconnect2(struct socket *so1, struct socket *so2);
int socow_setup(struct mbuf *m0, struct uio *uio);
int socreate(int dom, struct socket **aso, int type, int proto,
struct ucred *cred, struct thread *td);
int sodisconnect(struct socket *so);

sys/sys/tiio.h (new file, 333 lines)

@ -0,0 +1,333 @@
/*-
* Copyright (c) 1999, 2000 Kenneth D. Merry.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions, and the following disclaimer,
* without modification, immediately at the beginning of the file.
* 2. The name of the author may not be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
/*
* The ti_stats structure below is from code with the following copyright,
* and originally comes from the Alteon firmware documentation.
*/
/*
* Copyright (c) 1997, 1998, 1999
* Bill Paul <wpaul@ctr.columbia.edu>. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by Bill Paul.
* 4. Neither the name of the author nor the names of any co-contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY Bill Paul AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL Bill Paul OR THE VOICES IN HIS HEAD
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*
* from: if_tireg.h,v 1.8 1999/07/23 18:46:24 wpaul Exp $
*/
#ifndef _SYS_TIIO_H_
#define _SYS_TIIO_H_
#include <sys/ioccom.h>
/*
* Tigon statistics counters.
*/
struct ti_stats {
/*
* MAC stats, taken from RFC 1643, ethernet-like MIB
*/
volatile u_int32_t dot3StatsAlignmentErrors; /* 0 */
volatile u_int32_t dot3StatsFCSErrors; /* 1 */
volatile u_int32_t dot3StatsSingleCollisionFrames; /* 2 */
volatile u_int32_t dot3StatsMultipleCollisionFrames; /* 3 */
volatile u_int32_t dot3StatsSQETestErrors; /* 4 */
volatile u_int32_t dot3StatsDeferredTransmissions; /* 5 */
volatile u_int32_t dot3StatsLateCollisions; /* 6 */
volatile u_int32_t dot3StatsExcessiveCollisions; /* 7 */
volatile u_int32_t dot3StatsInternalMacTransmitErrors; /* 8 */
volatile u_int32_t dot3StatsCarrierSenseErrors; /* 9 */
volatile u_int32_t dot3StatsFrameTooLongs; /* 10 */
volatile u_int32_t dot3StatsInternalMacReceiveErrors; /* 11 */
/*
* interface stats, taken from RFC 1213, MIB-II, interfaces group
*/
volatile u_int32_t ifIndex; /* 12 */
volatile u_int32_t ifType; /* 13 */
volatile u_int32_t ifMtu; /* 14 */
volatile u_int32_t ifSpeed; /* 15 */
volatile u_int32_t ifAdminStatus; /* 16 */
#define IF_ADMIN_STATUS_UP 1
#define IF_ADMIN_STATUS_DOWN 2
#define IF_ADMIN_STATUS_TESTING 3
volatile u_int32_t ifOperStatus; /* 17 */
#define IF_OPER_STATUS_UP 1
#define IF_OPER_STATUS_DOWN 2
#define IF_OPER_STATUS_TESTING 3
#define IF_OPER_STATUS_UNKNOWN 4
#define IF_OPER_STATUS_DORMANT 5
volatile u_int32_t ifLastChange; /* 18 */
volatile u_int32_t ifInDiscards; /* 19 */
volatile u_int32_t ifInErrors; /* 20 */
volatile u_int32_t ifInUnknownProtos; /* 21 */
volatile u_int32_t ifOutDiscards; /* 22 */
volatile u_int32_t ifOutErrors; /* 23 */
volatile u_int32_t ifOutQLen; /* deprecated */ /* 24 */
volatile u_int8_t ifPhysAddress[8]; /* 8 bytes */ /* 25 - 26 */
volatile u_int8_t ifDescr[32]; /* 27 - 34 */
u_int32_t alignIt; /* align to 64 bit for u_int64_ts following */
/*
* more interface stats, taken from RFC 1573, MIB-II update,
* interfaces group
*/
volatile u_int64_t ifHCInOctets; /* 36 - 37 */
volatile u_int64_t ifHCInUcastPkts; /* 38 - 39 */
volatile u_int64_t ifHCInMulticastPkts; /* 40 - 41 */
volatile u_int64_t ifHCInBroadcastPkts; /* 42 - 43 */
volatile u_int64_t ifHCOutOctets; /* 44 - 45 */
volatile u_int64_t ifHCOutUcastPkts; /* 46 - 47 */
volatile u_int64_t ifHCOutMulticastPkts; /* 48 - 49 */
volatile u_int64_t ifHCOutBroadcastPkts; /* 50 - 51 */
volatile u_int32_t ifLinkUpDownTrapEnable; /* 52 */
volatile u_int32_t ifHighSpeed; /* 53 */
volatile u_int32_t ifPromiscuousMode; /* 54 */
volatile u_int32_t ifConnectorPresent; /* follow link state 55 */
/*
* Host Commands
*/
volatile u_int32_t nicCmdsHostState; /* 56 */
volatile u_int32_t nicCmdsFDRFiltering; /* 57 */
volatile u_int32_t nicCmdsSetRecvProdIndex; /* 58 */
volatile u_int32_t nicCmdsUpdateGencommStats; /* 59 */
volatile u_int32_t nicCmdsResetJumboRing; /* 60 */
volatile u_int32_t nicCmdsAddMCastAddr; /* 61 */
volatile u_int32_t nicCmdsDelMCastAddr; /* 62 */
volatile u_int32_t nicCmdsSetPromiscMode; /* 63 */
volatile u_int32_t nicCmdsLinkNegotiate; /* 64 */
volatile u_int32_t nicCmdsSetMACAddr; /* 65 */
volatile u_int32_t nicCmdsClearProfile; /* 66 */
volatile u_int32_t nicCmdsSetMulticastMode; /* 67 */
volatile u_int32_t nicCmdsClearStats; /* 68 */
volatile u_int32_t nicCmdsSetRecvJumboProdIndex; /* 69 */
volatile u_int32_t nicCmdsSetRecvMiniProdIndex; /* 70 */
volatile u_int32_t nicCmdsRefreshStats; /* 71 */
volatile u_int32_t nicCmdsUnknown; /* 72 */
/*
* NIC Events
*/
volatile u_int32_t nicEventsNICFirmwareOperational; /* 73 */
volatile u_int32_t nicEventsStatsUpdated; /* 74 */
volatile u_int32_t nicEventsLinkStateChanged; /* 75 */
volatile u_int32_t nicEventsError; /* 76 */
volatile u_int32_t nicEventsMCastListUpdated; /* 77 */
volatile u_int32_t nicEventsResetJumboRing; /* 78 */
/*
* Ring manipulation
*/
volatile u_int32_t nicRingSetSendProdIndex; /* 79 */
volatile u_int32_t nicRingSetSendConsIndex; /* 80 */
volatile u_int32_t nicRingSetRecvReturnProdIndex; /* 81 */
/*
* Interrupts
*/
volatile u_int32_t nicInterrupts; /* 82 */
volatile u_int32_t nicAvoidedInterrupts; /* 83 */
/*
* BD Coalescing Thresholds
*/
volatile u_int32_t nicEventThresholdHit; /* 84 */
volatile u_int32_t nicSendThresholdHit; /* 85 */
volatile u_int32_t nicRecvThresholdHit; /* 86 */
/*
* DMA Attentions
*/
volatile u_int32_t nicDmaRdOverrun; /* 87 */
volatile u_int32_t nicDmaRdUnderrun; /* 88 */
volatile u_int32_t nicDmaWrOverrun; /* 89 */
volatile u_int32_t nicDmaWrUnderrun; /* 90 */
volatile u_int32_t nicDmaWrMasterAborts; /* 91 */
volatile u_int32_t nicDmaRdMasterAborts; /* 92 */
/*
* NIC Resources
*/
volatile u_int32_t nicDmaWriteRingFull; /* 93 */
volatile u_int32_t nicDmaReadRingFull; /* 94 */
volatile u_int32_t nicEventRingFull; /* 95 */
volatile u_int32_t nicEventProducerRingFull; /* 96 */
volatile u_int32_t nicTxMacDescrRingFull; /* 97 */
volatile u_int32_t nicOutOfTxBufSpaceFrameRetry; /* 98 */
volatile u_int32_t nicNoMoreWrDMADescriptors; /* 99 */
volatile u_int32_t nicNoMoreRxBDs; /* 100 */
volatile u_int32_t nicNoSpaceInReturnRing; /* 101 */
volatile u_int32_t nicSendBDs; /* current count 102 */
volatile u_int32_t nicRecvBDs; /* current count 103 */
volatile u_int32_t nicJumboRecvBDs; /* current count 104 */
volatile u_int32_t nicMiniRecvBDs; /* current count 105 */
volatile u_int32_t nicTotalRecvBDs; /* current count 106 */
volatile u_int32_t nicTotalSendBDs; /* current count 107 */
volatile u_int32_t nicJumboSpillOver; /* 108 */
volatile u_int32_t nicSbusHangCleared; /* 109 */
volatile u_int32_t nicEnqEventDelayed; /* 110 */
/*
* Stats from MAC rx completion
*/
volatile u_int32_t nicMacRxLateColls; /* 111 */
volatile u_int32_t nicMacRxLinkLostDuringPkt; /* 112 */
volatile u_int32_t nicMacRxPhyDecodeErr; /* 113 */
volatile u_int32_t nicMacRxMacAbort; /* 114 */
volatile u_int32_t nicMacRxTruncNoResources; /* 115 */
/*
* Stats from the mac_stats area
*/
volatile u_int32_t nicMacRxDropUla; /* 116 */
volatile u_int32_t nicMacRxDropMcast; /* 117 */
volatile u_int32_t nicMacRxFlowControl; /* 118 */
volatile u_int32_t nicMacRxDropSpace; /* 119 */
volatile u_int32_t nicMacRxColls; /* 120 */
/*
* MAC RX Attentions
*/
volatile u_int32_t nicMacRxTotalAttns; /* 121 */
volatile u_int32_t nicMacRxLinkAttns; /* 122 */
volatile u_int32_t nicMacRxSyncAttns; /* 123 */
volatile u_int32_t nicMacRxConfigAttns; /* 124 */
volatile u_int32_t nicMacReset; /* 125 */
volatile u_int32_t nicMacRxBufDescrAttns; /* 126 */
volatile u_int32_t nicMacRxBufAttns; /* 127 */
volatile u_int32_t nicMacRxZeroFrameCleanup; /* 128 */
volatile u_int32_t nicMacRxOneFrameCleanup; /* 129 */
volatile u_int32_t nicMacRxMultipleFrameCleanup; /* 130 */
volatile u_int32_t nicMacRxTimerCleanup; /* 131 */
volatile u_int32_t nicMacRxDmaCleanup; /* 132 */
/*
* Stats from the mac_stats area
*/
volatile u_int32_t nicMacTxCollisionHistogram[15]; /* 133 */
/*
* MAC TX Attentions
*/
volatile u_int32_t nicMacTxTotalAttns; /* 134 */
/*
* NIC Profile
*/
volatile u_int32_t nicProfile[32]; /* 135 */
/*
* Pad to 1024 bytes.
*/
u_int32_t pad[75];
};
struct tg_reg {
u_int32_t data;
u_int32_t addr;
};
struct tg_mem {
u_int32_t tgAddr;
caddr_t userAddr;
int len;
};
typedef enum {
TI_PARAM_NONE = 0x00,
TI_PARAM_STAT_TICKS = 0x01,
TI_PARAM_RX_COAL_TICKS = 0x02,
TI_PARAM_TX_COAL_TICKS = 0x04,
TI_PARAM_RX_COAL_BDS = 0x08,
TI_PARAM_TX_COAL_BDS = 0x10,
TI_PARAM_TX_BUF_RATIO = 0x20,
TI_PARAM_ALL = 0x2f
} ti_param_mask;
struct ti_params {
u_int32_t ti_stat_ticks;
u_int32_t ti_rx_coal_ticks;
u_int32_t ti_tx_coal_ticks;
u_int32_t ti_rx_max_coal_bds;
u_int32_t ti_tx_max_coal_bds;
u_int32_t ti_tx_buf_ratio;
ti_param_mask param_mask;
};
typedef enum {
TI_TRACE_TYPE_NONE = 0x00000000,
TI_TRACE_TYPE_SEND = 0x00000001,
TI_TRACE_TYPE_RECV = 0x00000002,
TI_TRACE_TYPE_DMA = 0x00000004,
TI_TRACE_TYPE_EVENT = 0x00000008,
TI_TRACE_TYPE_COMMAND = 0x00000010,
TI_TRACE_TYPE_MAC = 0x00000020,
TI_TRACE_TYPE_STATS = 0x00000040,
TI_TRACE_TYPE_TIMER = 0x00000080,
TI_TRACE_TYPE_DISP = 0x00000100,
TI_TRACE_TYPE_MAILBOX = 0x00000200,
TI_TRACE_TYPE_RECV_BD = 0x00000400,
TI_TRACE_TYPE_LNK_PHY = 0x00000800,
TI_TRACE_TYPE_LNK_NEG = 0x00001000,
TI_TRACE_LEVEL_1 = 0x10000000,
TI_TRACE_LEVEL_2 = 0x20000000
} ti_trace_type;
struct ti_trace_buf {
u_long *buf;
int buf_len;
int fill_len;
u_long cur_trace_ptr;
};
#define TIIOCGETSTATS _IOR('T', 1, struct ti_stats)
#define TIIOCGETPARAMS _IOR('T', 2, struct ti_params)
#define TIIOCSETPARAMS _IOW('T', 3, struct ti_params)
#define TIIOCSETTRACE _IOW('T', 11, ti_trace_type)
#define TIIOCGETTRACE _IOWR('T', 12, struct ti_trace_buf)
/*
* Taken from Alteon's altioctl.h. Alteon's ioctl numbers 1-6 aren't
* used by the FreeBSD driver.
*/
#define ALT_ATTACH _IO('a', 7)
#define ALT_READ_TG_MEM _IOWR('a', 10, struct tg_mem)
#define ALT_WRITE_TG_MEM _IOWR('a', 11, struct tg_mem)
#define ALT_READ_TG_REG _IOWR('a', 12, struct tg_reg)
#define ALT_WRITE_TG_REG _IOWR('a', 13, struct tg_reg)
#endif /* _SYS_TIIO_H_ */
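
The ti_param_mask values in this header are single-bit flags, and struct ti_params carries one of them in param_mask. How the driver interprets the mask is not visible in this part of the diff. The sketch below assumes each bit selects one field of ti_params, and only checks the bit arithmetic. The SK_* names and both helper functions are hypothetical copies for illustration, not part of the header.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from ti_param_mask in the header above. */
enum {
	SK_STAT_TICKS    = 0x01,
	SK_RX_COAL_TICKS = 0x02,
	SK_TX_COAL_TICKS = 0x04,
	SK_RX_COAL_BDS   = 0x08,
	SK_TX_COAL_BDS   = 0x10,
	SK_TX_BUF_RATIO  = 0x20,
	SK_PARAM_ALL     = 0x2f	/* value as committed */
};

/* Bitwise OR of the six individual flags listed in the enum. */
static uint32_t
sketch_all_bits(void)
{
	return (SK_STAT_TICKS | SK_RX_COAL_TICKS | SK_TX_COAL_TICKS |
	    SK_RX_COAL_BDS | SK_TX_COAL_BDS | SK_TX_BUF_RATIO);
}

/* Returns nonzero if `mask` selects the single flag `bit`. */
static int
sketch_selected(uint32_t mask, uint32_t bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0);	/* exactly one bit set */
	return ((mask & bit) != 0);
}
```

The OR of the six flags is 0x3f, while TI_PARAM_ALL as committed is 0x2f, which leaves out the 0x10 bit (TI_PARAM_TX_COAL_BDS). Whether that is intentional is not evident from this diff, so a caller that wants every parameter applied may prefer to OR the individual flags explicitly.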


@ -85,7 +85,7 @@ struct vm_object;
void uio_yield(void);
int uiomove(caddr_t, int, struct uio *);
int uiomoveco(caddr_t, int, struct uio *, struct vm_object *);
int uiomoveco(caddr_t, int, struct uio *, struct vm_object *, int);
int uioread(int, struct uio *, struct vm_object *, int *);
int copyinfrom(const void *src, void *dst, size_t len, int seg);
int copyinstrfrom(const void *src, void *dst, size_t len,


@ -316,7 +316,7 @@ READ(ap)
*/
error =
uiomoveco((char *)bp->b_data + blkoffset,
(int)xfersize, uio, object);
(int)xfersize, uio, object, 0);
} else
#endif
{


@ -311,6 +311,20 @@ RetryFault:;
fs.m = vm_page_lookup(fs.object, fs.pindex);
if (fs.m != NULL) {
int queue, s;
/*
* check for page-based copy on write
*/
if ((fs.m->cow) &&
(fault_type & VM_PROT_WRITE)) {
s = splvm();
vm_page_cowfault(fs.m);
splx(s);
unlock_things(&fs);
goto RetryFault;
}
/*
* Wait/Retry if the page is busy. We have to do this
* if the page is busy via either PG_BUSY or


@ -335,6 +335,26 @@ vm_object_pip_wait(vm_object_t object, char *waitid)
vm_object_pip_sleep(object, waitid);
}
/*
* vm_object_allocate_wait
*
* Return a new object with the given size, and give the user the
* option of waiting for it to complete or failing if the needed
* memory isn't available.
*/
vm_object_t
vm_object_allocate_wait(objtype_t type, vm_pindex_t size, int flags)
{
vm_object_t result;
result = (vm_object_t) uma_zalloc(obj_zone, flags);
if (result != NULL)
_vm_object_allocate(type, size, result);
return (result);
}
/*
* vm_object_allocate:
*
@ -343,12 +363,7 @@ vm_object_pip_wait(vm_object_t object, char *waitid)
vm_object_t
vm_object_allocate(objtype_t type, vm_pindex_t size)
{
vm_object_t result;
result = (vm_object_t) uma_zalloc(obj_zone, M_WAITOK);
_vm_object_allocate(type, size, result);
return (result);
return(vm_object_allocate_wait(type, size, M_WAITOK));
}


@ -183,6 +183,7 @@ void vm_object_pip_sleep(vm_object_t object, char *waitid);
void vm_object_pip_wait(vm_object_t object, char *waitid);
vm_object_t vm_object_allocate (objtype_t, vm_pindex_t);
vm_object_t vm_object_allocate_wait (objtype_t, vm_pindex_t, int);
void _vm_object_allocate (objtype_t, vm_pindex_t, vm_object_t);
boolean_t vm_object_coalesce (vm_object_t, vm_pindex_t, vm_size_t, vm_size_t);
void vm_object_collapse (vm_object_t);


@ -1733,6 +1733,75 @@ vm_page_test_dirty(vm_page_t m)
}
}
int so_zerocp_fullpage = 0;
void
vm_page_cowfault(vm_page_t m)
{
vm_page_t mnew;
vm_object_t object;
vm_pindex_t pindex;
object = m->object;
pindex = m->pindex;
vm_page_busy(m);
retry_alloc:
vm_page_remove(m);
mnew = vm_page_alloc(object, pindex, VM_ALLOC_NORMAL);
if (mnew == NULL) {
vm_page_insert(m, object, pindex);
VM_WAIT;
goto retry_alloc;
}
if (m->cow == 0) {
/*
* check to see if we raced with an xmit complete when
* waiting to allocate a page. If so, put things back
* the way they were
*/
vm_page_busy(mnew);
vm_page_free(mnew);
vm_page_insert(m, object, pindex);
} else { /* clear COW & copy page */
if (so_zerocp_fullpage) {
mnew->valid = VM_PAGE_BITS_ALL;
} else {
vm_page_copy(m, mnew);
}
vm_page_dirty(mnew);
vm_page_flag_clear(mnew, PG_BUSY);
}
vm_page_wakeup(m); /*unbusy the page */
}
void
vm_page_cowclear(vm_page_t m)
{
/* XXX KDM find out if giant is required here. */
GIANT_REQUIRED;
if (m->cow) {
atomic_subtract_int(&m->cow, 1);
/*
* let vm_fault add back write permission lazily
*/
}
/*
* sf_buf_free() will free the page, so we needn't do it here
*/
}
void
vm_page_cowsetup(vm_page_t m)
{
/* XXX KDM find out if giant is required here */
GIANT_REQUIRED;
atomic_add_int(&m->cow, 1);
vm_page_protect(m, VM_PROT_READ);
}
#include "opt_ddb.h"
#ifdef DDB
#include <sys/kernel.h>
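
The three functions added above keep a per-page count of outstanding copy-on-write mappings: vm_page_cowsetup() increments it and write-protects the page, vm_page_cowclear() decrements it, and vm_page_cowfault() gives a writer a fresh page while the count is nonzero. A user-space model of that counting, as a rough sketch only: struct sk_page and the sk_* functions are hypothetical, and there is no locking, page queue handling, or retry on allocation failure here (the kernel code waits with VM_WAIT and retries instead of failing).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 4096	/* assumed page size for the sketch */

struct sk_page {
	unsigned int cow;			/* outstanding COW mappings */
	unsigned char data[SK_PAGE_SIZE];
};

/* Like vm_page_cowsetup(): one more in-flight send references the page. */
static void
sk_cowsetup(struct sk_page *pg)
{
	assert(pg != NULL);
	pg->cow++;
}

/* Like vm_page_cowclear(): the network stack has released the buffer. */
static void
sk_cowclear(struct sk_page *pg)
{
	assert(pg != NULL);
	if (pg->cow != 0)
		pg->cow--;
}

/*
 * Like the write-fault path: if a send still references the page, the
 * writer gets a private copy and the original is left untouched.
 * Returns the page the writer should use, or NULL if allocation fails.
 */
static struct sk_page *
sk_write_fault(struct sk_page *pg)
{
	struct sk_page *copy;

	if (pg->cow == 0)
		return (pg);		/* no send in flight: write in place */
	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return (NULL);
	memcpy(copy->data, pg->data, sizeof(copy->data));
	copy->cow = 0;
	return (copy);
}
```

In this model a write during an in-flight send lands in the copy, so the data queued for transmission stays unchanged. Once sk_cowclear() brings the count back to zero, later writes go to the original page again, which mirrors the "let vm_fault add back write permission lazily" comment in vm_page_cowclear().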


@ -133,6 +133,7 @@ struct vm_page {
u_short valid; /* map of valid DEV_BSIZE chunks */
u_short dirty; /* map of dirty DEV_BSIZE chunks */
#endif
u_int cow; /* page cow mapping count */
};
/*
@ -363,5 +364,9 @@ int vm_page_bits (int, int);
void vm_page_zero_invalid(vm_page_t m, boolean_t setvalid);
void vm_page_free_toq(vm_page_t m);
void vm_page_zero_idle_wakeup(void);
void vm_page_cowfault (vm_page_t);
void vm_page_cowsetup (vm_page_t);
void vm_page_cowclear (vm_page_t);
#endif /* _KERNEL */
#endif /* !_VM_PAGE_ */