Final commit to round out the "Five New TCP Congestion Control Algorithms for
FreeBSD" FreeBSD Foundation funded project.

- Add new man pages for the modular congestion control, Khelp and Hhook
  frameworks (cc.4, cc.9, khelp.9 and hhook.9).

- Add new man pages for each available congestion control algorithm (cc_chd.4,
  cc_cubic.4, cc_hd.4, cc_htcp.4, cc_newreno.4 and cc_vegas.4).

- Add a new man page for the Enhanced Round Trip Time (ERTT) Khelp module
  (h_ertt.4).

- Update the TCP (tcp.4) man page to mention the TCP_CONGESTION socket option,
  cross reference to cc.4 and remove references to the retired
  "net.inet.tcp.newreno" sysctl MIB variable.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
MFC after:	3 months
Lawrence Stewart 2011-02-21 11:56:11 +00:00
parent bcaa6ebc45
commit 29f269dc1f
14 changed files with 2170 additions and 8 deletions


@ -69,6 +69,13 @@ MAN= aac.4 \
cardbus.4 \
carp.4 \
cas.4 \
cc.4 \
cc_chd.4 \
cc_cubic.4 \
cc_hd.4 \
cc_htcp.4 \
cc_newreno.4 \
cc_vegas.4 \
ccd.4 \
cd.4 \
cdce.4 \
@ -131,6 +138,7 @@ MAN= aac.4 \
gif.4 \
gpib.4 \
gre.4 \
h_ertt.4 \
harp.4 \
hatm.4 \
hfa.4 \

118
share/man/man4/cc.4 Normal file

@ -0,0 +1,118 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes and
.\" Lawrence Stewart under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC 4
.Os
.Sh NAME
.Nm cc
.Nd Modular congestion control
.Sh DESCRIPTION
The modular congestion control framework allows the TCP implementation to
dynamically change the congestion control algorithm used by new and existing
connections.
Algorithms are identified by a unique
.Xr ascii 7
name.
Algorithm modules can be compiled into the kernel or loaded as kernel modules
using the
.Xr kld 4
facility.
.Pp
The default algorithm is NewReno, and all connections use the default unless
explicitly overridden using the TCP_CONGESTION socket option (see
.Xr tcp 4
for details).
The default can be changed using a
.Xr sysctl 3
MIB variable detailed in the
.Sx MIB Variables
section below.
.Sh MIB Variables
The framework exposes the following variables in the
.Va net.inet.tcp.cc
branch of the
.Xr sysctl 3
MIB:
.Bl -tag -width ".Va available"
.It Va available
Read-only list of currently available congestion control algorithms by name.
.It Va algorithm
Returns the current default congestion control algorithm when read, and changes
the default when set.
When attempting to change the default algorithm, this variable should be set to
one of the names listed by the
.Va net.inet.tcp.cc.available
MIB variable.
.El
.Sh SEE ALSO
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4 ,
.Xr cc 9
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
modular congestion control framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University's Centre
for Advanced Internet Architectures, Melbourne, Australia, which was made
possible in part by a grant from the Cisco University Research Program Fund at
Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
facility was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org ,
.An James Healy Aq jimmy@deefa.com
and
.An David Hayes Aq david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq david.hayes@ieee.org
and
.An Lawrence Stewart Aq lstewart@FreeBSD.org .
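
As a rough illustration of the per-connection override that cc.4 describes, the sketch below sets the TCP_CONGESTION socket option from Python. The helper name set_congestion_control is hypothetical, and the sketch assumes the local Python build exposes socket.TCP_CONGESTION; where it does not, or where the kernel rejects the requested name, the helper simply reports failure instead of raising.

```python
import socket


def set_congestion_control(sock, name):
    """Try to select a congestion control algorithm by name for one socket.

    Returns True on success and False if the option is unavailable or the
    kernel rejects the name (for example, because the module is not loaded).
    """
    # Assumption: not every platform's Python build defines this constant.
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return False
    try:
        # Algorithm names are plain ASCII strings, as noted in cc.4.
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
    except OSError:
        return False
    return True
```

A caller would typically pass one of the names reported by the net.inet.tcp.cc.available MIB variable and fall back to the system default when the helper returns False.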

127
share/man/man4/cc_chd.4 Normal file

@ -0,0 +1,127 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes
.\" under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC_CHD 4
.Os
.Sh NAME
.Nm cc_chd
.Nd CHD Congestion Control Algorithm
.Sh DESCRIPTION
CHD enhances the HD algorithm implemented in
.Xr cc_hd 4 .
It provides tolerance to non-congestion-related packet loss and improves
coexistence with traditional loss-based TCP flows, especially when the
bottleneck link is lightly multiplexed.
.Pp
Like HD, the algorithm aims to keep network queuing delays below a particular
threshold (queue_threshold) and decides to reduce the congestion window (cwnd)
probabilistically based on its estimate of the network queuing delay.
.Pp
It differs from HD in three key aspects:
.Bl -bullet
.It
The probability of cwnd reduction due to congestion is calculated once per round
trip time instead of each time an acknowledgement is received as done by
.Xr cc_hd 4 .
.It
Packet losses that occur while the queuing delay is less than queue_threshold
do not cause cwnd to be reduced.
.It
CHD uses a shadow window to help regain lost transmission opportunities when
competing with loss-based TCP flows.
.El
.Sh MIB Variables
The algorithm exposes the following tunable variables in the
.Va net.inet.tcp.cc.chd
branch of the
.Xr sysctl 3
MIB:
.Bl -tag -width ".Va queue_threshold"
.It Va queue_threshold
Queuing congestion threshold (qth) in ticks.
Default is 20.
.It Va pmax
Per-RTT maximum backoff probability as a percentage.
Default is 50.
.It Va qmin
Minimum queuing delay threshold (qmin) in ticks.
Default is 5.
.It Va loss_fair
If 1, cwnd is adjusted using the shadow window when a congestion-related
loss is detected.
Default is 1.
.It Va use_max
If 1, the maximum RTT seen within the measurement period is used as the basic
delay measurement for the algorithm, otherwise a sampled RTT measurement
is used.
Default is 1.
.El
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr h_ertt 4 ,
.Xr tcp 4 ,
.Xr cc 9 ,
.Xr khelp 9
.Rs
.%A "D. A. Hayes"
.%A "G. Armitage"
.%T "Improved coexistence and loss tolerance for delay based TCP congestion control"
.%J "35th Annual IEEE Conference on Local Computer Networks"
.%D "October 2010"
.%P "24-31"
.Re
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
congestion control module first appeared in
.Fx 9.0 .
.Pp
The module was first released in 2010 by David Hayes whilst working on the
NewTCP research project at Swinburne University's Centre for Advanced Internet
Architectures, Melbourne, Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module and this manual page were written by
.An David Hayes Aq david.hayes@ieee.org .

114
share/man/man4/cc_cubic.4 Normal file

@ -0,0 +1,114 @@
.\"
.\" Copyright (c) 2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University, Melbourne, Australia by
.\" David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC_CUBIC 4
.Os
.Sh NAME
.Nm cc_cubic
.Nd CUBIC Congestion Control Algorithm
.Sh DESCRIPTION
The CUBIC congestion control algorithm was designed to provide increased
throughput in fast and long-distance networks.
It attempts to maintain fairness when competing with legacy NewReno TCP in lower
speed scenarios where NewReno is able to operate adequately.
.Pp
The congestion window is increased as a function of the time elapsed since the
last congestion event.
During regular operation, the window increase function follows a cubic function,
with the inflection point set to be the congestion window value reached at the
last congestion event.
CUBIC also calculates an estimate of the congestion window that NewReno would
have achieved at a given time after a congestion event.
When updating the congestion window, the algorithm will choose the larger of the
calculated CUBIC and estimated NewReno windows.
.Pp
CUBIC also backs off less on congestion by changing the multiplicative decrease
factor from 1/2 (used by standard NewReno TCP) to 4/5.
.Pp
The implementation was done in a clean-room fashion, and is based on the
Internet Draft and paper referenced in the
.Sx SEE ALSO
section below.
.Sh MIB Variables
There are currently no tunable MIB variables.
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4 ,
.Xr cc 9
.Rs
.%A "Sangtae Ha"
.%A "Injong Rhee"
.%A "Lisong Xu"
.%T "CUBIC for Fast Long-Distance Networks"
.%U "http://tools.ietf.org/id/draft-rhee-tcpm-cubic-02.txt"
.Re
.Rs
.%A "Sangtae Ha"
.%A "Injong Rhee"
.%A "Lisong Xu"
.%T "CUBIC: a new TCP-friendly high-speed TCP variant"
.%J "SIGOPS Oper. Syst. Rev."
.%V "42"
.%N "5"
.%D "July 2008"
.%P "64-74"
.Re
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
congestion control module first appeared in
.Fx 9.0 .
.Pp
The module was first released in 2009 by Lawrence Stewart whilst studying at
Swinburne University's Centre for Advanced Internet Architectures, Melbourne,
Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module and this manual page were written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org
and
.An David Hayes Aq david.hayes@ieee.org .
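
The window growth described in cc_cubic.4 can be sketched roughly as follows. This is an illustrative model only, not the kernel code: the constants CUBIC_C and CUBIC_BETA and the form of the NewReno estimate are assumptions taken from my reading of the referenced Internet Draft, windows are expressed in segments, and times are in seconds. All function names are hypothetical.

```python
# Assumed constants from the referenced Internet Draft, not from the man page.
CUBIC_C = 0.4     # cubic scaling constant
CUBIC_BETA = 0.2  # share of the window given up on congestion (1 - 4/5)


def cubic_window(t, w_max):
    """Cubic curve with its inflection point at w_max.

    t is the time since the last congestion event; w_max is the window
    reached at that event.
    """
    # k is the time at which the curve climbs back to w_max.
    k = (w_max * CUBIC_BETA / CUBIC_C) ** (1.0 / 3.0)
    return CUBIC_C * (t - k) ** 3 + w_max


def newreno_estimate(t, w_max, rtt):
    """Estimate of the window NewReno would have reached after time t."""
    slope = 3.0 * CUBIC_BETA / (2.0 - CUBIC_BETA)
    return w_max * (1.0 - CUBIC_BETA) + slope * (t / rtt)


def next_window(t, w_max, rtt):
    """Pick the larger of the CUBIC and estimated NewReno windows."""
    return max(cubic_window(t, w_max), newreno_estimate(t, w_max, rtt))
```

With w_max = 100 the curve starts at 80 segments immediately after the congestion event, which matches the 4/5 multiplicative decrease factor mentioned in the page, and returns to 100 at the inflection point.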

120
share/man/man4/cc_hd.4 Normal file

@ -0,0 +1,120 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes
.\" under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC_HD 4
.Os
.Sh NAME
.Nm cc_hd
.Nd HD Congestion Control Algorithm
.Sh DESCRIPTION
The HD congestion control algorithm is an implementation of the Hamilton
Institute's delay-based congestion control which aims to keep network queuing
delays below a particular threshold (queue_threshold).
.Pp
HD probabilistically reduces the congestion window (cwnd) based on its estimate
of the network queuing delay.
The probability of reducing cwnd is zero at qmin or less, rising to a maximum
at queue_threshold, and then back to zero at the maximum queuing delay.
.Pp
Loss-based congestion control algorithms such as NewReno probe for network
capacity by filling queues until there is a packet loss.
HD competes with loss-based congestion control algorithms by allowing its
probability of reducing cwnd to drop from a maximum at queue_threshold to be
zero at the maximum queuing delay.
This has been shown to work well when the bottleneck link is highly multiplexed.
.Sh MIB Variables
The algorithm exposes the following tunable variables in the
.Va net.inet.tcp.cc.hd
branch of the
.Xr sysctl 3
MIB:
.Bl -tag -width ".Va queue_threshold"
.It Va queue_threshold
Queuing congestion threshold (qth) in ticks.
Default is 20.
.It Va pmax
Per-packet maximum backoff probability as a percentage.
Default is 5.
.It Va qmin
Minimum queuing delay threshold (qmin) in ticks.
Default is 5.
.El
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr h_ertt 4 ,
.Xr tcp 4 ,
.Xr cc 9 ,
.Xr khelp 9
.Rs
.%A "L. Budzisz"
.%A "R. Stanojevic"
.%A "R. Shorten"
.%A "F. Baker"
.%T "A strategy for fair coexistence of loss and delay-based congestion control algorithms"
.%J "IEEE Commun. Lett."
.%D "Jul 2009"
.%V "13"
.%N "7"
.%P "555-557"
.Re
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
The Hamilton Institute have recently made some improvements to the algorithm
implemented by this module and have called it Coexistent-TCP (C-TCP).
The improvements should be evaluated and potentially incorporated into this
module.
.Sh HISTORY
The
.Nm
congestion control module first appeared in
.Fx 9.0 .
.Pp
The module was first released in 2010 by David Hayes whilst working on the
NewTCP research project at Swinburne University's Centre for Advanced Internet
Architectures, Melbourne, Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module and this manual page were written by
.An David Hayes Aq david.hayes@ieee.org .
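
The backoff probability curve described in cc_hd.4 might be modelled as below. The page only states that the probability is zero at qmin, peaks at queue_threshold, and falls back to zero at the maximum queuing delay; the straight-line interpolation between those points is an assumption, as are the function name and the qmax parameter (the maximum queuing delay, which the module would have to estimate). Defaults mirror the MIB defaults listed in the page.

```python
def hd_backoff_probability(qdelay, qmax, qmin=5, qth=20, pmax=5):
    """Probability (0.0 to 1.0) of reducing cwnd for a given queuing delay.

    qdelay, qmin, qth and qmax are in ticks; pmax is a percentage.
    Assumes qmin < qth < qmax.
    """
    p_peak = pmax / 100.0
    # No backoff at or below qmin, or once the queue is effectively full.
    if qdelay <= qmin or qdelay >= qmax:
        return 0.0
    if qdelay <= qth:
        # Rising edge: from 0 at qmin up to p_peak at qth.
        return p_peak * (qdelay - qmin) / (qth - qmin)
    # Falling edge: from p_peak at qth back to 0 at qmax.
    return p_peak * (qmax - qdelay) / (qmax - qth)
```

The falling edge is what lets HD coexist with loss-based flows: once competing traffic pushes the queue well past queue_threshold, HD backs off less often and stops ceding capacity.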

136
share/man/man4/cc_htcp.4 Normal file

@ -0,0 +1,136 @@
.\"
.\" Copyright (c) 2008 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University, Melbourne, Australia by
.\" David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC_HTCP 4
.Os
.Sh NAME
.Nm cc_htcp
.Nd H-TCP Congestion Control Algorithm
.Sh DESCRIPTION
The H-TCP congestion control algorithm was designed to provide increased
throughput in fast and long-distance networks.
It attempts to maintain fairness when competing with legacy NewReno TCP in lower
speed scenarios where NewReno is able to operate adequately.
.Pp
The congestion window is increased as a function of the time elapsed since the
last congestion event.
The window increase algorithm operates like NewReno for the first second after a
congestion event, and then switches to a high-speed mode based on a quadratic
increase function.
.Pp
The implementation was done in a clean-room fashion, and is based on the
Internet Draft and other documents referenced in the
.Sx SEE ALSO
section below.
.Sh MIB Variables
The algorithm exposes the following tunable variables in the
.Va net.inet.tcp.cc.htcp
branch of the
.Xr sysctl 3
MIB:
.Bl -tag -width ".Va adaptive_backoff"
.It Va adaptive_backoff
Controls use of the adaptive backoff algorithm, which is designed to keep
network queues non-empty during congestion recovery episodes.
Default is 0 (disabled).
.It Va rtt_scaling
Controls use of the RTT scaling algorithm, which is designed to make congestion
window increase during congestion avoidance mode invariant with respect to RTT.
Default is 0 (disabled).
.El
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4 ,
.Xr cc 9
.Rs
.%A "D. Leith"
.%A "R. Shorten"
.%T "H-TCP: TCP Congestion Control for High Bandwidth-Delay Product Paths"
.%U "http://tools.ietf.org/id/draft-leith-tcp-htcp-06.txt"
.Re
.Rs
.%A "D. Leith"
.%A "R. Shorten"
.%A "T. Yee"
.%T "H-TCP: A framework for congestion control in high-speed and long-distance networks"
.%B "Proc. PFLDnet"
.%D "2005"
.Re
.Rs
.%A "G. Armitage"
.%A "L. Stewart"
.%A "M. Welzl"
.%A "J. Healy"
.%T "An independent H-TCP implementation under FreeBSD 7.0: description and observed behaviour"
.%J "SIGCOMM Comput. Commun. Rev."
.%V "38"
.%N "3"
.%D "July 2008"
.%P "27-38"
.Re
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
congestion control module first appeared in
.Fx 9.0 .
.Pp
The module was first released in 2007 by James Healy and Lawrence Stewart whilst
working on the NewTCP research project at Swinburne University's Centre for
Advanced Internet Architectures, Melbourne, Australia, which was made possible
in part by a grant from the Cisco University Research Program Fund at Community
Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module was written by
.An James Healy Aq jimmy@deefa.com
and
.An Lawrence Stewart Aq lstewart@FreeBSD.org .
.Pp
This manual page was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org
and
.An David Hayes Aq david.hayes@ieee.org .
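
The two-phase window increase described in cc_htcp.4 can be sketched as a per-RTT increase factor. The quadratic shown here is an assumption based on my reading of the referenced H-TCP Internet Draft rather than something stated in the man page, and the names htcp_alpha and DELTA_L are hypothetical. It ignores the optional adaptive_backoff and rtt_scaling behaviours.

```python
DELTA_L = 1.0  # seconds spent in NewReno-like mode after a congestion event


def htcp_alpha(delta):
    """Per-RTT window increase (in segments) delta seconds after congestion."""
    if delta <= DELTA_L:
        # Low-speed mode: behave like NewReno, one segment per RTT.
        return 1.0
    d = delta - DELTA_L
    # High-speed mode: quadratic growth in the time since congestion.
    return 1.0 + 10.0 * d + (d / 2.0) ** 2
```

For example, three seconds after a congestion event this model adds 22 segments per RTT, compared with one segment per RTT during the first second.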

share/man/man4/cc_newreno.4 Normal file

@ -0,0 +1,82 @@
.\"
.\" Copyright (c) 2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University, Melbourne, Australia by
.\" Lawrence Stewart under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC_NEWRENO 4
.Os
.Sh NAME
.Nm cc_newreno
.Nd NewReno Congestion Control Algorithm
.Sh DESCRIPTION
The NewReno congestion control algorithm is the default for TCP.
Details about the algorithm can be found in RFC 5681.
.Sh MIB Variables
There are currently no tunable MIB variables.
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4 ,
.Xr cc 9
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
congestion control algorithm first appeared in its modular form in
.Fx 9.0 .
.Pp
The module was first released in 2007 by James Healy and Lawrence Stewart whilst
working on the NewTCP research project at Swinburne University's Centre for
Advanced Internet Architectures, Melbourne, Australia, which was made possible
in part by a grant from the Cisco University Research Program Fund at Community
Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module was written by
.An James Healy Aq jimmy@deefa.com ,
.An Lawrence Stewart Aq lstewart@FreeBSD.org
and
.An David Hayes Aq david.hayes@ieee.org .
.Pp
This manual page was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org .

138
share/man/man4/cc_vegas.4 Normal file

@ -0,0 +1,138 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes
.\" under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC_VEGAS 4
.Os
.Sh NAME
.Nm cc_vegas
.Nd Vegas Congestion Control Algorithm
.Sh DESCRIPTION
The Vegas congestion control algorithm uses what the authors term the actual and
expected transmission rates to determine whether there is congestion along the
network path, i.e.:
.Pp
.Bl -item -offset indent
.It
actual rate = (total data sent in a RTT) / RTT
.It
expected rate = cwnd / RTTmin
.It
diff = expected - actual
.El
.Pp
where RTT is the measured instantaneous round trip time and RTTmin is the
smallest round trip time observed during the connection.
.Pp
The algorithm aims to keep diff between two parameters alpha and beta, such
that:
.Pp
.Bl -item -offset indent
.It
alpha < diff < beta
.El
.Pp
If diff > beta, congestion is inferred and cwnd is decremented by one packet (or
the maximum TCP segment size).
If diff < alpha, then cwnd is incremented by one packet.
Alpha and beta govern the amount of buffering along the path.
.Pp
The implementation was done in a clean-room fashion, and is based on the
paper referenced in the
.Sx SEE ALSO
section below.
.Sh IMPLEMENTATION NOTES
The time from the transmission of a marked packet until the receipt of an
acknowledgement for that packet is measured once per RTT.
This implementation does not implement Brakmo and Peterson's original
duplicate ACK policy since clock ticks in today's machines are not as coarse as
they were (i.e., 500 ms) when Vegas was originally designed.
Note that modern TCP recovery processes such as fast retransmit and SACK are
enabled by default in the TCP stack.
.Sh MIB Variables
The algorithm exposes the following tunable variables in the
.Va net.inet.tcp.cc.vegas
branch of the
.Xr sysctl 3
MIB:
.Bl -tag -width ".Va alpha"
.It Va alpha
Query or set the Vegas alpha parameter as a number of buffers on the path.
When setting alpha, the value must satisfy: 0 < alpha < beta.
Default is 1.
.It Va beta
Query or set the Vegas beta parameter as a number of buffers on the path.
When setting beta, the value must satisfy: 0 < alpha < beta.
Default is 3.
.El
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr h_ertt 4 ,
.Xr tcp 4 ,
.Xr cc 9 ,
.Xr khelp 9
.Rs
.%A "L. S. Brakmo"
.%A "L. L. Peterson"
.%T "TCP Vegas: end to end congestion avoidance on a global internet"
.%J "IEEE J. Sel. Areas Commun."
.%D "October 1995"
.%V "13"
.%N "8"
.%P "1465-1480"
.Re
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
congestion control module first appeared in
.Fx 9.0 .
.Pp
The module was first released in 2010 by David Hayes whilst working on the
NewTCP research project at Swinburne University's Centre for Advanced Internet
Architectures, Melbourne, Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module and this manual page were written by
.An David Hayes Aq david.hayes@ieee.org .
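
The decision rule described in cc_vegas.4 can be sketched as a once-per-RTT update. The function name vegas_update is hypothetical, and one step is an assumption on my part: the rate difference is multiplied by RTTmin so that diff is expressed in packets and can be compared directly with alpha and beta, which the page describes as numbers of buffers. The defaults follow the MIB defaults in the page.

```python
def vegas_update(cwnd, data_sent, rtt, rtt_min, alpha=1, beta=3):
    """Return the new cwnd (in packets) after one RTT measurement.

    data_sent is the number of packets sent during the measured RTT;
    rtt and rtt_min are in seconds.
    """
    actual = data_sent / rtt    # actual rate, packets per second
    expected = cwnd / rtt_min   # expected rate, packets per second
    # Assumption: scale by rtt_min to get the estimated packets queued.
    diff = (expected - actual) * rtt_min
    if diff > beta:
        return cwnd - 1         # congestion inferred, back off by one packet
    if diff < alpha:
        return cwnd + 1         # spare capacity, grow by one packet
    return cwnd                 # alpha <= diff <= beta, hold steady
```

With cwnd = 10 and RTTmin = 0.1 s, a measured RTT of 0.2 s gives diff = 5 and shrinks the window, whereas an RTT equal to RTTmin gives diff = 0 and grows it.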

143
share/man/man4/h_ertt.4 Normal file

@ -0,0 +1,143 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes
.\" under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt H_ERTT 4
.Os
.Sh NAME
.Nm h_ertt
.Nd Enhanced Round Trip Time Khelp module
.Sh SYNOPSIS
.In netinet/khelp/h_ertt.h
.Sh DESCRIPTION
The
.Nm
Khelp module works within the
.Xr khelp 9
framework to provide TCP with a per-connection, low noise estimate of the
instantaneous RTT.
The implementation attempts to be robust in the face of delayed
acknowledgements, TCP Segmentation Offload (TSO), receivers who manipulate TCP
timestamps and lack of the TCP timestamp option altogether.
.Pp
TCP receivers using delayed acknowledgements either acknowledge every second packet
(reflecting the timestamp of the first) or use a timeout to trigger the
acknowledgement if no second packet arrives.
If the heuristic used by
.Nm
determines that the receiver is using delayed acknowledgements, it measures the
RTT using the second packet (the one that triggers the acknowledgement).
It does not measure the RTT if the acknowledgement is for the
first packet, since it cannot be accurately determined.
.Pp
When TSO is in use,
.Nm
will momentarily disable TSO whilst marking a packet to use for a new
measurement.
The process has negligible impact on the connection.
.Pp
.Nm
associates the following struct with each connection's TCP control block:
.Bd -literal
struct ertt {
TAILQ_HEAD(txseginfo_head, txseginfo) txsegi_q; /* Private. */
long bytes_tx_in_rtt; /* Private. */
long bytes_tx_in_marked_rtt;
unsigned long marked_snd_cwnd;
int rtt;
int maxrtt;
int minrtt;
int dlyack_rx; /* Private. */
int timestamp_errors; /* Private. */
int markedpkt_rtt; /* Private. */
uint32_t flags;
};
.Ed
.Pp
The fields marked as private should not be manipulated by any code outside of
the
.Nm
implementation.
The non-private fields provide the following data:
.Bl -tag -width ".Va bytes_tx_in_marked_rtt" -offset indent
.It Va bytes_tx_in_marked_rtt
The number of bytes transmitted in the
.Va markedpkt_rtt .
.It Va marked_snd_cwnd
The value of cwnd for the marked RTT measurement.
.It Va rtt
The most recent RTT measurement.
.It Va maxrtt
The longest RTT measurement that has been taken.
.It Va minrtt
The shortest RTT measurement that has been taken.
.It Va flags
The ERTT_NEW_MEASUREMENT flag will be set by the implementation when a new
measurement is available.
It is the responsibility of
.Nm
consumers to unset the flag if they wish to use it as a notification method for
new measurements.
.El
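.Pp
The notification pattern described for the flags field can be sketched as
follows.
This is a userland illustration under stated assumptions: struct ertt_view is a
reduced stand-in for struct ertt, the numeric value given to
ERTT_NEW_MEASUREMENT is a placeholder, and ertt_consume is a hypothetical
consumer function.

```c
#include <stdint.h>

/* Stand-in value for illustration; the real definition lives in h_ertt.h. */
#define ERTT_NEW_MEASUREMENT 0x01

/* Reduced copy of the public fields of struct ertt used in this sketch. */
struct ertt_view {
	int		rtt;
	uint32_t	flags;
};

/*
 * Hypothetical consumer: if a new measurement is flagged, copy it out,
 * clear the flag and return 1; otherwise return 0 and leave *rtt_out alone.
 */
static int
ertt_consume(struct ertt_view *e, int *rtt_out)
{
	if ((e->flags & ERTT_NEW_MEASUREMENT) == 0)
		return (0);
	*rtt_out = e->rtt;
	e->flags &= ~(uint32_t)ERTT_NEW_MEASUREMENT;
	return (1);
}
```

Clearing the flag in the consumer means a second call returns 0 until the
module records another measurement.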
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_hd 4 ,
.Xr cc_vegas 4 ,
.Xr hhook 9 ,
.Xr khelp 9
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
module first appeared in
.Fx 9.0 .
.Pp
The module was first released in 2010 by David Hayes whilst working on the
NewTCP research project at Swinburne University's Centre for Advanced Internet
Architectures, Melbourne, Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
Khelp module and this manual page were written by
.An David Hayes Aq david.hayes@ieee.org .
.Sh BUGS
The module maintains enhanced RTT estimates for all new TCP connections created
after the time at which the module was loaded.
It might be beneficial to see if it is possible to have the module only affect
connections which actually care about ERTT estimates.


@ -1,5 +1,11 @@
.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\" The Regents of the University of California.
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
@ -32,7 +38,7 @@
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
.Dd January 8, 2011
.Dd February 15, 2011
.Dt TCP 4
.Os
.Sh NAME
@ -116,7 +122,7 @@ supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width ".Dv TCP_NODELAY"
.Bl -tag -width ".Dv TCP_CONGESTION"
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
@ -134,6 +140,12 @@ send window size,
receive window size,
and
bandwidth-controlled window space.
.It Dv TCP_CONGESTION
Select or query the congestion control algorithm that TCP will use for the
connection.
See
.Xr cc 4
for details.
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
@ -231,6 +243,14 @@ see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
The default congestion control algorithm for
.Tn TCP
is
.Xr cc_newreno 4 .
Other congestion control algorithms can be made available using the
.Xr cc 4
framework.
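.Pp
A minimal userland sketch of selecting and querying the algorithm with the
TCP_CONGESTION socket option follows.
The helper names query_cc_algo and select_cc_algo are hypothetical, and the
fallback definition of TCP_CA_NAME_MAX is an assumption for systems whose
headers do not export it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>

#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16	/* Assumed fallback if the header lacks it. */
#endif

/* Hypothetical helper: copy the socket's current algorithm name into buf. */
static int
query_cc_algo(int s, char *buf, socklen_t len)
{
	memset(buf, 0, len);
	return (getsockopt(s, IPPROTO_TCP, TCP_CONGESTION, buf, &len));
}

/* Hypothetical helper: ask TCP to use the named algorithm on this socket. */
static int
select_cc_algo(int s, const char *name)
{
	return (setsockopt(s, IPPROTO_TCP, TCP_CONGESTION, name,
	    (socklen_t)strlen(name)));
}
```

Both helpers return 0 on success and -1 with errno set on failure, for example
when the requested algorithm has not been loaded.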
.Ss MIB Variables
The
.Tn TCP
@ -322,11 +342,6 @@ See
Delay ACK to try and piggyback it onto a data packet.
.It Va delacktime
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It Va newreno
Enable
.Tn TCP
NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It Va path_mtu_discovery
Enable Path MTU Discovery.
.It Va tcbhashsize
@ -495,6 +510,7 @@ address.
.Xr socket 2 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr cc 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4 ,


@ -43,6 +43,7 @@ MAN= accept_filter.9 \
BUS_SETUP_INTR.9 \
bus_space.9 \
byteorder.9 \
cc.9 \
cd.9 \
condvar.9 \
config_intrhook.9 \
@ -122,6 +123,7 @@ MAN= accept_filter.9 \
hash.9 \
hashinit.9 \
hexdump.9 \
hhook.9 \
ieee80211.9 \
ieee80211_amrr.9 \
ieee80211_beacon.9 \
@ -144,6 +146,7 @@ MAN= accept_filter.9 \
KASSERT.9 \
kernacc.9 \
kernel_mount.9 \
khelp.9 \
kobj.9 \
kproc.9 \
kqueue.9 \

333
share/man/man9/cc.9 Normal file

@ -0,0 +1,333 @@
.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University, Melbourne, Australia by
.\" David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC 9
.Os
.Sh NAME
.Nm cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CC_VAR
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CC_VAR "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
char name[TCP_CA_NAME_MAX];
int (*mod_init) (void);
int (*mod_destroy) (void);
int (*cb_init) (struct cc_var *ccv);
void (*cb_destroy) (struct cc_var *ccv);
void (*conn_init) (struct cc_var *ccv);
void (*ack_received) (struct cc_var *ccv, uint16_t type);
void (*cong_signal) (struct cc_var *ccv, uint32_t type);
void (*post_recovery) (struct cc_var *ccv);
void (*after_idle) (struct cc_var *ccv);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection as a
result.
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data we have already received
an ACK for.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion notification
(RFC3168).
CC_RTO is reported when the retransmission timeout (RTO) timer fires.
CC_RTO_ERR is reported if the retransmission timeout timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (N=3 currently as per
RFC5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers.
Using the C99 designated initialiser feature to set fields is encouraged.
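.Pp
The following userland mock sketches the designated initialiser style and the
NULL-skipping behaviour described above.
It is not kernel code: struct cc_algo is copied from the listing above,
TCP_CA_NAME_MAX is given a stand-in value, and example_ack_received,
example_cc_algo and mock_stack_ack are hypothetical names.
A real module would include the headers from the SYNOPSIS and register itself
with DECLARE_CC_MODULE().

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_CA_NAME_MAX 16	/* Stand-in for the netinet/tcp.h define. */

struct cc_var;			/* Opaque in this userland mock. */

/* Userland copy of the struct cc_algo layout shown above. */
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init)(void);
	int	(*mod_destroy)(void);
	int	(*cb_init)(struct cc_var *ccv);
	void	(*cb_destroy)(struct cc_var *ccv);
	void	(*conn_init)(struct cc_var *ccv);
	void	(*ack_received)(struct cc_var *ccv, uint16_t type);
	void	(*cong_signal)(struct cc_var *ccv, uint32_t type);
	void	(*post_recovery)(struct cc_var *ccv);
	void	(*after_idle)(struct cc_var *ccv);
};

static int example_acks;	/* Counts calls, for demonstration only. */

static void
example_ack_received(struct cc_var *ccv, uint16_t type)
{
	(void)ccv;
	(void)type;
	example_acks++;
}

/* Only the name and the hook of interest are set; the rest stay NULL. */
static struct cc_algo example_cc_algo = {
	.name = "example",
	.ack_received = example_ack_received,
};

/* Mimics the documented behaviour: a NULL hook is simply skipped. */
static void
mock_stack_ack(struct cc_algo *algo, struct cc_var *ccv, uint16_t type)
{
	if (algo->ack_received != NULL)
		algo->ack_received(ccv, type);
}
```

Members omitted from a designated initialiser are zero-initialised, which is
why unset hooks read as NULL without being listed.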
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
void *cc_data;
int bytes_this_ack;
tcp_seq curack;
uint32_t flags;
int type;
union ccv_container {
struct tcpcb *tcp;
struct sctp_nets *sctp;
} ccvc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable, and so
the
.Fn CC_VAR
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccv_container
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field lets algorithms that require additional per-connection state attach a
pointer to dynamically allocated memory.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has determined that a
window's worth of bytes has been sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
Algorithms should use the absence of this flag to avoid accumulating
a large difference between the congestion window and send window.
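.Pp
The handling of the two flags in an ack_received hook can be sketched with a
userland mock.
Everything below is illustrative: the numeric values of CC_ACK,
CCF_ABC_SENTAWND and CCF_CWND_LIMITED are stand-ins, struct mock_ccv is a
reduced substitute for struct cc_var with a hypothetical cwnd member added, and
the growth rule is a toy rather than any shipped algorithm.

```c
#include <stdint.h>

/* Stand-in values for illustration; see netinet/cc.h for the real ones. */
#define CC_ACK			0x0001
#define CCF_ABC_SENTAWND	0x0001
#define CCF_CWND_LIMITED	0x0002

/* Reduced mock of struct cc_var plus a toy congestion window. */
struct mock_ccv {
	int		bytes_this_ack;
	uint32_t	flags;
	unsigned long	cwnd;	/* Hypothetical; really kept in the tcpcb. */
};

static void
toy_ack_received(struct mock_ccv *ccv, uint16_t type)
{
	if (type != CC_ACK)
		return;
	/* Grow only while the congestion window is what limits sending. */
	if (ccv->flags & CCF_CWND_LIMITED)
		ccv->cwnd += (unsigned long)ccv->bytes_this_ack;
	/* The module, not the stack, clears the ABC signal once handled. */
	ccv->flags &= ~(uint32_t)CCF_ABC_SENTAWND;
}
```

When CCF_CWND_LIMITED is clear the toy window is left untouched, so it cannot
drift far above what the connection is actually able to send.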
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University's Centre
for Advanced Internet Architectures, Melbourne, Australia, which was made
possible in part by a grant from the Cisco University Research Program Fund at
Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org ,
.An James Healy Aq jimmy@deefa.com
and
.An David Hayes Aq david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq david.hayes@ieee.org
and
.An Lawrence Stewart Aq lstewart@FreeBSD.org .

387
share/man/man9/hhook.9 Normal file

@ -0,0 +1,387 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes and
.\" Lawrence Stewart under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt HHOOK 9
.Os
.Sh NAME
.Nm hhook ,
.Nm hhook_head_register ,
.Nm hhook_head_deregister ,
.Nm hhook_head_deregister_lookup ,
.Nm hhook_run_hooks ,
.Nm HHOOKS_RUN_IF ,
.Nm HHOOKS_RUN_LOOKUP_IF
.Nd Helper Hook Framework
.Sh SYNOPSIS
.In sys/hhook.h
.Ft typedef int
.Fn "\*(lp*hhook_func_t\*(rp" "int32_t hhook_type" "int32_t hhook_id" \
"void *udata" "void *ctx_data" "void *hdata" "struct osd *hosd"
.Fn "int hhook_head_register" "int32_t hhook_type" "int32_t hhook_id" \
"struct hhook_head **hhh" "uint32_t flags"
.Fn "int hhook_head_deregister" "struct hhook_head *hhh"
.Fn "int hhook_head_deregister_lookup" "int32_t hhook_type" "int32_t hhook_id"
.Fn "void hhook_run_hooks" "struct hhook_head *hhh" "void *ctx_data" \
"struct osd *hosd"
.Fn HHOOKS_RUN_IF "hhh" "ctx_data" "hosd"
.Fn HHOOKS_RUN_LOOKUP_IF "hhook_type" "hhook_id" "ctx_data" "hosd"
.Sh DESCRIPTION
.Nm
provides a framework for managing and running arbitrary hook functions at
defined hook points within the kernel.
The KPI was inspired by
.Xr pfil 9 ,
and in many respects can be thought of as a more generic superset of pfil.
.Pp
The
.Xr khelp 9
and
.Nm
frameworks are tightly integrated.
Khelp is responsible for registering and deregistering Khelp module hook
functions with
.Nm
points.
The KPI functions used by
.Xr khelp 9
to do this are not documented here as they are not relevant to consumers wishing
to instantiate hook points.
.Ss Information for Khelp Module Implementors
Khelp modules indirectly interact with
.Nm
by defining appropriate hook functions for insertion into hook points.
Hook functions must conform to the
.Ft hhook_func_t
function pointer declaration
outlined in the
.Sx SYNOPSIS .
.Pp
The
.Fa hhook_type
and
.Fa hhook_id
arguments identify the hook point which has called into the hook function.
These are useful when a single hook function is registered for multiple hook
points and wants to know which hook point has called into it.
.In sys/hhook.h
lists available
.Fa hhook_type
defines, and subsystems which export hook points are responsible for defining
the
.Fa hhook_id
value in appropriate header files.
.Pp
The
.Fa udata
argument will be passed to the hook function if it was specified in the
.Vt struct hookinfo
at hook registration time.
.Pp
The
.Fa ctx_data
argument contains context specific data from the hook point call site.
The data type passed is subsystem dependent.
.Pp
The
.Fa hdata
argument is a pointer to the persistent per-object storage allocated for use by
the module if required.
The pointer will only ever be NULL if the module did not request per-object
storage.
.Pp
The
.Fa hosd
argument can be used with the
.Xr khelp 9
framework's
.Fn khelp_get_osd
function to access data belonging to a different Khelp module.
.Pp
Khelp modules instruct the Khelp framework to register their hook functions with
.Nm
points by creating a
.Vt "struct hookinfo"
per hook point, which contains the following members:
.Bd -literal -offset indent
struct hookinfo {
hhook_func_t hook_func;
struct helper *hook_helper;
void *hook_udata;
int32_t hook_id;
int32_t hook_type;
};
.Ed
.Pp
Khelp modules are responsible for setting all members of the struct except
.Va hook_helper
which is handled by the Khelp framework.
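.Pp
A userland mock of filling in a struct hookinfo is sketched below.
The type and struct definitions are copied from this page so that the fragment
stands alone; EXAMPLE_HHOOK_TYPE, EXAMPLE_HHOOK_ID, example_hook and
example_hooks are hypothetical names with placeholder values.

```c
#include <stddef.h>
#include <stdint.h>

struct osd;	/* Opaque in this userland mock. */
struct helper;	/* Opaque in this userland mock. */

typedef int (*hhook_func_t)(int32_t hhook_type, int32_t hhook_id,
    void *udata, void *ctx_data, void *hdata, struct osd *hosd);

/* Userland copy of the struct hookinfo layout shown above. */
struct hookinfo {
	hhook_func_t	hook_func;
	struct helper	*hook_helper;
	void		*hook_udata;
	int32_t		hook_id;
	int32_t		hook_type;
};

#define EXAMPLE_HHOOK_TYPE	1	/* Placeholder type value. */
#define EXAMPLE_HHOOK_ID	0	/* Placeholder id value. */

static int
example_hook(int32_t hhook_type, int32_t hhook_id, void *udata,
    void *ctx_data, void *hdata, struct osd *hosd)
{
	(void)hhook_type; (void)hhook_id; (void)udata;
	(void)ctx_data; (void)hdata; (void)hosd;
	return (0);
}

/* One entry per hook point; hook_helper is left for the framework. */
static struct hookinfo example_hooks[] = {
	{
		.hook_type = EXAMPLE_HHOOK_TYPE,
		.hook_id = EXAMPLE_HHOOK_ID,
		.hook_udata = NULL,
		.hook_func = example_hook,
	},
};
```

Leaving hook_helper out of the initialiser keeps it NULL until the framework
fills it in.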
.Ss Creating and Managing Hook Points
Kernel subsystems that wish to provide
.Nm
points typically need to make four and possibly five key changes to their
implementation:
.Bl -bullet
.It
Define a list of
.Va hhook_id
mappings in an appropriate subsystem header.
.It
Register each hook point with the
.Fn hhook_head_register
function during initialisation of the subsystem.
.It
Select or create a standardised data type to pass to hook functions as
contextual data.
.It
Add a call to
.Fn HHOOKS_RUN_IF
or
.Fn HHOOKS_RUN_LOOKUP_IF
at the point in the subsystem's code where the hook point should be executed.
.It
If the subsystem can be dynamically added/removed at runtime, each hook
point registered with the
.Fn hhook_head_register
function when the subsystem was initialised needs to be deregistered with the
.Fn hhook_head_deregister
or
.Fn hhook_head_deregister_lookup
functions when the subsystem is being deinitialised prior to removal.
.El
.Pp
The
.Fn hhook_head_register
function registers a hook point with the
.Nm
framework.
The
.Fa hhook_type
argument defines the high level type for the hook point.
Valid types are defined in
.In sys/hhook.h
and new types should be added as required.
The
.Fa hhook_id
argument specifies a unique, subsystem specific identifier for the hook point.
The
.Fa hhh
argument will, if not NULL, be used to store a reference to the
.Vt struct hhook_head
created as part of the registration process.
Subsystems will generally want to store a local copy of the
.Vt struct hhook_head
so that they can use the
.Fn HHOOKS_RUN_IF
macro to instantiate hook points.
The HHOOK_WAITOK flag may be passed in via the
.Fa flags
argument if
.Xr malloc 9
is allowed to sleep waiting for memory to become available.
If the hook point is within a virtualised subsystem (e.g. the network stack),
the HHOOK_HEADISINVNET flag should be passed in via the
.Fa flags
argument so that the
.Vt struct hhook_head
created during the registration process will be added to a virtualised list.
.Pp
The
.Fn hhook_head_deregister
function deregisters a previously registered hook point from the
.Nm
framework.
The
.Fa hhh
argument is the pointer to the
.Vt struct hhook_head
returned by
.Fn hhook_head_register
when the hook point was registered.
.Pp
The
.Fn hhook_head_deregister_lookup
function can be used instead of
.Fn hhook_head_deregister
in situations where the caller does not have a cached copy of the
.Vt struct hhook_head
and wants to deregister a hook point using the appropriate
.Fa hhook_type
and
.Fa hhook_id
identifiers instead.
.Pp
The
.Fn hhook_run_hooks
function should normally not be called directly and should instead be called
indirectly via the
.Fn HHOOKS_RUN_IF
macro.
However, there may be circumstances where it is preferable to call the function
directly, and so it is documented here for completeness.
The
.Fa hhh
argument references the
.Nm
point to call all registered hook functions for.
The
.Fa ctx_data
argument specifies a pointer to the contextual hook point data to pass into the
hook functions.
The
.Fa hosd
argument should be the pointer to the appropriate object's
.Vt struct osd
if the subsystem provides the ability for Khelp modules to associate per-object
data.
Subsystems which do not should pass NULL.
.Pp
The
.Fn HHOOKS_RUN_IF
macro is the preferred way to implement hook points.
It only calls the
.Fn hhook_run_hooks
function if at least one hook function is registered for the hook point.
By checking for registered hook functions, the macro minimises the cost
associated with adding hook points to frequently used code paths by reducing to
a simple if test in the common case where no hook functions are registered.
The arguments are as described for the
.Fn hhook_run_hooks
function.
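.Pp
The cheap test that precedes the function call can be illustrated with a
userland mock.
This is a sketch under stated assumptions rather than the kernel macro itself:
struct mock_hhook_head, mock_run_hooks, MOCK_HHOOKS_RUN_IF and count_hook are
hypothetical names, and the fixed-size hook table replaces the locked list used
by the real implementation.

```c
#include <stddef.h>

/* Reduced mock of a hook point; not the kernel's struct hhook_head. */
struct mock_hhook_head {
	int	nhooks;			/* Number of registered hooks. */
	void	(*hooks[4])(void *ctx);	/* Fixed-size table, mock only. */
};

static void
mock_run_hooks(struct mock_hhook_head *hhh, void *ctx_data)
{
	for (int i = 0; i < hhh->nhooks; i++)
		hhh->hooks[i](ctx_data);
}

/* Cheap test first; the function call happens only when hooks exist. */
#define MOCK_HHOOKS_RUN_IF(hhh, ctx_data) do {				\
	if ((hhh) != NULL && (hhh)->nhooks > 0)				\
		mock_run_hooks((hhh), (ctx_data));			\
} while (0)

/* Example hook: increments the integer passed as context data. */
static void
count_hook(void *ctx)
{
	(*(int *)ctx)++;
}
```

With no hooks registered the macro body reduces to the if test, which is the
property that makes hook points affordable in frequently used code paths.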
.Pp
The
.Fn HHOOKS_RUN_LOOKUP_IF
macro performs the same function as the
.Fn HHOOKS_RUN_IF
macro, but performs an additional step to look up the
.Vt struct hhook_head
for the specified
.Fa hhook_type
and
.Fa hhook_id
identifiers.
It should not be used except in code paths which are infrequently executed
because of the reference counting overhead associated with the look up.
.Sh IMPLEMENTATION NOTES
Each
.Vt struct hhook_head
protects its internal list of hook functions with a
.Xr rmlock 9 .
Therefore, any time
.Fn hhook_run_hooks
is called directly or indirectly via the
.Fn HHOOKS_RUN_IF
or
.Fn HHOOKS_RUN_LOOKUP_IF
macros, a non-sleepable read lock will be acquired and held across the calls to
all registered hook functions.
.Sh RETURN VALUES
.Fn hhook_head_register
returns 0 if no errors occurred.
It returns EEXIST if a hook point with the same
.Fa hhook_type
and
.Fa hhook_id
is already registered.
It returns EINVAL if the HHOOK_HEADISINVNET flag is not set in
.Fa flags
because the implementation does not yet support hook points in non-virtualised
subsystems (see the
.Sx BUGS
section for details).
It returns ENOMEM if
.Xr malloc 9
failed to allocate memory for the new
.Vt struct hhook_head .
.Pp
.Fn hhook_head_deregister
and
.Fn hhook_head_deregister_lookup
return 0 if no errors occurred.
They return ENOENT if
.Fa hhh
is NULL.
They return EBUSY if the reference count of
.Fa hhh
is greater than one.
.Sh EXAMPLES
A well commented example Khelp module can be found at:
.Pa /usr/share/examples/kld/khelp/h_example.c
.Pp
The
.Xr tcp 4
implementation provides two
.Nm
points which are called for packets sent/received when a connection is in the
established phase.
Search for HHOOK in the following files:
.Pa sys/netinet/tcp_var.h ,
.Pa sys/netinet/tcp_input.c ,
.Pa sys/netinet/tcp_output.c
and
.Pa sys/netinet/tcp_subr.c .
.Sh SEE ALSO
.Xr khelp 9
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
framework first appeared in
.Fx 9.0 .
.Pp
The
.Nm
framework was first released in 2010 by Lawrence Stewart whilst studying at
Swinburne University's Centre for Advanced Internet Architectures, Melbourne,
Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org .
.Pp
This manual page was written by
.An David Hayes Aq david.hayes@ieee.org
and
.An Lawrence Stewart Aq lstewart@FreeBSD.org .
.Sh BUGS
The framework does not currently support registering hook points in subsystems
which have not been virtualised with VIMAGE.
Fairly minimal internal changes to the
.Nm
implementation are required to address this.

437
share/man/man9/khelp.9 Normal file

@ -0,0 +1,437 @@
.\"
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" This documentation was written at the Centre for Advanced Internet
.\" Architectures, Swinburne University, Melbourne, Australia by David Hayes and
.\" Lawrence Stewart under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt KHELP 9
.Os
.Sh NAME
.Nm khelp ,
.Nm khelp_init_osd ,
.Nm khelp_destroy_osd ,
.Nm khelp_get_id ,
.Nm khelp_get_osd ,
.Nm khelp_add_hhook ,
.Nm khelp_remove_hhook ,
.Nm KHELP_DECLARE_MOD ,
.Nm KHELP_DECLARE_MOD_UMA
.Nd Kernel Helper Framework
.Sh SYNOPSIS
.In sys/khelp.h
.In sys/module_khelp.h
.Fn "int khelp_init_osd" "uint32_t classes" "struct osd *hosd"
.Fn "int khelp_destroy_osd" "struct osd *hosd"
.Fn "int32_t khelp_get_id" "char *hname"
.Fn "void * khelp_get_osd" "struct osd *hosd" "int32_t id"
.Fn "int khelp_add_hhook" "struct hookinfo *hki" "uint32_t flags"
.Fn "int khelp_remove_hhook" "struct hookinfo *hki"
.Fn KHELP_DECLARE_MOD "hname" "hdata" "hhooks" "version"
.Fn KHELP_DECLARE_MOD_UMA "hname" "hdata" "hhooks" "version" "ctor" "dtor"
.Sh DESCRIPTION
.Nm
provides a framework for managing
.Nm
modules, which indirectly use the
.Xr hhook 9
KPI to register their hook functions with hook points of interest within the
kernel.
Khelp modules aim to provide a structured way to dynamically extend the kernel
at runtime in an ABI preserving manner.
Depending on the subsystem providing hook points, a
.Nm
module may be able to associate per-object data for maintaining relevant state
between hook calls.
The
.Xr hhook 9
and
.Nm
frameworks are tightly integrated and anyone interested in
.Nm
should also read the
.Xr hhook 9
manual page thoroughly.
.Ss Information for Khelp Module Implementors
.Nm
modules are represented within the
.Nm
framework by a
.Vt struct helper
which has the following members:
.Bd -literal -offset indent
struct helper {
int (*mod_init) (void);
int (*mod_destroy) (void);
#define HELPER_NAME_MAXLEN 16
char h_name[HELPER_NAME_MAXLEN];
uma_zone_t h_zone;
struct hookinfo *h_hooks;
uint32_t h_nhooks;
uint32_t h_classes;
int32_t h_id;
volatile uint32_t h_refcount;
uint16_t h_flags;
TAILQ_ENTRY(helper) h_next;
};
.Ed
.Pp
Modules must instantiate a
.Vt struct helper ,
but are only required to set the
.Va h_classes
field, and may optionally set the
.Va h_flags ,
.Va mod_init
and
.Va mod_destroy
fields where required.
The framework takes care of all other fields and modules should refrain from
manipulating them.
Using the C99 designated initialiser feature to set fields is encouraged.
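.Pp
As a minimal sketch of such an instantiation, a module which needs per-object
storage and both optional callbacks might declare its
.Vt struct helper
as shown below.
The
.Va example_helper ,
.Fn example_mod_init
and
.Fn example_mod_destroy
names are hypothetical, and HELPER_CLASS_TCP is assumed to be one of the
classes defined in
.In sys/khelp.h .
.Bd -literal -offset indent
static int example_mod_init(void);
static int example_mod_destroy(void);

/* Only h_classes is mandatory; the framework manages the rest. */
static struct helper example_helper = {
	.mod_init = example_mod_init,
	.mod_destroy = example_mod_destroy,
	.h_flags = HELPER_NEEDS_OSD,
	.h_classes = HELPER_CLASS_TCP
};
.Ed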
.Pp
If specified, the
.Va mod_init
function will be run by the
.Nm
framework prior to completing the registration process.
Returning a non-zero value from the
.Va mod_init
function will abort the registration process and fail to load the module.
If specified, the
.Va mod_destroy
function will be run by the
.Nm
framework during the deregistration process, after the module has been
deregistered by the
.Nm
framework.
The return value is currently ignored.
Valid
.Nm
classes are defined in
.In sys/khelp.h .
Valid flags are defined in
.In sys/module_khelp.h .
The HELPER_NEEDS_OSD flag should be set in the
.Va h_flags
field if the
.Nm
module requires persistent per-object data storage.
There is no programmatic way (yet) to check if a
.Nm
class provides the ability for
.Nm
modules to associate persistent per-object data, so a manual check is required.
.Pp
The
.Fn KHELP_DECLARE_MOD
and
.Fn KHELP_DECLARE_MOD_UMA
macros provide convenient wrappers around the
.Xr DECLARE_MODULE 9
macro, and are used to register a
.Nm
module with the
.Nm
framework.
.Fn KHELP_DECLARE_MOD_UMA
should only be used by modules which require the use of persistent per-object
storage, i.e. modules which set the HELPER_NEEDS_OSD flag in their
.Vt struct helper Ns 's
.Va h_flags
field.
.Pp
The first four arguments common to both macros are as follows.
The
.Fa hname
argument specifies the unique
.Xr ascii 7
name for the
.Nm
module.
It should be no longer than HELPER_NAME_MAXLEN-1 characters.
The
.Fa hdata
argument is a pointer to the module's
.Vt struct helper .
The
.Fa hhooks
argument points to a static array of
.Vt struct hookinfo
structures.
The array should contain a
.Vt struct hookinfo
for each
.Xr hhook 9
point the module wishes to hook, even when using the same hook function multiple
times for different
.Xr hhook 9
points.
The
.Fa version
argument specifies a version number for the module which will be passed to
.Xr MODULE_VERSION 9 .
The
.Fn KHELP_DECLARE_MOD_UMA
macro takes the additional
.Fa ctor
and
.Fa dtor
arguments, which specify optional
.Xr uma 9
constructor and destructor functions.
NULL should be passed where the functionality is not required.
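.Pp
The following fragment sketches how a simple module without per-object storage
might be registered.
All
.Dq example
names are hypothetical.
The
.Vt struct hookinfo
field names, the hook function signature and the HELPER_CLASS_TCP,
HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN and HHOOK_TCP_EST_OUT identifiers are
assumptions which are not defined by this manual page and should be checked
against
.Xr hhook 9
and the relevant headers.
.Bd -literal -offset indent
/* Hypothetical hook function; see hhook(9) for the signature. */
static int example_hook(int hhook_type, int hhook_id, void *udata,
    void *ctx_data, void *hdata, struct osd *hosd);

static struct helper example_helper = {
	.h_classes = HELPER_CLASS_TCP
};

/* One entry per hhook(9) point, even though the function is shared. */
static struct hookinfo example_hooks[] = {
	{
		.hook_type = HHOOK_TYPE_TCP,
		.hook_id = HHOOK_TCP_EST_IN,
		.hook_udata = NULL,
		.hook_func = &example_hook
	},
	{
		.hook_type = HHOOK_TYPE_TCP,
		.hook_id = HHOOK_TCP_EST_OUT,
		.hook_udata = NULL,
		.hook_func = &example_hook
	}
};

/* Register the module as "example", version 1. */
KHELP_DECLARE_MOD(example, &example_helper, example_hooks, 1);
.Ed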
.Pp
The
.Fn khelp_get_id
function returns the numeric identifier for the
.Nm
module with name
.Fa hname .
.Pp
The
.Fn khelp_get_osd
function is used to obtain the per-object data pointer for a specified
.Nm
module.
The
.Fa hosd
argument is a pointer to the underlying subsystem object's
.Vt struct osd .
This is provided by the
.Xr hhook 9
framework when calling into a
.Nm
module's hook function.
The
.Fa id
argument specifies the numeric identifier of the
.Nm
module whose data pointer should be extracted from
.Fa hosd .
The
.Fa id
is obtained using the
.Fn khelp_get_id
function.
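.Pp
As an illustration, a consumer of a module's per-object data might look up the
identifier once and reuse it for subsequent calls to
.Fn khelp_get_osd .
This is only a sketch: the
.Dq example
module name,
.Vt struct example_data
and both function names are hypothetical.
.Bd -literal -offset indent
static int32_t example_id;

static int
consumer_init(void)
{
	char name[] = "example";

	/* Look the identifier up once and cache it. */
	example_id = khelp_get_id(name);
	if (example_id < 0)
		return (ENOENT);	/* Module is not registered. */

	return (0);
}

static void
consumer_use(struct osd *hosd)
{
	struct example_data *ed;

	ed = khelp_get_osd(hosd, example_id);
	if (ed == NULL)
		return;	/* No data associated with this object. */

	/* The module's per-object state is available via ed here. */
}
.Ed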
.Pp
The
.Fn khelp_add_hhook
and
.Fn khelp_remove_hhook
functions allow a
.Nm
module to dynamically hook/unhook
.Xr hhook 9
points at run time.
The
.Fa hki
argument specifies a pointer to a
.Vt struct hookinfo
which encapsulates the required information about the
.Xr hhook 9
point and hook function being manipulated.
The HHOOK_WAITOK flag may be passed in via the
.Fa flags
argument of
.Fn khelp_add_hhook
if
.Xr malloc 9
is allowed to sleep waiting for memory to become available.
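.Pp
A hedged sketch of hooking and unhooking a point at run time is shown below.
The function names are hypothetical, and the caller is assumed to supply a
fully initialised
.Vt struct hookinfo
and to be running in a context which is allowed to sleep.
.Bd -literal -offset indent
static int
example_enable_hook(struct hookinfo *hki)
{
	int error;

	/* HHOOK_WAITOK lets malloc(9) sleep waiting for memory. */
	error = khelp_add_hhook(hki, HHOOK_WAITOK);
	if (error == EEXIST)
		error = 0;	/* Already hooked; treated as success. */

	return (error);	/* 0, ENOENT or ENOMEM. */
}

static int
example_disable_hook(struct hookinfo *hki)
{
	/* Returns ENOENT if the hhook(9) point could not be found. */
	return (khelp_remove_hhook(hki));
}
.Ed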
.Ss Integrating Khelp Into a Kernel Subsystem
Most of the work required to allow
.Nm
modules to do useful things relates to defining and instantiating suitable
.Xr hhook 9
points for
.Nm
modules to hook into.
The only additional decision a subsystem needs to make is whether it wants to
allow
.Nm
modules to associate persistent per-object data.
Supporting persistent data storage allows
.Nm
modules to implement more complex functionality, because they can maintain
state between hook calls.
Subsystems which want to allow Khelp modules to associate
persistent per-object data with one of the subsystem's data structures need to
make the following two key changes:
.Bl -bullet
.It
Embed a
.Vt struct osd
pointer in the structure definition for the object.
.It
Add calls to
.Fn khelp_init_osd
and
.Fn khelp_destroy_osd
to the subsystem code paths which are responsible for respectively initialising
and destroying the object.
.El
.Pp
The
.Fn khelp_init_osd
function initialises the per-object data storage for all currently loaded
.Nm
modules of appropriate classes which have set the HELPER_NEEDS_OSD flag in their
.Va h_flags
field.
The
.Fa classes
argument specifies a bitmask of
.Nm
classes which this subsystem associates with.
If a
.Nm
module matches any of the classes in the bitmask, that module will be associated
with the object.
The
.Fa hosd
argument specifies the pointer to the object's
.Vt struct osd
which will be used to provide the persistent storage for use by
.Nm
modules.
.Pp
The
.Fn khelp_destroy_osd
function frees all memory that was associated with an object's
.Vt struct osd
by a previous call to
.Fn khelp_init_osd .
The
.Fa hosd
argument specifies the pointer to the object's
.Vt struct osd
which will be purged in preparation for destruction.
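.Pp
The two changes described above could be sketched as follows for a
hypothetical subsystem object.
.Vt struct example_obj ,
its member names, both functions and the HELPER_CLASS_EXAMPLE class are
illustrative assumptions only; a real subsystem would use a class defined in
.In sys/khelp.h .
.Bd -literal -offset indent
struct example_obj {
	/* Other subsystem specific members would appear here. */
	struct osd	osd_storage;	/* Backing storage. */
	struct osd	*osd;		/* Handed to Khelp and hhook(9). */
};

static int
example_obj_init(struct example_obj *obj)
{
	/* obj is assumed to have been zeroed by its allocator. */
	obj->osd = &obj->osd_storage;

	/* Returns 0 on success, or ENOMEM if allocation fails. */
	return (khelp_init_osd(HELPER_CLASS_EXAMPLE, obj->osd));
}

static void
example_obj_destroy(struct example_obj *obj)
{
	/* Frees the per-object data set up by khelp_init_osd(). */
	(void)khelp_destroy_osd(obj->osd);
}
.Ed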
.Sh IMPLEMENTATION NOTES
.Nm
modules are protected from being prematurely unloaded by a reference count.
The count is incremented each time a subsystem calls
.Fn khelp_init_osd
causing persistent storage to be allocated for the module, and decremented for
each corresponding call to
.Fn khelp_destroy_osd .
Only when a module's reference count has dropped to zero can the module be
unloaded.
.Sh RETURN VALUES
The
.Fn khelp_init_osd
function returns zero if no errors occurred.
It returns ENOMEM if a
.Nm
module which requires per-object storage fails to allocate the necessary memory.
.Pp
The
.Fn khelp_destroy_osd
function always returns zero to indicate that no errors occurred.
.Pp
The
.Fn khelp_get_id
function returns the unique numeric identifier for the registered
.Nm
module with name
.Fa hname .
It returns -1 if no module with the specified name is currently registered.
.Pp
The
.Fn khelp_get_osd
function returns the pointer to the
.Nm
module's persistent object storage memory.
If the module identified by
.Fa id
does not have persistent object storage registered with the object's
.Fa hosd
.Vt struct osd ,
NULL is returned.
.Pp
The
.Fn khelp_add_hhook
function returns zero if no errors occurred.
It returns ENOENT if it could not find the requested
.Xr hhook 9
point.
It returns ENOMEM if
.Xr malloc 9
failed to allocate memory.
It returns EEXIST if an attempt is made to register the same hook function more
than once for the same
.Xr hhook 9
point.
.Pp
The
.Fn khelp_remove_hhook
function returns zero if no errors occurred.
It returns ENOENT if it could not find the requested
.Xr hhook 9
point.
.Sh EXAMPLES
A well-commented example Khelp module can be found at:
.Pa /usr/share/examples/kld/khelp/h_example.c
.Pp
The Enhanced Round Trip Time (ERTT)
.Xr h_ertt 4
.Nm
module provides a more complex example of what is possible.
.Sh SEE ALSO
.Xr h_ertt 4 ,
.Xr hhook 9 ,
.Xr osd 9
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh HISTORY
The
.Nm
kernel helper framework first appeared in
.Fx 9.0 .
.Pp
The
.Nm
framework was first released in 2010 by Lawrence Stewart whilst studying at
Swinburne University's Centre for Advanced Internet Architectures, Melbourne,
Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org .
.Pp
This manual page was written by
.An David Hayes Aq david.hayes@ieee.org
and
.An Lawrence Stewart Aq lstewart@FreeBSD.org .