Luigi was polled for additional documentation about polling(4).
parent 05cfdd0995
commit 07742bd14e
@@ -1,7 +1,30 @@
+.\" Copyright (c) 2002 Luigi Rizzo
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
 .\"
 .\" $FreeBSD$
 .\"
-.Dd February 15, 2002
+.Dd March 6, 2004
 .Dt POLLING 4
 .Os
 .Sh NAME
@@ -11,8 +34,9 @@
 .Cd "options DEVICE_POLLING"
 .Cd "options HZ=1000"
 .Sh DESCRIPTION
-.Dq "Device polling"
-(polling for brevity) refers to a technique to
+Device polling
+.Nm (
+for brevity) refers to a technique to
 handle devices that does not rely on the latter to generate
 interrupts when they need attention, but rather lets the CPU poll
 devices to service their needs.
@@ -21,7 +45,7 @@ properly,
 .Nm
 gives more control to the operating system on
 when and how to handle devices, with a number of advantages in terms
-of system responsivity and performance.
+of system responsiveness and performance.
 .Pp
 In particular,
 .Nm
@@ -30,7 +54,7 @@ switches which is incurred when servicing interrupts, and
 gives more control on the scheduling of the CPU between various
 tasks (user processes, software interrupts, device handling)
 which ultimately reduces the chances of livelock in the system.
-.Sh PRINCIPLES OF OPERATION
+.Ss Principles of Operation
 In the normal, interrupt-based mode, devices generate an interrupt
 whenever they need attention.
 This in turn causes a
@@ -41,12 +65,11 @@ unless the device driver has been programmed with real-time
 concerns in mind (which is generally not the case for
 .Fx
 drivers).
-Furthermore, under heavy traffic, the system might be
+Furthermore, under heavy traffic load, the system might be
 persistently processing interrupts without being able to
 complete other work, either in the kernel or in userland.
 .Pp
-.Nm Polling
-disables interrupts by polling devices at appropriate
+Device polling disables interrupts by polling devices at appropriate
 times, i.e., on clock interrupts, system calls and within the idle loop.
 This way, the context switch overhead is removed.
 Furthermore,
@@ -54,39 +77,107 @@ the operating system can control accurately how much work to spend
 in handling device events, and thus prevent livelock by reserving
 some amount of CPU to other tasks.
 .Pp
-.Nm Polling
-is enabled with a
-.Xr sysctl 8
-variable
-.Va kern.polling.enable
-whereas the percentage of CPU cycles reserved to userland processes is
-controlled by the
-.Xr sysctl 8
-variable
-.Va kern.polling.user_frac
-whose range is 0 to 100 (50 is the default value).
-.Pp
-When
-.Nm
-is enabled, and provided that there is work to do,
-up to
-.Va kern.polling.user_frac
-percent of the CPU cycles is reserved to userland tasks, the
-remaining fraction being available for device processing.
-.Pp
 Enabling
 .Nm
-also changes the way network software interrupts
+also changes the way software network interrupts
 are scheduled, so there is never the risk of livelock because
 packets are not processed to completion.
+.Ss MIB Variables
+The operation of
+.Nm
+is controlled by the following
+.Xr sysctl 8
+MIB variables:
 .Pp
-There are other variables which control or monitor the behaviour
-of devices operating in polling mode, but they are unlikely to
-require modifications, and are documented in the source file
-.Pa sys/kern/kern_poll.c .
+.Bl -tag -width indent -compact
+.It Va kern.polling.enable
+If set to non-zero,
+.Nm
+is enabled.
+Default is disabled.
+.Pp
+.It Va kern.polling.user_frac
+When
+.Nm
+is enabled, and provided that there is some work to do,
+up to this percent of the CPU cycles is reserved to userland tasks,
+the remaining fraction being available for
+.Nm
+processing.
+Default is 50.
+.Pp
+.It Va kern.polling.burst
+Maximum number of packets grabbed from each network interface in
+each timer tick.
+This number is dynamically adjusted by the kernel,
+according to the programmed
+.Va user_frac , burst_max ,
+CPU speed, and system load.
+.Pp
+.It Va kern.polling.each_burst
+The burst above is split into smaller chunks of this number of
+packets, going round-robin among all interfaces registered for
+.Nm .
+This prevents the case that a large burst from a single interface
+can saturate the IP interrupt queue
+.Pq Va net.inet.ip.intr_queue_maxlen .
+Default is 5.
+.Pp
+.It Va kern.polling.burst_max
+Upper bound for
+.Va kern.polling.burst .
+Note that when
+.Nm
+is enabled, each interface can receive at most
+.Pq Va HZ No * Va burst_max
+packets per second unless there are spare CPU cycles available for
+.Nm
+in the idle loop.
+This number should be tuned to match the expected load
+(which can be quite high with GigE cards).
+Default is 150 which is adequate for 100Mbit network and HZ=1000.
+.Pp
+.It Va kern.polling.idle_poll
+Controls if
+.Nm
+is enabled in the idle loop.
+There are no reasons (other than power saving or bugs in the scheduler's
+handling of idle priority kernel threads) to disable this.
+Note that -CURRENT apparently has some problems in this respect now,
+so default is disabled.
+.Pp
+.It Va kern.polling.poll_in_trap
+Controls if
+.Nm
+is enabled during hardware traps.
+Enabling this can be useful to improve the network responsiveness
+of boxes with 100% CPU usage.
+Default is disabled.
+.Pp
+.It Va kern.polling.reg_frac
+Controls how often (every
+.Va reg_frac No / Va HZ
+seconds) the status registers of the device are checked for error
+conditions and the like.
+Increasing this value reduces the load on the bus, but also delays
+the error detection.
+Default is 20.
+.Pp
+.It Va kern.polling.handlers
+How many active devices have registered for
+.Nm .
+.Pp
+.It Va kern.polling.short_ticks
+.It Va kern.polling.lost_polls
+.It Va kern.polling.pending_polls
+.It Va kern.polling.residual_burst
+.It Va kern.polling.phase
+.It Va kern.polling.suspect
+.It Va kern.polling.stalled
+Debugging variables.
+.El
 .Sh SUPPORTED DEVICES
-.Nm Polling
-requires explicit modifications to the device drivers.
+Device polling requires explicit modifications to the device drivers.
 As of this writing, the
 .Xr dc 4 ,
 .Xr em 4 ,
@@ -97,21 +188,26 @@ As of this writing, the
 .Xr rl 4 ,
 and
 .Xr sis 4
-devices are supported, with other in the works.
+devices are supported, with others in the works.
 The modifications are rather straightforward, consisting in
 the extraction of the inner part of the interrupt service routine
 and writing a callback function,
 .Fn *_poll ,
 which is invoked
 to probe the device for events and process them.
-See the
+(See the
 conditionally compiled sections of the devices mentioned above
-for more details.
+for more details.)
 .Pp
-As in the worst case devices are only polled on
+As in the worst case the devices are only polled on
 clock interrupts, in order to reduce the latency in processing
 packets, it is advisable to increase the frequency of the clock
 to at least 1000 HZ.
 .Sh HISTORY
-Device polling was introduced in February 2002 by
+Device polling first appeared in
+.Fx 4.6
+and
+.Fx 5.0 .
+.Sh AUTHORS
+Device polling was written by
 .An Luigi Rizzo Aq luigi@iet.unipi.it .
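Note on the tunables documented in this diff: the kern.polling.burst_max and kern.polling.reg_frac entries each state a small formula (HZ * burst_max packets per second, and reg_frac / HZ seconds). The sketch below only works those formulas through with the documented defaults. The function names are hypothetical, and the 20 bytes of per-frame wire overhead (preamble plus inter-frame gap) used for the link comparison is an assumption taken from standard Ethernet framing, not from the man page.

```c
#include <assert.h>

/*
 * Per-interface ceiling, in packets per second, when polling happens only
 * on clock ticks: HZ * burst_max, as stated in the burst_max entry.
 */
static long
poll_pps_ceiling(long hz, long burst_max)
{
	assert(hz > 0 && burst_max >= 0);
	return (hz * burst_max);
}

/*
 * Worst-case frame rate of an Ethernet link, rounded down.
 * Assumption (not from the man page): every frame costs its own size
 * plus 20 bytes of preamble and inter-frame gap on the wire.
 */
static long
link_max_fps(long bits_per_sec, long frame_bytes)
{
	assert(bits_per_sec > 0 && frame_bytes > 0);
	return (bits_per_sec / ((frame_bytes + 20) * 8));
}

/*
 * Interval between status-register checks, in milliseconds:
 * reg_frac / HZ seconds, as stated in the reg_frac entry.
 */
static long
reg_check_interval_ms(long hz, long reg_frac)
{
	assert(hz > 0 && reg_frac > 0);
	return (reg_frac * 1000 / hz);
}
```

With HZ=1000 and the default burst_max of 150, the ceiling is 150000 packets per second, just above the roughly 148809 minimum-size frames per second that a 100 Mbit link can carry under the framing assumption above. A gigabit link can carry about ten times that, which is consistent with the advice to raise burst_max for GigE cards. The default reg_frac of 20 gives a status check every 20 ms at HZ=1000.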
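Note on the driver changes described under SUPPORTED DEVICES: the text says the inner part of the interrupt service routine is extracted and a *_poll callback is written around it, and the each_burst entry says the per-tick burst is handed out in smaller chunks, round-robin across interfaces. The following is a minimal userland sketch of that shape, not the kernel code. Every name in it (fake_dev, fake_rx, fake_intr, fake_poll, fake_tick) is hypothetical, and the real callback signature and registration interface are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state: a count of packets waiting in the ring. */
struct fake_dev {
	int pending;	/* packets waiting to be processed */
	int handled;	/* packets processed so far */
};

/*
 * The "inner part" shared by both paths: process up to `count` packets
 * and return how many were actually processed.
 */
static int
fake_rx(struct fake_dev *dev, int count)
{
	int n = dev->pending < count ? dev->pending : count;

	dev->pending -= n;
	dev->handled += n;
	return (n);
}

/* Interrupt path: no budget, drain everything that is waiting. */
static void
fake_intr(struct fake_dev *dev)
{
	(void)fake_rx(dev, dev->pending);
}

/* Polling path: the *_poll style callback, bounded by `count`. */
static void
fake_poll(struct fake_dev *dev, int count)
{
	(void)fake_rx(dev, count);
}

/*
 * One clock tick: offer every device up to `burst` packets in total,
 * in chunks of at most `each_burst`, visiting the devices round-robin.
 */
static void
fake_tick(struct fake_dev *devs, size_t ndevs, int burst, int each_burst)
{
	assert(burst >= 0 && each_burst > 0);
	while (burst > 0) {
		int chunk = burst < each_burst ? burst : each_burst;

		for (size_t i = 0; i < ndevs; i++)
			fake_poll(&devs[i], chunk);
		burst -= chunk;
	}
}
```

In this sketch a device with 12 packets waiting and a burst of 10 is left with 2 packets after one tick, while a device with only 3 packets is fully drained. The bounded budget is what lets the system reserve CPU for other work, whereas the interrupt path would process everything at once.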