freebsd-dev/share/man/man4/divert.4
Gleb Smirnoff f1fb051716 divert(4): maintain own cb database and stop using inpcb KPI
Here go cons of using inpcb for divert:
- divert(4) uses only 16 bits (local port) out of struct inpcb,
  which is 424 bytes today.
- The inpcb KPI isn't able to provide hashing for divert(4),
  thus it uses global inpcb list for lookups.
- divert(4) uses INET-specific part of the KPI, making INET
  a requirement for IPDIVERT.

Maintain our own very simple hash lookup database instead.  It
has mutex protection for write and epoch protection for lookups.
Since now so->so_pcb no longer points to struct inpcb, don't
initialize protosw methods to methods that belong to PF_INET.
Also, drop support for setting options on a divert socket.  My
review of software in base and ports confirms that this has no
use and unlikely worked before.

Differential revision:	https://reviews.freebsd.org/D36382
2022-08-30 15:09:21 -07:00


.\" $FreeBSD$
.\"
.Dd August 30, 2022
.Dt DIVERT 4
.Os
.Sh NAME
.Nm divert
.Nd kernel packet diversion mechanism
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket PF_DIVERT SOCK_RAW 0
.Pp
To enable support for divert sockets, place the following lines in the
kernel configuration file:
.Bd -ragged -offset indent
.Cd "options IPFIREWALL"
.Cd "options IPDIVERT"
.Ed
.Pp
Alternatively, to load the driver as a module at boot time,
add the following lines to the
.Xr loader.conf 5
file:
.Bd -literal -offset indent
ipfw_load="YES"
ipdivert_load="YES"
.Ed
.Sh DESCRIPTION
Divert sockets allow intercepting and re-injecting packets flowing through
the
.Xr ipfw 4
firewall.
A divert socket can be bound to a specific
.Nm
port via the
.Xr bind 2
system call.
The address argument must be a
.Vt struct sockaddr_in
with
.Va sin_port
set to the desired value.
Note that the
.Nm
port has nothing to do with TCP/UDP ports.
It is just a cookie number that distinguishes between different
divert points in the
.Xr ipfw 4
ruleset.
A divert socket bound to a divert port will receive all packets diverted
to that port by
.Xr ipfw 4 .
Packets may also be written to a divert port, in which case they re-enter
firewall processing at the next rule.
.Pp
By reading from and writing to a divert socket, matching packets
can be passed through an arbitrary ``filter'' as they travel through
the host machine, special routing tricks can be done, etc.
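.Pp
As a minimal sketch of the steps above, the following C fragment opens a
divert socket and binds it to a divert port.
The helper names
.Fn divert_addr
and
.Fn divert_open
are hypothetical.
Storing the port in network byte order and filling in
.Va sin_len
on
.Fx
are assumptions carried over from ordinary
.Vt sockaddr_in
usage, not something this page states.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build the address that names a divert port. */
static struct sockaddr_in
divert_addr(uint16_t port)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
#ifdef __FreeBSD__
	sin.sin_len = sizeof(sin);	/* BSD sockaddrs carry a length */
#endif
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);	/* assumption: network byte order */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	return (sin);
}

/*
 * Hypothetical helper: open a divert socket bound to the given port.
 * Returns the descriptor, or -1 with errno set.
 */
static int
divert_open(uint16_t port)
{
#ifdef PF_DIVERT
	struct sockaddr_in sin = divert_addr(port);
	int s;

	s = socket(PF_DIVERT, SOCK_RAW, 0);
	if (s == -1)
		return (-1);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
#else
	/* Built on a system whose headers lack PF_DIVERT. */
	(void)port;
	errno = EPROTONOSUPPORT;
	return (-1);
#endif
}
```
.Ed
.Pp
A caller would typically use the same port number here as in the
.Xr ipfw 4
rule that diverts the packets.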
.Sh READING PACKETS
Packets are diverted either as they are ``incoming'' or ``outgoing.''
Incoming packets are diverted after reception on an IP interface,
whereas outgoing packets are diverted before next hop forwarding.
.Pp
Diverted packets may be read unaltered via
.Xr read 2 ,
.Xr recv 2 ,
or
.Xr recvfrom 2 .
In the last case, the address returned will have its port set to
some tag supplied by the packet diverter (usually the ipfw rule number)
and the IP address set to the (first) address of
the interface on which the packet was received (if the packet
was incoming) or
.Dv INADDR_ANY
(if the packet was outgoing).
The interface name (if defined
for the packet) will be placed in the 8 bytes following the address,
if it fits.
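.Pp
The address returned by
.Xr recvfrom 2
can be interpreted along the lines of the following sketch.
The helper names are hypothetical, and the code assumes that
.Dq the 8 bytes following the address
correspond to the
.Va sin_zero
member of
.Vt struct sockaddr_in .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the packet was diverted as outgoing. */
static bool
divert_is_outgoing(const struct sockaddr_in *sin)
{
	return (sin->sin_addr.s_addr == htonl(INADDR_ANY));
}

/*
 * Hypothetical helper: copy the interface name, if any, out of the
 * bytes after the address (assumed to be sin_zero) into buf as a
 * NUL-terminated string.  Returns the name length, or 0 if no name
 * is present or buf is too small.
 */
static size_t
divert_ifname(const struct sockaddr_in *sin, char *buf, size_t buflen)
{
	size_t n = 0;

	while (n < sizeof(sin->sin_zero) && sin->sin_zero[n] != 0)
		n++;
	if (n == 0 || n >= buflen)
		return (0);
	memcpy(buf, sin->sin_zero, n);
	buf[n] = 0;
	return (n);
}
```
.Ed
.Pp
The port field is left untouched here, since the tag is only meaningful to
the packet diverter.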
.Sh WRITING PACKETS
Writing to a divert socket is similar to writing to a raw IP socket:
the packet is written with
.Xr sendto 2
and injected ``as is'' into the normal kernel IP packet processing,
with minimal error checking.
Packets are distinguished as either incoming or outgoing.
If
.Xr sendto 2
is used with a destination IP address of
.Dv INADDR_ANY ,
then the packet is treated as if it were outgoing, i.e., destined
for a non-local address.
Otherwise, the packet is assumed to be
incoming and full packet routing is done.
.Pp
In the latter case, the
IP address specified must match the address of some local interface,
or an interface name
must be found after the IP address.
If an interface name is found,
that interface will be used and the value of the IP address will be
ignored (except that it must not be
.Dv INADDR_ANY ) .
This indicates the interface on which the packet
.Dq arrived .
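.Pp
A destination address for either direction might be prepared as in the
sketch below.
The helper names are hypothetical.
The code assumes that the interface name is carried in
.Va sin_zero
and conservatively requires room for a terminating NUL byte.
The loopback address serves only as a placeholder value that differs from
.Dv INADDR_ANY .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Hypothetical helper: mark the address so that a packet written
 * with it is treated as having arrived on interface ifname.
 * Returns 0 on success, or -1 if the name is empty or does not fit.
 */
static int
divert_mark_incoming(struct sockaddr_in *sin, const char *ifname)
{
	size_t n = strlen(ifname);

	if (n == 0 || n >= sizeof(sin->sin_zero))
		return (-1);
	memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
	memcpy(sin->sin_zero, ifname, n);
	/* Any address other than INADDR_ANY selects "incoming". */
	if (sin->sin_addr.s_addr == htonl(INADDR_ANY))
		sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	return (0);
}

/* Hypothetical helper: mark the address so the packet is outgoing. */
static void
divert_mark_outgoing(struct sockaddr_in *sin)
{
	sin->sin_addr.s_addr = htonl(INADDR_ANY);
}
```
.Ed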
.Pp
Normally, packets read as incoming should be written as incoming;
similarly for outgoing packets.
When reading and then writing back
packets, passing the same socket address supplied by
.Xr recvfrom 2
unmodified to
.Xr sendto 2
simplifies things (see below).
.Pp
The port part of the socket address passed to
.Xr sendto 2
contains a tag that should be meaningful to the diversion module.
In the
case of
.Xr ipfw 8
the tag is interpreted as the rule number
.Em after which
rule processing should restart.
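.Pp
The read and write-back cycle described in this section can be sketched as
follows.
The function name
.Fn divert_pass_one
and the buffer size are hypothetical, and the socket
.Fa s
is assumed to be a divert socket that is already bound.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stddef.h>

#define	PKT_MAX	65535	/* assumption: largest IPv4 packet */

/*
 * Hypothetical helper: read one packet from socket s and write it
 * back with the address returned by recvfrom(2) left unmodified, so
 * that the direction and the tag are preserved.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t
divert_pass_one(int s)
{
	unsigned char pkt[PKT_MAX];
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	ssize_t n;

	n = recvfrom(s, pkt, sizeof(pkt), 0, (struct sockaddr *)&sin, &len);
	if (n == -1)
		return (-1);

	/* A real filter would inspect or rewrite the packet here. */

	return (sendto(s, pkt, (size_t)n, 0, (struct sockaddr *)&sin, len));
}
```
.Ed
.Pp
Calling this function in a loop yields a filter that passes every diverted
packet through unchanged.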
.Sh LOOP AVOIDANCE
Packets written into a divert socket
(using
.Xr sendto 2 )
re-enter the packet filter at the rule number
following the tag given in the port part of the socket address, which
is usually already set to the rule number that caused the diversion
(not the next rule if there are several at the same number).
If the tag
is altered to indicate an alternative re-entry point, care should be taken
to avoid loops, where the same packet is diverted more than once at the
same rule.
.Sh DETAILS
If a packet is diverted but no socket is bound to the
port, or if
.Dv IPDIVERT
is not enabled or loaded in the kernel, the packet is dropped.
.Pp
Incoming packet fragments which get diverted are fully reassembled
before delivery; the diversion of any one fragment causes the entire
packet to get diverted.
If different fragments divert to different ports,
then which port ultimately gets chosen is unpredictable.
.Pp
Note that packets delivered to the divert socket by the
.Xr ipfw 8
.Cm tee
action are delivered as-is and packet fragments do not get reassembled
in this case.
.Pp
Packets are received and sent unchanged, except that
packets read as outgoing have invalid IP header checksums, and
packets written as outgoing have their IP header checksums overwritten
with the correct value.
Packets written as incoming and having incorrect checksums will be dropped.
Otherwise, all header fields are unchanged (and therefore in network order).
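.Pp
Because packets written as incoming must carry a correct checksum, a filter
that edits the IP header of such a packet has to recompute it before
writing.
The following is a minimal sketch of the standard Internet checksum applied
to an IPv4 header.
The function name
.Fn ip_fix_hdr_cksum
is hypothetical, and the buffer is assumed to start with the IPv4 header.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: recompute the IPv4 header checksum of the
 * packet in pkt and store it in place (bytes 10 and 11).
 * Returns 0 on success, or -1 if the header length is implausible.
 */
static int
ip_fix_hdr_cksum(unsigned char *pkt, size_t len)
{
	size_t hlen, i;
	uint32_t sum = 0;

	if (len < 20)
		return (-1);
	hlen = (size_t)(pkt[0] & 0x0f) * 4;	/* IHL is in 32-bit words */
	if (hlen < 20 || hlen > len)
		return (-1);

	/* The checksum field counts as zero while summing. */
	pkt[10] = 0;
	pkt[11] = 0;
	for (i = 0; i < hlen; i += 2)
		sum += ((uint32_t)pkt[i] << 8) | pkt[i + 1];

	/* Fold the carries back in, then take the one's complement. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	sum = ~sum & 0xffff;

	pkt[10] = (unsigned char)(sum >> 8);
	pkt[11] = (unsigned char)(sum & 0xff);
	return (0);
}
```
.Ed
.Pp
Working on individual bytes keeps the result independent of host byte order,
which matches the header fields being in network order.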
.Pp
Creating a
.Nm
socket requires super-user access.
.Sh ERRORS
Writing to a divert socket can return these errors, along with
the usual errors possible when writing raw packets:
.Bl -tag -width Er
.It Bq Er EINVAL
The packet had an invalid header, or the IP options in the packet
and the socket options set were incompatible.
.It Bq Er EADDRNOTAVAIL
The destination address contained an IP address not equal to
.Dv INADDR_ANY
that was not associated with any interface.
.El
.Sh SEE ALSO
.Xr bind 2 ,
.Xr recvfrom 2 ,
.Xr sendto 2 ,
.Xr socket 2 ,
.Xr ipfw 4 ,
.Xr ipfw 8
.Sh AUTHORS
.An Archie Cobbs Aq Mt archie@FreeBSD.org ,
Whistle Communications Corp.
.Sh BUGS
This is an attempt to provide a clean way for user mode processes
to implement various IP tricks like address translation, but it
could be cleaner, and it is too dependent on
.Xr ipfw 8 .
.Pp
It is questionable whether incoming fragments should be reassembled
before being diverted.
For example, if some fragments of a
packet destined for another machine are not routed through the
local machine, the packet is lost.
This should probably be
a settable socket option in any case.