netlink(4) and associated features will exist in FreeBSD 14.0 but they will also exist in 13.2, an older version, from commits such as 02b958b1 and b309249b. This commit needs merging to stable/13 and releng/13.2. MFC after: 2days (needs to be in RC2) Reviewed by: imp,melifaro Pull Request: https://github.com/freebsd/freebsd-src/pull/651
.\"
.\" Copyright (C) 2022 Alexander Chernikov <melifaro@FreeBSD.org>.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd November 30, 2022
.Dt NETLINK 4
.Os
.Sh NAME
.Nm Netlink
.Nd Kernel network configuration protocol
.Sh SYNOPSIS
.In netlink/netlink.h
.In netlink/netlink_route.h
.Ft int
.Fn socket AF_NETLINK SOCK_RAW "int family"
.Sh DESCRIPTION
Netlink is a user-kernel message-based communication protocol primarily used
for network stack configuration.
Netlink is easily extensible and supports large dumps and event
notifications, all via a single socket.
The protocol is fully asynchronous, allowing one to issue and track multiple
requests at once.
Netlink consists of multiple families, each of which typically groups the
commands belonging to a particular kernel subsystem.
Currently, the supported families are:
.Pp
.Bd -literal -offset indent -compact
NETLINK_ROUTE    network configuration
NETLINK_GENERIC  "container" family
.Ed
.Pp
The
.Dv NETLINK_ROUTE
family handles the configuration of interfaces, addresses, neighbors,
routes, and VNETs.
More details can be found in
.Xr rtnetlink 4 .
The
.Dv NETLINK_GENERIC
family serves as a
.Do container Dc ,
allowing other families to be registered under the
.Dv NETLINK_GENERIC
umbrella.
This approach allows using a single netlink socket to interact with
multiple netlink families at once.
More details can be found in
.Xr genetlink 4 .
.Pp
Netlink has its own sockaddr structure:
.Bd -literal
struct sockaddr_nl {
	uint8_t     nl_len;    /* sizeof(sockaddr_nl) */
	sa_family_t nl_family; /* netlink family */
	uint16_t    nl_pad;    /* reserved, set to 0 */
	uint32_t    nl_pid;    /* automatically selected, set to 0 */
	uint32_t    nl_groups; /* multicast groups mask to bind to */
};
.Ed
.Pp
Typically, filling this structure is not required for socket operations.
It is presented here for completeness.
.Sh PROTOCOL DESCRIPTION
The protocol is message-based.
Each message starts with the mandatory
.Va nlmsghdr
header, followed by the family-specific header and the list of
type-length-value pairs (TLVs).
TLVs can be nested.
All headers and TLVs are padded to 4-byte boundaries.
Each
.Xr send 2
or
.Xr recv 2
system call may contain multiple messages.
.Ss BASE HEADER
.Bd -literal
struct nlmsghdr {
	uint32_t nlmsg_len;   /* Length of message including header */
	uint16_t nlmsg_type;  /* Message type identifier */
	uint16_t nlmsg_flags; /* Flags (NLM_F_) */
	uint32_t nlmsg_seq;   /* Sequence number */
	uint32_t nlmsg_pid;   /* Sending process port ID */
};
.Ed
.Pp
The
.Va nlmsg_len
field stores the whole message length, in bytes, including the header.
This length has to be rounded up to the nearest 4-byte boundary when
iterating over messages.
The
.Va nlmsg_type
field represents the command/request type.
This value is family-specific.
The list of supported commands can be found in the relevant family
header file.
The
.Va nlmsg_seq
field is a user-provided request identifier.
An application can track the operation result using the
.Dv NLMSG_ERROR
messages and matching the
.Va nlmsg_seq .
The
.Va nlmsg_pid
field is the message sender ID.
This field is optional for userland.
The kernel sender ID is zero.
The
.Va nlmsg_flags
field contains the message-specific flags.
The following generic flags are defined:
.Pp
.Bd -literal -offset indent -compact
NLM_F_REQUEST  Indicates that the message is an actual request to the kernel
NLM_F_ACK      Request an explicit ACK message with an operation result
.Ed
.Pp
The following generic flags are defined for the "GET" request types:
.Pp
.Bd -literal -offset indent -compact
NLM_F_ROOT   Return the whole dataset
NLM_F_MATCH  Return all entries matching the criteria
.Ed
These two flags are typically used together, aliased to
.Dv NLM_F_DUMP .
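.Pp
As a minimal sketch of how the request and dump flags combine, the following
C fragment fills in the base header of a dump request.
It declares a local copy of the header and of the flag values so that it
compiles without the netlink headers.
All names prefixed with
.Dq ex_
or
.Dq EX_
are hypothetical, and the numeric flag values are copied from
.In netlink/netlink.h
rather than stated on this page; real code would include that header instead.
.Bd -literal
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlmsghdr as documented above (illustration only). */
struct ex_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* Flag values copied from the netlink header; treat them as assumptions. */
#define EX_NLM_F_REQUEST 0x001
#define EX_NLM_F_ROOT    0x100
#define EX_NLM_F_MATCH   0x200
#define EX_NLM_F_DUMP    (EX_NLM_F_ROOT | EX_NLM_F_MATCH)

/*
 * Fill in the base header of a dump request.
 * payload_len is the size of the family-specific header and TLVs that the
 * caller places after the base header.
 */
static void
ex_init_dump_request(struct ex_nlmsghdr *hdr, uint16_t type, uint32_t seq,
    uint32_t payload_len)
{
	assert(hdr != NULL);
	memset(hdr, 0, sizeof(*hdr));
	hdr->nlmsg_len = (uint32_t)sizeof(*hdr) + payload_len;
	hdr->nlmsg_type = type;
	hdr->nlmsg_flags = EX_NLM_F_REQUEST | EX_NLM_F_DUMP;
	hdr->nlmsg_seq = seq;
	/* nlmsg_pid stays 0: the field is optional for userland. */
}
.Ed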
.Pp
The following generic flags are defined for the "NEW" request types:
.Pp
.Bd -literal -offset indent -compact
NLM_F_CREATE   Create an object if none exists
NLM_F_EXCL     Don't replace an object if it exists
NLM_F_REPLACE  Replace an existing matching object
NLM_F_APPEND   Append to an existing object
.Ed
.Pp
The following generic flags are defined for the replies:
.Pp
.Bd -literal -offset indent -compact
NLM_F_MULTI          Indicates that the message is part of the message group
NLM_F_DUMP_INTR      Indicates that the state dump was not completed
NLM_F_DUMP_FILTERED  Indicates that the dump was filtered per request
NLM_F_CAPPED         Indicates the original message was capped to its header
NLM_F_ACK_TLVS       Indicates that extended ACK TLVs were included
.Ed
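.Pp
Because a single
.Xr recv 2
may return several messages, a receiver has to step through the buffer using
the padded
.Va nlmsg_len
of each message.
The fragment below is a sketch of that loop under stated assumptions: it uses
a local mirror of the header, and the helper names
.Fn ex_align4
and
.Fn ex_count_messages
are hypothetical.
Real code would normally rely on the helpers shipped with the netlink headers.
.Bd -literal
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlmsghdr as documented above (illustration only). */
struct ex_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* Round a length up to the 4-byte netlink alignment. */
static size_t
ex_align4(size_t len)
{
	return ((len + 3) & ~(size_t)3);
}

/*
 * Count the complete messages in buf[0..buflen).
 * Returns -1 if a header is truncated or announces an impossible length.
 */
static int
ex_count_messages(const unsigned char *buf, size_t buflen)
{
	struct ex_nlmsghdr hdr;
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || buflen == 0);
	while (off < buflen) {
		if (buflen - off < sizeof(hdr))
			return (-1);
		/* Copy the header out to avoid unaligned access. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.nlmsg_len < sizeof(hdr) ||
		    hdr.nlmsg_len > buflen - off)
			return (-1);
		count++;
		/* The next message starts at the padded length. */
		off += ex_align4(hdr.nlmsg_len);
	}
	return (count);
}
.Ed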
.Ss TLVs
Most messages encode their attributes as type-length-value pairs (TLVs).
The base TLV header:
.Bd -literal
struct nlattr {
	uint16_t nla_len;  /* Total attribute length */
	uint16_t nla_type; /* Attribute type */
};
.Ed
The scope of the TLV type
.Pq Va nla_type
is typically the message type or a group of message types within a family.
For example, the
.Dv RTA_DST
type value is only valid for
.Dv RTM_NEWROUTE ,
.Dv RTM_DELROUTE
and
.Dv RTM_GETROUTE
messages.
TLVs can be nested; in that case internal TLVs may have their own sub-types.
All TLVs are packed with 4-byte padding.
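.Pp
The TLV layout can be illustrated with a short attribute walker.
This is a sketch only: it assumes that
.Va nla_len
counts the 4-byte attribute header plus the payload but not the trailing
padding, it uses a local mirror of
.Vt "struct nlattr" ,
and the names
.Fn ex_align4
and
.Fn ex_find_attr
are hypothetical.
It looks only at top-level attributes and does not descend into nested TLVs.
.Bd -literal
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlattr as documented above (illustration only). */
struct ex_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Round a length up to the 4-byte netlink alignment. */
static size_t
ex_align4(size_t len)
{
	return ((len + 3) & ~(size_t)3);
}

/*
 * Find the first top-level attribute of the given type in buf[0..len).
 * On success, returns a pointer to the payload and stores its size in
 * *payload_len; returns NULL if the attribute is absent or malformed.
 */
static const unsigned char *
ex_find_attr(const unsigned char *buf, size_t len, uint16_t type,
    size_t *payload_len)
{
	struct ex_nlattr attr;
	size_t off = 0;

	assert(payload_len != NULL);
	while (off < len && len - off >= sizeof(attr)) {
		memcpy(&attr, buf + off, sizeof(attr));
		if (attr.nla_len < sizeof(attr) || attr.nla_len > len - off)
			return (NULL);
		if (attr.nla_type == type) {
			*payload_len = attr.nla_len - sizeof(attr);
			return (buf + off + sizeof(attr));
		}
		/* Skip the payload and its padding. */
		off += ex_align4(attr.nla_len);
	}
	return (NULL);
}
.Ed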
.Ss CONTROL MESSAGES
A number of generic control messages are reserved in each family.
.Pp
.Dv NLMSG_ERROR
reports the operation result if requested, optionally followed by
the metadata TLVs.
The value of
.Va nlmsg_seq
is set to its value in the original message, while
.Va nlmsg_pid
is set to the port ID of the socket that sent the original message.
The operation result is reported via
.Vt "struct nlmsgerr" :
.Bd -literal
struct nlmsgerr {
	int             error; /* Standard errno */
	struct nlmsghdr msg;   /* Original message header */
};
.Ed
If the
.Dv NETLINK_CAP_ACK
socket option is not set, the remainder of the original message will follow.
If the
.Dv NETLINK_EXT_ACK
socket option is set, the kernel may add a
.Dv NLMSGERR_ATTR_MSG
string TLV with the textual error description, optionally followed by the
.Dv NLMSGERR_ATTR_OFFS
TLV, indicating the offset from the message start that triggered an error.
Some operations may return additional metadata encapsulated in the
.Dv NLMSGERR_ATTR_COOKIE
TLV.
The metadata format is specific to the operation.
If the operation reply is a multipart message, then no
.Dv NLMSG_ERROR
reply is generated, only a
.Dv NLMSG_DONE
message, closing the multipart sequence.
.Pp
.Dv NLMSG_DONE
indicates the end of the message group: typically, the end of the dump.
It contains a single
.Vt int
field, describing the dump result as a standard errno value.
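.Pp
To illustrate the
.Dv NLMSG_ERROR
handling described in this section, the following sketch checks whether a
received message is the reply to a given request and extracts the reported
result.
It relies on local mirrors of the structures, and the function name
.Fn ex_parse_ack
is hypothetical.
The numeric value of
.Dv NLMSG_ERROR
is copied from
.In netlink/netlink.h
and is not stated on this page.
Extended ACK TLVs that may follow the structure are ignored here.
.Bd -literal
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the documented structures (illustration only). */
struct ex_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct ex_nlmsgerr {
	int                error;
	struct ex_nlmsghdr msg;
};

/* Value copied from the netlink header; treat it as an assumption. */
#define EX_NLMSG_ERROR 0x2

/*
 * Inspect one reply message at buf[0..len).
 * Returns 1 and stores the reported result in *error (0 means success)
 * if the message is an NLMSG_ERROR reply to the request with sequence
 * number seq; returns 0 for any other message and -1 if it is malformed.
 */
static int
ex_parse_ack(const unsigned char *buf, size_t len, uint32_t seq, int *error)
{
	struct ex_nlmsghdr hdr;
	struct ex_nlmsgerr err;

	assert(error != NULL);
	if (len < sizeof(hdr))
		return (-1);
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.nlmsg_len < sizeof(hdr) || hdr.nlmsg_len > len)
		return (-1);
	if (hdr.nlmsg_type != EX_NLMSG_ERROR || hdr.nlmsg_seq != seq)
		return (0);
	if (hdr.nlmsg_len < sizeof(hdr) + sizeof(err))
		return (-1);
	memcpy(&err, buf + sizeof(hdr), sizeof(err));
	*error = err.error;
	return (1);
}
.Ed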
.Sh SOCKET OPTIONS
Netlink supports a number of custom socket options, which can be set with
.Xr setsockopt 2
with the
.Dv SOL_NETLINK
.Fa level :
.Bl -tag -width indent
.It Dv NETLINK_ADD_MEMBERSHIP
Subscribes to the notifications for the specific group (int).
.It Dv NETLINK_DROP_MEMBERSHIP
Unsubscribes from the notifications for the specific group (int).
.It Dv NETLINK_LIST_MEMBERSHIPS
Lists the memberships as a bitmask.
.It Dv NETLINK_CAP_ACK
Instructs the kernel to send the original message header in the reply
without the message body.
.It Dv NETLINK_EXT_ACK
Acknowledges the ability to receive additional TLVs in the ACK message.
.El
.Pp
Additionally, netlink overrides the following socket options from the
.Dv SOL_SOCKET
.Fa level :
.Bl -tag -width indent
.It Dv SO_RCVBUF
Sets the maximum size of the socket receive buffer.
If the caller has
.Dv PRIV_NET_ROUTE
permission, the value can exceed the currently set
.Va kern.ipc.maxsockbuf
value.
.El
.Sh SYSCTL VARIABLES
A set of
.Xr sysctl 8
variables is available to tweak run-time parameters:
.Bl -tag -width indent
.It Va net.netlink.sendspace
Default send buffer for the netlink socket.
Note that the socket sendspace has to be at least as long as the longest
message that can be transmitted via this socket.
.It Va net.netlink.recvspace
Default receive buffer for the netlink socket.
Note that the socket recvspace has to be at least as long as the longest
message that can be received from this socket.
.It Va net.netlink.nl_maxsockbuf
Maximum receive buffer for the netlink socket that can be set via the
.Dv SO_RCVBUF
socket option.
.El
.Sh DEBUGGING
Netlink implements per-functional-unit debugging, with different severities
controllable via the
.Va net.netlink.debug
branch.
These messages are logged in the kernel message buffer and can be seen in
.Xr dmesg 8 .
The following severity levels are defined:
.Bl -tag -width indent
.It Dv LOG_DEBUG(7)
Rare events or per-socket errors are reported here.
This is the default level, not impacting production performance.
.It Dv LOG_DEBUG2(8)
Socket events such as group memberships, privilege checks, commands and dumps
are logged.
This level does not incur significant performance overhead.
.It Dv LOG_DEBUG3(9)
All socket events and every dumped or modified entity are logged.
Turning it on may result in significant performance overhead.
.El
.Sh ERRORS
Netlink reports operation results, including errors and error metadata, by
sending a
.Dv NLMSG_ERROR
message for each request message.
The following errors can be returned:
.Bl -tag -width Er
.It Bq Er EPERM
when the current privileges are insufficient to perform the required operation;
.It Bo Er ENOBUFS Bc or Bo Er ENOMEM Bc
when the system runs out of memory for
an internal data structure;
.It Bq Er ENOTSUP
when the requested command is not supported by the family or
the family is not supported;
.It Bq Er EINVAL
when some necessary TLVs are missing or invalid, in which case details
may be provided in the
.Dv NLMSGERR_ATTR_MSG
and
.Dv NLMSGERR_ATTR_OFFS
TLVs;
.It Bq Er ENOENT
when trying to delete a non-existent object.
.El
.Pp
Additionally, a socket operation itself may fail with one of the errors
specified in
.Xr socket 2 ,
.Xr recv 2
or
.Xr send 2 .
.Sh SEE ALSO
.Xr genetlink 4 ,
.Xr rtnetlink 4
.Rs
.%A "J. Salim"
.%A "H. Khosravi"
.%A "A. Kleen"
.%A "A. Kuznetsov"
.%T "Linux Netlink as an IP Services Protocol"
.%O "RFC 3549"
.Re
.Sh HISTORY
The netlink protocol appeared in
.Fx 13.2 .
.Sh AUTHORS
Netlink was implemented by
.An -nosplit
.An Alexander Chernikov Aq Mt melifaro@FreeBSD.org .
It was derived from the Google Summer of Code 2021 project by
.An Ng Peng Nam Sean .