Unfortunately filemon/meta mode tracks all indirect dependencies here
since ld(1) is reading libelf when linking in libkvm. Churn would be
reduced if this was able to be limited to direct dependencies.
Sponsored by: EMC / Isilon Storage Division
This is so that 'make depend' is not a required build step in these
files.
DPSRCS is overall unneeded. DPSRCS already contains SRCS, so anything
which can safely be in SRCS should be. DPSRCS is mostly just a way to
generate files that should not be linked into the final PROG/LIB. For
headers and grammars it is safe for them to be in SRCS since they will
be excluded during linking and installation.
The only remaining uses of DPSRCS are for generating .c or .o files that
must be built before 'make depend' can run 'mkdep' on the SRCS c files
list. A semi-proper example is in tests/sys/kern/acct/Makefile where a
checked-in .c file has an #include on a generated .c file. The
generated .c file should not be linked into the final PROG though since
it is #include'd. The more proper way here is just to build/link it in
though without DPSRCS. Another example is in sys/modules/linux/Makefile
where a shell script runs to parse a DPSRCS .o file that should not be
linked into the module. Beyond those, the need for DPSRCS is largely
unneeded, redundant, and forces 'make depend' to be ran. Generally,
these Makefiles should avoid the need for DPSRCS and define proper
dependencies for their files as well.
An example of an improper usage and why this matters is in usr.bin/netstat.
nl_defs.h was only in DPSRCS and so was not generated during 'make all',
but only during 'make depend'. The files including it lacked proper
depenencies on it, which forced running 'make depend' to workaround that
bug. The 'make depend' target should mostly be used for incremental build
help, not to produce a working build. This specific example was broken in
the meta build until r287905 since it does not run 'make depend'.
The gnu/lib/libreadline/readline case is fine since bsd.lib.mk has 'OBJS:
SRCS:M*.h' when there is no .depend file.
Sponsored by: EMC / Isilon Storage Division
MFC after: 1 week
routepr() (-r flag). It is too narrow to show an IPv6 prefix
in most cases.
- Accept "local" as a synonym of "unix" in protocol family name.
- Show a prefix length in CIDR notation when name resolution failed in
netname().
- Make routename() and netname() AF-independent and remove
unnecessary typecasting from struct sockaddr.
- Use getnameinfo(3) to format L2 addr in intpr().
- Fix a bug which showed "Address" when -A flag is specfied in pr_rthdr().
- Replace cryptic GETSA() macro with SA_SIZE().
- Fix declarations shadowing local variables with the same names.
- Add more static, remove unused header files and variables.
MFC after: 1 week
In the kernel, structs such as tcpstat are manipulated as an array of
counter_u64_t (uint64_t *), but made visible to userland as an array of
uint64_t. kread_counters() was previously copying the counter array into
user space and sequentially overwriting each counter with its value. This
mostly affects IPsec counters, as other counters are exported via sysctl.
PR: 201700
Tested by: Jason Unovitch
MFC after: 1 week
Update setkey and libipsec to understand aes-gcm-16 as an
encryption method.
A partial commit of the work in review D2936.
Submitted by: eri
Reviewed by: jmg
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
Off by default, build behaves normally.
WITH_META_MODE we get auto objdir creation, the ability to
start build from anywhere in the tree.
Still need to add real targets under targets/ to build packages.
Differential Revision: D2796
Reviewed by: brooks imp
o Restore historical behaviour of appending '*' if interface is down,
and we have enough space to print it (usually we don't). [1]
o Do not truncate interface names when printing in encoded format.
o Report interface flags into encoded format.
PR: 199873 [1]
Sponsored by: Nginx, Inc.
in 'netstat -r'.
The netstat/route.c was the last abuser of struct ifnet and struct
rtentry in the tree. With this change if_var.h can become kernel
only include, _WANT_RTENTRY can go away and projects/ifnet and
projects/routing can go forward.
Differential Revision: https://reviews.freebsd.org/D2242
Reviewed by: melifaro, gnn
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
contain kernel pointers, and instead has interface index.
Bump __FreeBSD_version for that change.
o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and
no longer needs to include if_var.h
Note that this change is far from being a complete move of IPv6 multicast
routing to a proper API. Other structures are still dumped into their
sysctls as is, requiring userland application to #define _KERNEL when
including ip6_mroute.h and then call kvm_read(3) to gather all bits and
pieces. But fixing this is out of scope of the opaque ifnet project.
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
Obtained from: Phil Shafer <phil@juniper.net>
Ported to -current by: alfred@ (mostly), Kim Shrier
Formatting: marcel@
Sponsored by: Juniper Networks, Inc.
o Introduce a notion of "not ready" mbufs in socket buffers. These
mbufs are now being populated by some I/O in background and are
referenced outside. This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
freed.
o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
The sb_ccc stands for ""claimed character count", or "committed
character count". And the sb_acc is "available character count".
Consumers of socket buffer API shouldn't already access them directly,
but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
search.
o New function sbready() is provided to activate certain amount of mbufs
in a socket buffer.
A special note on SCTP:
SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs. Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them. The new
notion of "not ready" data isn't supported by SCTP. Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does. This was discussed with rrs@.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
It affects the IPv6 source address selection algorithm (RFC 6724)
and allows override the last rule ("longest matching prefix") for
choosing among equivalent addresses. The address with `prefer_source'
will be preferred source address.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
- Reformat the entire man page
- Create a proper synopsis section
- Use itemized-lists to describe each flag, rather than paragraphs
- Cross-reference common flags to a 'general flags' sub-section with short
inline description of the flag
- Label 'general flags' sub-section
- Apply additional fixes suggested by wblock, brueffer, and bdrewery
- Update .Dd that got undone previously
- Change the order of the .Op Fl to be alphabetical
- Add the -i | -I interface flags to the description of 'interface
display mode'
- Fix missing parameters in man page
- Fix missing parameters in usage()
- Sync man page and usage()
MFC Note: stable/9 and stable/10 do not have -R, will need to be removed
when merged
CR: D58
Reviewed by: brueffer, bcr
Approved by: wblock (mentor)
MFC after: 7 days
Sponsored by: ScaleEngine Inc.
- Increase WID_IF_DEFAULT() from 6 to 8 (the default for AF_INET6) because
we have interfaces with longer names than 6 chars like epairN{a,b}.
- Style fixes.
This is intended to help in diagnostics and debugging of NIC and stack
flowid support.
Eventually this will grow another column (RSS CPU ID) but
that currently isn't cached in the inpcb.
There's also no clean flowtype -> flowtype identifier string. This is
the mbuf M_HASHTYPE_* values for RSS.
Here's some example output:
adrian@adrian-hackbox:~/work/freebsd/head/src % netstat -Rn | more
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address flowid ftype
tcp4 0 0 10.11.1.65.22 10.11.1.64.12409 29041942 2
udp4 0 0 127.0.0.1.123 *.* 00000000 0
udp6 0 0 fe80::1%lo0.123 *.* 00000000 0
udp6 0 0 ::1.123 *.* 00000000 0
udp4 0 0 10.11.1.65.123 *.* 00000000 0
Tested:
* amd64 system w/ igb NIC; local driver changes to expose RSS flowid in if_igb.
same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
description in netstat(1) output.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
netstat has two options for printing multicast tables:
sysctl (the default one for live systems) and kvm-based one (for cores).
It looks like kvm-based one hasn't been working since it's been introduced
in r190012 due to absence of mfctablesize kernel symbol.
Check for all ipv4-multicast symbols being correctly resolved was introduced
in r259638 regardless of 'live' value leading to "No IPv4 MROUTING" error
message.
Reported by: Olivier Cochard-Labbé
MFC after: 1 week
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.
Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.
Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.
Discussed with: melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
insert flow entry. During the route lookup the critical section is
exited. It may happen, that after route lookup we will be executed
on an other CPU that already has such flowentry. Before this change
we simply freed the flowentry and returned to ip_output() with
failure.
Actually there is nothing wrong with using previously allocated
flow entry, updating it properly. Thus, make flowentry_insert()
return the new either old fle, and make use of it.
Count reuses as "collisions" and real inserts as "inserts".
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Some of the collisions that are occuring are due to flowtable lookups
that succeed but have an invalid lle - typically because the L2 adjacency
lookup hasn't completed. This would lead to a follow-up insert which
would then fail (ie, collision) and the code would fall through to doing
a slow-path L2/L3 lookup in the netinet/netinet6 code.
This patch simply aborts storing a new flowtable entry if the lle isn't
yet valid.
Whilst I'm here, add a new pcpu counter for the item so the number of
failures can be tracked separately from generic "collisions."
Reviewed by: glebius
MFC after: 10 days
Sponsored by: Netflix, Inc.
- ip_output() and ip_output6() simply call flowtable_lookup(),
passing mbuf and address family. That's the only code under
#ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
- Remove hand made pcpu stats, and utilize counter(9).
- Snapshot of statistics is available via 'netstat -rs'.
- All sysctls are moved into net.flowtable namespace, since
spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
- Remove chain of multiple flowtables. We simply have one for
IPv4 and one for IPv6.
- Flowtables are allocated in flowtable.c, symbols are static.
- With proper argument to SYSINIT() we no longer need flowtable_ready.
- Hash salt doesn't need to be per-VNET.
- Removed rudimentary debugging, which use quite useless in dtrace era.
The runtime behavior of flowtable shouldn't be changed by this commit.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
necessary symbols needed per subsystem. Main kvm(3) init is now delayed
as much as possbile. This finally fixes performance issues reported in
kern/167204.
Some non-working code (ng_socket.ko symbol addresses calculation) removed.
Some global variables eliminated.
PR: kern/167204
MFC after: 4 weeks
instead of peeking inside in-kernel radix via kget.
This permits us to change kernel structures without breaking userland.
Additionally, this change provide more reliable and faster output.
`Refs` and `Use` fields available in IPv4 by default (and via -W
for other families) were removed. `Refs` is radix-specific thing
which is not informative for users. `Use` field value is handy sometimes,
but a) current API does not support it and b) I'm not sure we will
support per-rte pcpu counters in near future.
Old method of retrieving data is still supported (either by defining
NewTree=0 or running netstat with -A). However, Refs/Use fields are
hidden.
Sponsored by: Yandex LLC
MFC after: 4 weeks
PR: kern/167204
libkvm digging in kernel memory. This is possible since r231506 made
getifaddrs(3) to supply if_data for each ifaddr.
The pros of this change is that now netstat(1) doesn't know about kernel
struct ifnet and struct ifaddr. And these structs are about to change
significantly in head soon. New netstat binary will work well with 10.0
and any future kernel.
The cons is that now it isn't possible to obtain interface statistics
from a vmcore.
Functions intpr() and sidewaysintpr() were rewritten from scratch.
The output of netstat(1) has underwent the following changes:
1) The MTU is not printed for protocol addresses, since it has no notion.
Dash is printed instead. If there would be a strong desire to return
previous output, it is doable.
2) Output interface queue drops are not printed. Currently this data isn't
available to userland via any API. We plan to drop 'struct ifqueue' from
'struct ifnet' very soon, so old kvm(3) access to queue drops is soon
to be broken, too. The plan is that drivers would handle their queues
theirselves and a new field in if_data would be updated in case of drops.
3) In-kernel reference count for multicast addresses isn't printed. I doubt
that anyone used it. Anyway, netstat(1) is sysadmin tool, not kernel
debugger.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.
This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.
Reported by: Taku YAMAMOTO, Maciej Milewski
Convert 'struct ipstat' and 'struct tcpstat' to counter(9).
This speeds up IP forwarding at extreme packet rates, and
makes accounting more precise.
Sponsored by: Nginx, Inc.
userland via routing socket or sysctl. This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:
sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);
This behavior can be controlled by net.inet6.ip6.deembed_scopeid. This is
set to 1 by default (sin6_scope_id will be filled in the kernel).
Reviewed by: bz
Machines can stall out because mbufs are low, however sometimes we won't
see "requests denied", instead we see user land processes or kernel threads
blocking waiting for mbufs because they set M_WAIT. These consumers do not
see errors, only stalling.
Unfortunately until now, netstat did not export this information
so you could have experienced an mbuf shortage and have no way of
seeing it unless you happen to run netstat at the exact time of the
shortage and see "in use" = "max".
By exporting the number of times processes are blocked, we can
effectively see how often non-interrupt context threads are effectively
"denied".
MFC after: 2 weeks
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
__uint16_t, we can partially undo r228668.
Note the remark "Work around a clang false positive with format string
warnings and ntohs macros (see LLVM PR 11313)" was actually incorrect.
Before r232745, on some arches, the ntohs() macros did in fact return
int, not uint16_t, so clang was right in warning about the %hu format
string.
MFC after: 2 weeks
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.
Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang (disables -Werror)
- NO_WCAST_ALIGN.clang (disables -Wcast-align)
- NO_WFORMAT.clang (disables -Wformat and friends)
- CLANG_NO_IAS (disables integrated assembler)
- CLANG_OPT_SMALL (adds flags for extra small size optimizations)
As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf! For clang, use the following:
CC=clang
CXX=clang++
CPP=clang-cpp
MFC after: 2 weeks
The index() and rindex() functions were marked LEGACY in the 2001
revision of POSIX and were subsequently removed from the 2008 revision.
The strchr() and strrchr() functions are part of the C standard.
This makes the source code a lot more consistent, as most of these C
files also call into other str*() routines. In fact, about a dozen
already perform strchr() calls.
* Correctly handle -a.
* -A isn't supported.
* Show all closed 1-to-1 and 1-to-many style sockets.
* Show all listening 1-to-many style sockets.
* Use consistent formatting for -W.
PR: 150642
Approved by: re@
MFC after: 4 weeks.
each workstream, rather than on every protocol, is prettier, it makes
machine-parsing of netstat -Q output a lot harder. Repeat the information
and hope that the user forgives us slightly dense formatting.
MFC after: 3 days
Reported by: bz
Sponsored by: Juniper Networks
stack from the output of `netstat -ani'.
- The node-local multicast address in the output of `netstat -rn'
should be handled as well.
Spotted by: Bernd Walter <ticso__at__cicely7.cicely.de>