The lang/gcc* ports patch headers where they think something is
non-standard. These patched headers override the system headers which means
you have to rebuild these ports whenever you do installworld to make sure
they contain the latest changes.
on extended and extensible structs if_msghdrl and ifa_msghdrl. This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.
Bump __FreeBSD_version to allow ports to more easily detect the new API.
Reviewed by: glebius, brooks
MFC after: 3 days
TUNABLE variable (hw.netmap.buf_size) so we can experiment
with values different from 2048 which may give better cache performance.
- rearrange the memory allocation code so it will be easier
to replace it with a different implementation. The current code
relies on a single large contiguous chunk of memory obtained through
contigmalloc.
The new implementation (not committed yet) uses multiple
smaller chunks which are easier to fit in a fragmented address
space.
the original IPv4 implementation from r178888:
- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
are locked to the default FIB. Individual IPv6 addresses will only
appear in the default FIB, however redirect information and prefixes
of connected subnets are automatically propagated to all FIBs by
default (mimicking IPv4 behavior as closely as possible).
Sponsored by: Cisco Systems, Inc.
the (maximum) number of FIBs trying to clarify that evetually FIBs
should probably attached to domain(9) specific storage. [1]
Add a comment on a limitimation on the rt_add_addr_allfibs option.
Use RT_DEFAULT_FIB instead of 0 where applicable.
Add empty line to functions without local variables per style.
Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES
to indicate that it might go away in the future.
No functional change.
Discussed with: julian [1] (clarification on what the original one meant)
Sponsored by: Cisco Systems, Inc.
While doing so, for consistency with the rtalloc_ign_fib(9) interface
called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer
indicating that it would only handle the INET case.
Sponsored by: Cisco Systems, Inc.
tables (FIBs) as IPv4.
Prepare various general rt* functions for multi-FIB IPv6 handling in
addition to already existing multi-FIB IPv4 cases.
Sponsored by: Cisco Systems, Inc.
referenced within its timeout window. This change clears the LLE_VALID flag when an llentry
is removed from an interface's hash table and adds an extra check to the flowtable code
for the LLE_VALID flag in llentry to avoid retaining and using a stale reference.
Reviewed by: qingli@
MFC after: 2 weeks
802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a
VLAN ID (VID). Tags go in packets. VIDs identify VLANs.
No functional change is intended, so this should be safe to MFC. Further
cleanup with functional changes will be committed separately (for example,
renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI).
Reviewed by: bz
Sponsored by: ADARA Networks, Inc.
MFC after: 3 days
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.
Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.
Whilst here, also:
- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
exist in the list with a NULL ifnet pointer.
- Except when INVARIANTS is in the kernel config, silently ignore the case where
no bpf_if referencing the specified ifnet is found, as it is harmless and does
not require user attention.
Reviewed by: csjp
MFC after: 1 week
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros. The lock is still kept as a mutex for now.
Reviewed by: bz
MFC after: 2 weeks
the code is doing, we recognise the legitimacy of its goal, but we're not
quite sure it's going about it the right way. More pondering is clearly
required.
Sponsored by: ADARA Networks, Inc.
Discussed with: bz
MFC after: 3 days
aspect of time stamp configuration per interface rather than per BPF
descriptor. Prior to this, the order in which BPF devices were opened and the
per descriptor time stamp configuration settings could cause non-deterministic
and unintended behaviour with respect to time stamping. With the new scheme, a
BPF attached interface's tscfg sysctl entry can be set to "default", "none",
"fast", "normal" or "external". Setting "default" means use the system default
option (set with the net.bpf.tscfg.default sysctl), "none" means do not
generate time stamps for tapped packets, "fast" means generate time stamps for
tapped packets using a hz granularity system clock read, "normal" means
generate time stamps for tapped packets using a full timecounter granularity
system clock read and "external" (currently unimplemented) means use the time
stamp provided with the packet from an underlying source.
- Utilise the recently introduced sysclock_getsnapshot() and
sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
packet, regardless of the number of BPF descriptors and time stamp formats
requested. Use the per BPF attached interface time stamp configuration to
control if sysclock_getsnapshot() is called and whether the system clock read
is fast or normal. The per BPF descriptor time stamp configuration is then
used to control how the system clock snapshot is converted to a bintime by
sysclock_snap2bintime().
- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
read of the system clock is now controlled per BPF attached interface using
the net.bpf.tscfg sysctl tree.
- Update the bpf.4 man page.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
In collaboration with: Julien Ridoux (jridoux at unimelb edu au)
While I'm here update if_oerrors if parent interface of vlan is not
up and running. Previously it updated collision counter and it was
confusing to interprete it.
PR: kern/163478
Reviewed by: glebius, jhb
Tested by: Joe Holden < lists <> rewt dot org dot uk >
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.
However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:
- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.
To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
A link reset now is completely transparent for the netmap client:
even if the NIC resets its own ring (e.g. restarting from 0),
the client will not see any change in the current rx/tx positions,
because the driver will keep track of the offset between the two.
2. make the device-specific code more uniform across different drivers
There were some inconsistencies in the implementation of the netmap
support routines, now drivers have been aligned to a common
code structure.
3. import netmap support for ixgbe . This is implemented as a very
small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
in total the patch has only 54 lines of new code) , as most of
the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
following some initial comments from Jack Vogel about making
changes less intrusive.
(Note, i have emailed Jack multiple times asking if he had
comments on this structure of the code; i got no reply so
i assume he is fine with it).
Support for other drivers (em, lem, re, igb) will come later.
"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.
Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
parent interface. This avoids the overhead of queueing a packet to an IFQ
only to immediately dequeue it again.
Suggested by: np
Reviewed by: brooks
MFC after: 1 month
of hand-made.
- When registering new cloner, check whether a cloner with
same name already exist.
- When allocating unit, also check with help of ifunit()
whether such interface already exist or not. [1]
PR: kern/162789 [1]
contain both a regular timestamp obtained from the system clock and the
current feed-forward ffcounter value. This enables new possibilities including
comparison of timekeeping performance and timestamp correction during post
processing.
- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
packets using the feedback or feed-forward system clock.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
I/O from userspace, capable of line rate at 10G, see
http://info.iet.unipi.it/~luigi/netmap/
At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]
In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in
sys/dev/netmap/head.diff
The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.
Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.
CREDITS:
Netmap has been developed by Luigi Rizzo and other collaborators
at the Universita` di Pisa, and supported by EU project CHANGE
(http://www.change-project.eu/)
The code is distributed under a BSD Copyright.
[1] In my opinion is a bad idea to have all manpage in one directory.
We should place kernel documentation in the same dir that contains
the code, which would make it much simpler to keep doc and code
in sync, reduce the clutter in share/man/ and incidentally is
the policy used for all of userspace code.
Makefiles and doc tools can be trivially adjusted to find the
manpages in the relevant subdirs.
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().
MFC after: 1 Month
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.