the original IPv4 implementation from r178888:
- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
are locked to the default FIB. Individual IPv6 addresses will only
appear in the default FIB, however redirect information and prefixes
of connected subnets are automatically propagated to all FIBs by
default (mimicking IPv4 behavior as closely as possible).
Sponsored by: Cisco Systems, Inc.
the (maximum) number of FIBs trying to clarify that evetually FIBs
should probably attached to domain(9) specific storage. [1]
Add a comment on a limitimation on the rt_add_addr_allfibs option.
Use RT_DEFAULT_FIB instead of 0 where applicable.
Add empty line to functions without local variables per style.
Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES
to indicate that it might go away in the future.
No functional change.
Discussed with: julian [1] (clarification on what the original one meant)
Sponsored by: Cisco Systems, Inc.
While doing so, for consistency with the rtalloc_ign_fib(9) interface
called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer
indicating that it would only handle the INET case.
Sponsored by: Cisco Systems, Inc.
tables (FIBs) as IPv4.
Prepare various general rt* functions for multi-FIB IPv6 handling in
addition to already existing multi-FIB IPv4 cases.
Sponsored by: Cisco Systems, Inc.
referenced within its timeout window. This change clears the LLE_VALID flag when an llentry
is removed from an interface's hash table and adds an extra check to the flowtable code
for the LLE_VALID flag in llentry to avoid retaining and using a stale reference.
Reviewed by: qingli@
MFC after: 2 weeks
802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a
VLAN ID (VID). Tags go in packets. VIDs identify VLANs.
No functional change is intended, so this should be safe to MFC. Further
cleanup with functional changes will be committed separately (for example,
renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI).
Reviewed by: bz
Sponsored by: ADARA Networks, Inc.
MFC after: 3 days
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.
Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.
Whilst here, also:
- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
exist in the list with a NULL ifnet pointer.
- Except when INVARIANTS is in the kernel config, silently ignore the case where
no bpf_if referencing the specified ifnet is found, as it is harmless and does
not require user attention.
Reviewed by: csjp
MFC after: 1 week
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros. The lock is still kept as a mutex for now.
Reviewed by: bz
MFC after: 2 weeks
the code is doing, we recognise the legitimacy of its goal, but we're not
quite sure it's going about it the right way. More pondering is clearly
required.
Sponsored by: ADARA Networks, Inc.
Discussed with: bz
MFC after: 3 days
aspect of time stamp configuration per interface rather than per BPF
descriptor. Prior to this, the order in which BPF devices were opened and the
per descriptor time stamp configuration settings could cause non-deterministic
and unintended behaviour with respect to time stamping. With the new scheme, a
BPF attached interface's tscfg sysctl entry can be set to "default", "none",
"fast", "normal" or "external". Setting "default" means use the system default
option (set with the net.bpf.tscfg.default sysctl), "none" means do not
generate time stamps for tapped packets, "fast" means generate time stamps for
tapped packets using a hz granularity system clock read, "normal" means
generate time stamps for tapped packets using a full timecounter granularity
system clock read and "external" (currently unimplemented) means use the time
stamp provided with the packet from an underlying source.
- Utilise the recently introduced sysclock_getsnapshot() and
sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
packet, regardless of the number of BPF descriptors and time stamp formats
requested. Use the per BPF attached interface time stamp configuration to
control if sysclock_getsnapshot() is called and whether the system clock read
is fast or normal. The per BPF descriptor time stamp configuration is then
used to control how the system clock snapshot is converted to a bintime by
sysclock_snap2bintime().
- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
read of the system clock is now controlled per BPF attached interface using
the net.bpf.tscfg sysctl tree.
- Update the bpf.4 man page.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
In collaboration with: Julien Ridoux (jridoux at unimelb edu au)
While I'm here update if_oerrors if parent interface of vlan is not
up and running. Previously it updated collision counter and it was
confusing to interprete it.
PR: kern/163478
Reviewed by: glebius, jhb
Tested by: Joe Holden < lists <> rewt dot org dot uk >
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.
However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:
- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.
To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
A link reset now is completely transparent for the netmap client:
even if the NIC resets its own ring (e.g. restarting from 0),
the client will not see any change in the current rx/tx positions,
because the driver will keep track of the offset between the two.
2. make the device-specific code more uniform across different drivers
There were some inconsistencies in the implementation of the netmap
support routines, now drivers have been aligned to a common
code structure.
3. import netmap support for ixgbe . This is implemented as a very
small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
in total the patch has only 54 lines of new code) , as most of
the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
following some initial comments from Jack Vogel about making
changes less intrusive.
(Note, i have emailed Jack multiple times asking if he had
comments on this structure of the code; i got no reply so
i assume he is fine with it).
Support for other drivers (em, lem, re, igb) will come later.
"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.
Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
parent interface. This avoids the overhead of queueing a packet to an IFQ
only to immediately dequeue it again.
Suggested by: np
Reviewed by: brooks
MFC after: 1 month
of hand-made.
- When registering new cloner, check whether a cloner with
same name already exist.
- When allocating unit, also check with help of ifunit()
whether such interface already exist or not. [1]
PR: kern/162789 [1]
contain both a regular timestamp obtained from the system clock and the
current feed-forward ffcounter value. This enables new possibilities including
comparison of timekeeping performance and timestamp correction during post
processing.
- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
packets using the feedback or feed-forward system clock.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
I/O from userspace, capable of line rate at 10G, see
http://info.iet.unipi.it/~luigi/netmap/
At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]
In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in
sys/dev/netmap/head.diff
The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.
Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.
CREDITS:
Netmap has been developed by Luigi Rizzo and other collaborators
at the Universita` di Pisa, and supported by EU project CHANGE
(http://www.change-project.eu/)
The code is distributed under a BSD Copyright.
[1] In my opinion is a bad idea to have all manpage in one directory.
We should place kernel documentation in the same dir that contains
the code, which would make it much simpler to keep doc and code
in sync, reduce the clutter in share/man/ and incidentally is
the policy used for all of userspace code.
Makefiles and doc tools can be trivially adjusted to find the
manpages in the relevant subdirs.
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().
MFC after: 1 Month
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
masked out when adding a prefix route through the "route" command.
However, when deleting the route, simply changing the command keyword
from "add" to "delete" does not work. The failoure is observed in
both IPv4 and IPv6 route insertion. The patch makes the route command
behavior consistent between the "add" and the "delete" operation.
MFC after: 1 week
According to POSIX, these two header files should be able to be included
by themselves, not depending on other headers. The <net/if.h> header
uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses
integer datatypes (u_int32_t, u_short, etc).
MFC after: 2 months
It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.
While there, remove an unneeded D_NEEDMINOR from the gpio driver.
Discussed with: gonzo@ (gpio)
rtsock allowing routing daemons to filter routing updates on an
rtsock per FIB.
Adjust raw_input() and split it into wrapper and a new function
taking an optional callback argument even though we only have one
consumer [1] to keep the hackish flags local to rtsock.c.
PR: kern/134931
Submitted by: multiple (see PR)
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 3 days
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
hostid, this gives a good chance of keeping the same address over
reboots. This is intended to help IPV6 and similar which generate
their addresses from the mac.
PR: kern/160300
Submitted by: mdodd
Approved by: re (kib)
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.
That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks
to find the first route node of an ECMP chain before executing the route
command. If the system has a default route, and the specific route argument
to the command does not exist in the routing table, then the default route
would be reached. The current code does not verify the reached node matches
the given route argument, therefore erroneous removed the entry. This patch
fixes that bug.
Approved by: re
MFC after: 3 days
initialized by flags (function argument) or-ed with ifa->ifa_flags.
If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s).
As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with
RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set
to zero in rtrequest1_fib(), and request to add network route is changed
under hands to request to add host route.
Tested by: Andrew Boyer <aboyer at averesystems.com>
Submitted by: Svatopluk Kraus <onwahe at gmail dot com>
Approved by: re (hrs)
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)
Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.
Discussed with: rwatson (slightly earlier version)
same as the host address. This already works fine for INET6 and ND6.
While here, remove two function pointers from struct lltable which are
only initialized but never used.
MFC after: 3 days
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by: julian
MFC after: 2 weeks
to be assigned to a non-default FIB instance.
You may need to recompile world or ports due to the change of struct ifnet.
Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.
PR: kern/158369
Reported by: Paul Procacci <pprocacci att gmail com>
MFC after: 1 week
On MP systems this is not a usable solution anymore and could easily
lead to false positives triggering enough logging that even using
the console was no longer usable (multiple parallel ping -f can do).
Switch to the suggested solution of using mbuf tags to carry per
packet state between gre_output() invocations. Contrary to the
proposed solution modelled after gif(4) only allocate one mbuf tag
per packet rather than per packet and per gre_output() pass through.
As the sysctl to control the possible valid (gre in gre) nestings does
no sanity checks, make sure to always allocate space in the mbuf tag
for at least one, and at most 255 possible gre interfaces to detect
loops in addition to the counter.
Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version)
PR: kern/114714
Reviewed by: Cristian KLEIN (cristi net.utcluj.ro)
Reviewed bu: Wooseog Choi (ben_choi hotmail.com)
Sponsored by: Sandvine Incorporated
MFC after: 1 week
Document the fact that we might want an IFCAP_CANTCHANGE mask,
even though the value is not yet used in sys/net/if.c
(asked on -current a week ago, no feedback so i assume no objection).
due to m_uiotombuf() failing.
While here, trim unneeded error handling related to tuninit() since it
can never fail.
Submitted by: Martin Birgmeier la5lbtyi aon at
Reviewed by: glebius
MFC after: 1 week
default dispatch method to NETISR_DISPATCH_DIRECT in order to force
direct dispatch. This adds a fairly negligble overhead without
changing default behavior, but in the future will allow deferred or
hybrid dispatch to other worker threads before link layer processing
has taken place.
For example, this could allow redistribution using RSS hashes
without ethernet header cache line hits, if the NIC was unable to
adequately implement load balancing to too small a number of input
queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in
a modern system with 16-128 threads. This can happen on highly
threaded systems, where you want want an ithread per core,
redistributing work to other queues, but also on virtualised systems
where hardware hashing is (or is not) available, but only a single
queue has been directed to one VCPU on a VM.
Note: this adds a previously non-present assertion about the
equivalence of the ifnet from which the packet is received, and the
ifnet stamped in the mbuf header. I believe this assertion to
generally be true, but we'll find out soon -- if it's not, we might
have to add additional overhead in some cases to add an m_tag with
the originating ifnet pointer stored in it.
Reviewed by: bz
MFC after: 3 weeks
Sponsored by: Juniper Networks, Inc.
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.
Reviewed by: jhb
be represented:
- A single policy namespace is defined, consisting of four possible
policies: "default" to use the global default, "deferred" to force
deferred dispatch, "direct" to employ direct dispatch where possible, and
"hybrid" which makes a dynamic decision based on CPU affinity, ordering,
etc. Routines are implemented to convert between strings and an integer
namespace.
- A new global variable, netisr_dispatch_policy, subsumes existing global
variables for direct dispatch, forced direct dispatch, etc, and is used
for explicit policy interpretation and composition. Old variables remain
so that they can be exported by legacy sysctls for use by old netstat(1)
binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the
above strings for specifying a global policy default.
- The protocol registration structure, netisr_handler, grows an nh_dispatch
field, which accepts a per-policy policy override. The default value is
'0', which corresponds to "default", meaning that protocols will accept
the global default policy unless otherwise specified.
- Policies are now interpreted and composed explicitly at various points in
packet dispatch; protocol policies override global policies.
- Protocols grow the ability to express a non-opinion about affinity even
when implenting m2cpuid by returning NETISR_CPUID_NONE. In that case, the
framework falls back on source ordering, rather than simply using the
current CPU.
These changes are in support of allowing link layer re-dispatch based on
RSS or similar hashes provided by NICs, especially in the case where the
number of hardware receive queues matches hardware core count, rather than
hardware thread count, requiring further software redistributeon. (i.e.,
on RMI XLR).
MFC after: 3 weeks
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
on top of epair(4) virtual interfaces, since there's no physical
hardware associated with epair interfaces which would imply any
constraints on MTU sizes.
MFC after: 3 days
by borrowing the skeleton of if_media manipulation and reporting
code from if_lagg(4). The main motivation behind this change is
to allow for epair(4) interfaces to participate in STP if_bridge(4)
configurations.
Reviewed by: bz
MFC after: 3 days
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.
Reviewed by: delphij
MFC after: 5 days
- Add shorthand aliases for common media+option combinations as announced
by miibus(4) so that one can actually supply the media strings found in
the dmesg output to ifconfig(8).
Obtained from: NetBSD (in principle)
MFC after: 2 weeks
Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it
has nothing to do with collision.
Reported by: Zeus V Panchenko (zeus <> ibs dot dn dot ua)
adding appropriate #ifdefs. For module builds the framework needs
adjustments for at least carp.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
from the interface index, then decrease refcount, not vice versa.
Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.
from processes inside jails if the addresses do not belong to the jail.
Originally reported by: Pieter de Boer via remko
PR: kern/151119
Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif]
MFC after: 1 week
VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.
The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec
Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks
Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match
the call order of the VNET_DEBUG case.
Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that
INVARIANTS will still catch problems.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
MFC after: 2 weeks
Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.
Change the syntax to match KASSERT() to allow more flexible panic
messages rather than having a printf with hardcoded arguments
before panic.
Adjust the few assertions we have to the new format (and enhance
the output).
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
MFC after: 2 weeks
table in if_grow(). The order of the SYSINIT's for ifnet state were swapped
so that the various locks were initialized before being used.
Reviewed by: pluknet, bz
MFC after: 2 weeks
reading. (This was already done for writing to a sysctl). This
requires all SYSCTL setups to specify a type. Most of them are now
checked at compile-time.
Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).
Succested by: bde
This was lost when it was converted to using a condition variable instead
of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
since it is a similar background task.
MFC after: 2 weeks
counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for
"flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback"
for "hw-loopback".
MFC after: 1 week
Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(),
call it directly (like we do for the RT_* macros).
Sponsored by: ISPsystem [1]
Reviewed by: julian [1]
MFC After: 1 week
[1] Early 2010.
variable into two so that we can see on which one we are waiting.
This might also more properly propagate the update of the
flowclean_cycles flag and avoid "hangs" people were seeing.
Suggested by: rwatson [1]
Sponsored by: ISPsystem [1]
Reviewed by: julian [1]
Updated by: Mikolaj Golub (to.my.trociny gmail.com)
Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC After: 1 week
[1] Early 2010, initial version.
isn't configurable in a meaningful way. This is for ifconfig(8) or
other tools not to change code whenever IFT_USB-like interfaces are
registered at the interface list.
Reviewed by: brooks
No objections: gavin, jkim
created in separated vnets. As a side-effect of having a separated
if_cloner instance for each vnet, all vlan ifnets created in a vnet
will be automatically destroyed when vnet teardown is initiated.
Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are
associated with physical ifnets residing in parent vnets.
This is an interim vlan-specific solution which will be superseded by a
more generic if_cloner V_irtualization change from p4. For nooptions
VIMAGE builds, this should be a no-op change.
Discussed with: bz
MFC after: 3 days
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
support in mii(4):
- Merge generic flow control advertisement (which can be enabled by
passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
IFM_FLOW isn't implemented as a global option via the "don't care
mask" but instead as a media specific option this. This has the
following advantages:
o allows flow control advertisement with autonegotiation to be
turned on and off via ifconfig(8) with the default typically
being off (though MIIF_FORCEPAUSE has been added causing flow
control to be always advertised, allowing to easily MFC this
changes for drivers that previously used home-grown support for
flow control that behaved that way without breaking POLA)
o allows to deal with PHY drivers where flow control advertisement
with manual selection doesn't work or at least isn't implemented,
like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
by setting MIIF_NOMANPAUSE
o the available combinations of media options are readily available
from the `ifconfig -m` output
- Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
- Change IFM_ETH_MASTER from being implemented as a global option via
the "don't care mask" to a media specific one as it actually is only
applicable to IFM_1000_T to date.
- Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
actually configure manually selected slave mode (like we also do in
the PHY specific implementations).
- Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
e1000phy(4) and ip1000phy(4) to use the generic flow control support
instead of home-grown solutions via IFM_FLAGs. This includes changing
these PHY drivers and smcphy(4) to no longer unconditionally advertise
support for flow control but only if the selected media has IFM_FLOW
set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
the contents of the BMSR via mii_phy_add_media() instead of hardcoding
them. The latter approach seems to have developed historically, besides
causing unnecessary code duplication it was also undesirable because
brgphy_mii_phy_auto() already based the capability advertisement on the
contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
just BCM5701. Apparently this was a misinterpretation of a workaround
in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
this doesn't mean we can't set these as well on other PHYs for manual
media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
IFM_1000_T master mode support now is generally available with all PHY
drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
not applicable there.
Reviewed by: yongari (plus additional testing)
Obtained from: NetBSD (partially), OpenBSD (partially)
MFC after: 2 weeks
bug (incorrect placement of __start_SECNAME in some cases) that was
fixed in r210245.
There is already an UPDATING entry about needing a recent ld.
MFC after: 1 month
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.
This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.
Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..
Reviewed by: bz, rpaulo
MFC after: 3 weeks
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.
Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.
Reviewed by: kib
enhancements (1). Switch to a standard 2-clause BSD license for this (2).
Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.
Suggested by: rwatson (1) a while back.
Approved by: thompsa (2) for the change from r204279.
MFC after: 6 days
- move all the chunks into one file, which allows to hide SIOCGIFCONF32
global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
doing internal 32bit specific changes within it (e.g. as it's done for
SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.
Approved by: kib (mentor)
MFC after: 2 weeks
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 1 week
driver-maintained ifnet fields (such as if_drv_flags).
- Use soft locks as the mutex that protects each interface's knote list
rather than using the global knote list lock. Also, use the softc
for kn_hook instead of the cdev.
- Use mtx_sleep() instead of tsleep() when blocking in the read routines.
This fixes a lost wakeup race.
- Remove D_NEEDGIANT now that the cdevsw routines use the softc lock
where locking is needed.
- Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4)
already did this.
- Remove remaining spl calls.
Submitted by: Marcin Cieslak saper of saper|info (3)
MFC after: 2 weeks
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
use '-' in probe names, matching the probe names in Solaris.[1]
Add userland SDT probes definitions to sys/sdt.h.
Sponsored by: The FreeBSD Foundation
Discussed with: rwaston [1]
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.
Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.
This commit requires IP6PROTOSPACER, introduced in r211115.
Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks
interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface
(IFT_ETHER). The code already fixed if_type in the ifnet causing some
places to report the interface as a vlan (e.g. arp -a output) and other
places to report the interface as Ethernet (getifaddrs(3)). Now they
should all report IFT_L2VLAN.
Reviewed by: brooks
MFC after: 1 month
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.
This lead to losing LLE_PUB and/or LLE_PROXY flags.
We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.
PR: kern/148784, kern/146539
Silence from: qingli, 5 days
- Allow setting format, resolution and accuracy of BPF time stamps per
listener. Previously, we were only able to use microtime(9). Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command. Document all supported options in bpf(4) and their uses.
- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts. bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp. The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries. On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long. On 32-bit platforms, the size may
increase by 8 bytes. For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested. However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.
- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.
Reviewed by: net@
interface when tearing down a vlan interface. If a trunk interface is
detached, all of its multicast addresses are removed before the ifnet
departure eventhandlers are invoked. This means that all of the multicast
addresses are removed before the vlan interfaces are removed which causes
the if_delmulti() calls in the vlan teardown to fail.
In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer
valid parent interface. In the !VLAN_ARRAY case, the eventhandler gets
stuck in an infinite loop retrying vlan_unconfig_locked() forever. In
general the callers of vlan_unconfig_locked() do not expect nor handle
failure, so I believe it is safer to ignore the errors and tear down as
much of the vlan state as possible.
Silence from: net@
MFC after: 4 days
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous. For the moment do not support more than
*_MODMIN space and ignore the extra space (*).
(*) We know how to get it back but it'll need testing.
Discussed with: jeff, rwatson (briefly)
Reviewed by: jeff
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 4 days
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.
MFC after: 1 month
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
"Whitspace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.
Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively
static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.
Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days
Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 4 days
bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.
Submitted by: jhb (version for 6.x)
Reviewed and tested by: emaste
MFC after: 2 weeks
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.
The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland. However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.
This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.
Reported by: bschmidt
MFC after: 2 weeks
to remove it to avoid panics in case of two threads trying to remove it in
parallel.
PR: kern/116837
Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version)
MFC after: 10 days
prevented the link-layer entry from being freed.
In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.
In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.
In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().
In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.
Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.
Discussed with: rwatson
MFC after: 10 days
has actually succeeded to initialize and attach. There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.
Discussed with: rwatson
MFC after: 10 days
- increase flow cleaning frequency and decrease flow caching time
when near the flow limit
- stop allocating new flows when within 3% of maxflows don't start
allocating again until below 12.5%
MFC after: 7 days
that provides the allocated and setup eventhandler entry.
Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.
Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.
Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.
Sponsored by: ISPsystem
Discussed with: jhb
Reviewed by: zec (earlier version), jhb
MFC after: 1 month
- show all lltables [1] (optional flag to also show the llentries as well)
- show lltable <struct lltable *>
- show llentry <struct llentry *>
MFC after: 6 days
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.
Also updated the ifconfig utility to show the LINKSTATE
capability if present.
Reviewed by: rwatson, imp, juli
MFC after: 3 days
- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.
PR: kern/144453
Submitted by: Alexander Sack (asack at niksun dot com)[1]
MFC after: 1 week
address as well as the transport protocol port information
from the outbound packets. The routing code is generic and
compares every byte in the given sockaddr object. Therefore
the temporary sockaddr objects must be cleared due to padding
bytes. In addition, the port information must be stripped
or the route search will either fail or return the incorrect
route entry.
Unit testing is done using OpenVPN over the if_tun interface.
MFC after: 7 days
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
(e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate
MFC after: 7 days
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.
Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.
This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. Similary approach is
already done in the if_tun device.
MFC after: 3 days
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.
The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.
Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.
MFC after: 5 days
radix table root nodes. This is only needed (and available)
in the virtualization case to free the resources when tearing
down a virtual network stack.
Sponsored by: ISPsystem
Reviewed by: julian, zec
MFC after: 5 days
or overflow the netisr queue and fall back to the interface
queue so that we can garuantee that the ifnet pointer stays
valid. Formerly we ended up with reference counts <= 0 in
case the netisr had returned ENOBUFS. The idea is to track
any packet in the netisr queue and only change the refount
on edge operations for the fallback interface queue. This
also avoids problems in case the if_snd.ifq_len lies to us.
Also rework refount assertions to make sure they trigger if
we go below 1. Formerly a negative refence count did not
trigger the assert as the refcount variable is u_int.
Sponsored by: ISPsystem
MFC after: 5 days
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.
The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.
In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.
Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.
Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.
Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.
CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
Enhanced process coredump routines.
This brings in the following features:
1) Limit number of cores per process via the %I coredump formatter.
Example:
if corefilename is set to %N.%I.core AND num_cores = 3, then
if a process "rpd" cores, then the corefile will be named
"rpd.0.core", however if it cores again, then the kernel will
generate "rpd.1.core" until we hit the limit of "num_cores".
this is useful to get several corefiles, but also prevent filling
the machine with corefiles.
2) Encode machine hostname in core dump name via %H.
3) Compress coredumps, useful for embedded platforms with limited space.
A sysctl kern.compress_user_cores is made available if turned on.
To enable compressed coredumps, the following config options need to be set:
options COMPRESS_USER_CORES
device zlib # brings in the zlib requirements.
device gzio # brings in the kernel vnode gzip output module.
4) Eventhandlers are fired to indicate coredumps in progress.
5) The imgact sv_coredump routine has grown a flag to pass in more
state, currently this is used only for passing a flag down to compress
the coredump or not.
Note that the gzio facility can be used for generic output of gzip'd
streams via vnodes.
Obtained from: Juniper Networks
Reviewed by: kan
- Rename the netisr protocol registration array, 'np' to 'netisr_proto',
in order to reduce the chances of symbol name collisions. It remains
statically defined, but it will be looked up by netstat(1).
- Move certain internal structure definitions from netisr.c to
netisr_internal.h so that netstat(1) can find them. They remain
private, and should not be used for any other purpose (for example,
they should not be used by kernel modules, which must instead use the
public interfaces in netisr.h).
- Store a kernel-compiled version of NETISR_MAXPROT in the global variable
netisr_maxprot, and export via a sysctl, so that it is available for use
by netstat(1). This is especially important for crashdump
interpretation, where the size of the workstream structure is determined
by the maximum number of protocols compiled into the kernel.
MFC after: 1 week
Sponsored by: Juniper Networks
Do not do additional dev_ref() on the newly created interface in the
if_clone create method [1]. This reference is not needed and never
removed, causing struct cdevpriv leakage. Remove the setting of
SI_CHEAPCLONE flag as well, since it is unused.
For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF)
instead of calling make_dev() and then dev_ref(), to avoid a race.
Call drain_dev_clone_events() at the module unload time after dev_clone
handler is deinstalled.
Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1]
MFC after: 1 week
counters have not gone about MAXCPU or NETISR_MAXPROT. These problems
caused panics on UP kernels with INVARIANTS when using sysctl -a, but
would also have caused problems for 32-core boxes or if the netisr
protocol vector was fully populated.
Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com>
MFC after: 4 days