Detailed analysis in https://github.com/genneko/freebsd-vimage-jails/issues/2
narrowed the problem down to a double call of ng_node_name() before and
after a vnet move. Because the name of the node is already known
(occupied by itself), the second call fails.
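
The failure mode can be reproduced outside the kernel. The sketch below is
not the committed fix (see the review linked below for that); it only shows
one way a naming routine can tolerate being called twice for the same node.
The registry, struct node and name_node() are hypothetical stand-ins for the
kernel's name hash and naming function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	MAX_NODES	8
#define	NAME_LEN	32

/* Hypothetical, simplified node: only the name matters here. */
struct node {
	char name[NAME_LEN];
};

/* Hypothetical registry standing in for the kernel's name hash. */
static struct node *registry[MAX_NODES];

/* Return the registered node carrying this name, or NULL. */
static struct node *
lookup(const char *name)
{
	for (size_t i = 0; i < MAX_NODES; i++)
		if (registry[i] != NULL &&
		    strcmp(registry[i]->name, name) == 0)
			return (registry[i]);
	return (NULL);
}

/*
 * Assign a name.  A name held by a different node is a conflict; a name
 * already held by the node itself is treated as success.
 */
static int
name_node(struct node *n, const char *name)
{
	struct node *owner;
	int slot = -1;

	assert(n != NULL && name != NULL);
	if (strlen(name) >= NAME_LEN)
		return (EINVAL);
	owner = lookup(name);
	if (owner == n)
		return (0);		/* second call: nothing to do */
	if (owner != NULL)
		return (EADDRINUSE);	/* taken by another node */

	/* Reuse the node's own slot if it has one, else the first free one. */
	for (int i = 0; i < MAX_NODES; i++) {
		if (registry[i] == n) {
			slot = i;
			break;
		}
		if (registry[i] == NULL && slot < 0)
			slot = i;
	}
	if (slot < 0)
		return (ENOSPC);
	strcpy(n->name, name);
	registry[slot] = n;
	return (0);
}
```

Without the "owner == n" check, the second call for the same node would hit
the conflict branch, which is the behaviour described above.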
PR: 241954
Reported by: Paul Armstrong
MFC: 1 week
Differential Revision: https://reviews.freebsd.org/D30110
Allocate the necessary memory for the conversion dynamically starting
with a value which is sufficient for almost all normal cases.
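
A minimal userland sketch of the "start small, grow on demand" approach
follows. convert() is a hypothetical converter that reports ERANGE when the
buffer is too small, and the doubling policy and the size limit are
assumptions made for illustration, not details taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical converter: writes a hex string for `len` bytes into `buf`.
 * Returns 0 on success or ERANGE when `bufsize` is too small.
 */
static int
convert(const unsigned char *data, size_t len, char *buf, size_t bufsize)
{
	static const char hex[] = "0123456789abcdef";

	if (bufsize < 2 * len + 1)
		return (ERANGE);
	for (size_t i = 0; i < len; i++) {
		buf[2 * i] = hex[data[i] >> 4];
		buf[2 * i + 1] = hex[data[i] & 0x0f];
	}
	buf[2 * len] = '\0';
	return (0);
}

/*
 * Try the conversion with `initial` bytes and double the buffer until it
 * fits or `limit` is reached.  The caller frees the returned string.
 */
static char *
convert_alloc(const unsigned char *data, size_t len, size_t initial,
    size_t limit)
{
	size_t size = initial;

	assert(data != NULL && initial > 0);
	for (;;) {
		char *buf = malloc(size);
		int error;

		if (buf == NULL)
			return (NULL);
		error = convert(data, len, buf, size);
		if (error == 0)
			return (buf);
		free(buf);
		if (error != ERANGE || size >= limit)
			return (NULL);	/* give up: real error or too large */
		size *= 2;
	}
}
```

The common case costs a single allocation of the initial size; only unusually
large inputs pay for the retries.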
PR: 187835
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D23840
The data path in netgraph is designed to work on a read-only state of
the whole netgraph network. Currently this is achieved by convention;
there is no technical enforcement. In the case of NETGRAPH_DEBUG all
nodes can be annotated for debugging purposes, so the strict
enforcement needs to be lifted for this purpose.
This patch is part of a series to make ng_bridge multithreaded, which
is done by rewriting the data path to operate on const.
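
As a rough illustration of the idea (not the code from the review), const
enforcement can be expressed through a pointer typedef that is only relaxed
in debug builds. SKETCH_DEBUG, struct node, node_cp and node_numhooks() are
hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node; the debug fields exist only in debug builds. */
struct node {
	int nhooks;
#ifdef SKETCH_DEBUG
	const char *lastfile;
	int lastline;
#endif
};

/*
 * Data-path code takes a node_cp.  Normally that is a pointer to const, so
 * the compiler rejects writes; debug builds need to annotate the node and
 * therefore get a writable pointer.
 */
#ifdef SKETCH_DEBUG
typedef struct node *node_cp;
#else
typedef const struct node *node_cp;
#endif

/* Read-only accessor as it might appear in the data path. */
static int
node_numhooks(node_cp n)
{
	assert(n != NULL);
#ifdef SKETCH_DEBUG
	/* Only legal because the typedef drops const in debug builds. */
	n->lastfile = __FILE__;
	n->lastline = __LINE__;
#endif
	return (n->nhooks);
}
```

In a non-debug build, any attempt to assign through a node_cp inside the data
path becomes a compile-time error instead of a broken convention.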
Reviewed By: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28141
The data path in netgraph is designed to work on a read-only state of
the whole netgraph network. Currently this is achieved by convention;
there is no technical enforcement. This patch is part of a series to
make ng_bridge multithreaded, which is done by rewriting the data path
to use const handling.
Reviewed By: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28141
When tearing down a VNET, netgraph sends shutdown messages to all of the
nodes before detaching interfaces (SI_SUB_NETGRAPH comes before
SI_SUB_INIT_IF in teardown order). ng_ether nodes handle this by
destroying themselves without detaching from the parent ifnet. Then,
when ifnets go away they detach their ng_ether nodes again, triggering a
use-after-free.
Handle this by modifying ng_ether_shutdown() to detach from the ifnet.
If the shutdown was triggered by an ifnet being destroyed, we will clear
priv->ifp in the ng_ether detach callback, so priv->ifp may be NULL.
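
The ordering problem can be modelled in a few lines of userland C. The
sketch below is an assumption-laden simplification: sk_ifnet, sk_priv and the
three functions are hypothetical, and only the NULL check on priv->ifp
mirrors what is described above.

```c
#include <assert.h>
#include <stddef.h>

struct sk_priv;

/* Hypothetical interface holding a back-pointer to its attached node. */
struct sk_ifnet {
	struct sk_priv *node;
};

/* Hypothetical node private data, modelled on the priv->ifp field. */
struct sk_priv {
	struct sk_ifnet *ifp;
};

static void
sk_attach(struct sk_priv *priv, struct sk_ifnet *ifp)
{
	assert(priv != NULL && ifp != NULL);
	priv->ifp = ifp;
	ifp->node = priv;
}

/* Interface-side detach callback: clears both links if still attached. */
static void
sk_if_detach(struct sk_ifnet *ifp)
{
	struct sk_priv *priv = ifp->node;

	if (priv == NULL)
		return;			/* node already shut down */
	priv->ifp = NULL;
	ifp->node = NULL;
}

/* Node shutdown: detach from the interface unless that already happened. */
static void
sk_node_shutdown(struct sk_priv *priv)
{
	if (priv->ifp != NULL) {
		priv->ifp->node = NULL;
		priv->ifp = NULL;
	}
}
```

Either teardown order now leaves both pointers NULL, so the later step finds
nothing to touch instead of following a stale pointer.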
Also get rid of the printf in vnet_netgraph_uninit(). It can be
triggered trivially by ng_ether since ng_ether_shutdown() persists the
node unless NG_REALLY_DIE is set.
PR: 233622
Reviewed by: afedorov, kp, Lutz Donnerhacke
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27662
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
callout_stop() recently started returning -1 when the callout is already
stopped, which is not handled by the netgraph code. Properly filter
the return value. Netgraph callers only want to know if the callout
was cancelled and not draining or already stopped.
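
The filtering can be sketched with a userland stand-in. timer_stop() below is
hypothetical and only mimics a three-valued result: -1 for "already stopped"
comes from the text above, while 1 for "cancelled" and 0 for "could not be
stopped" are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum timer_state { TIMER_IDLE, TIMER_PENDING, TIMER_RUNNING };

/* Hypothetical timer, reduced to the state the example needs. */
struct timer {
	enum timer_state state;
};

/*
 * Hypothetical stand-in for a stop routine with three outcomes:
 *   1  the pending timer was cancelled,
 *   0  the handler is running and could not be stopped,
 *  -1  the timer was not scheduled at all.
 */
static int
timer_stop(struct timer *t)
{
	assert(t != NULL);
	switch (t->state) {
	case TIMER_PENDING:
		t->state = TIMER_IDLE;
		return (1);
	case TIMER_RUNNING:
		return (0);
	default:
		return (-1);
	}
}

/*
 * Filtered result: true only when the timer was really cancelled.  A plain
 * "if (timer_stop(t))" would wrongly treat -1 as success.
 */
static bool
timer_cancelled(struct timer *t)
{
	return (timer_stop(t) > 0);
}
```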
Discussed with: julian, glebius
MFC after: 2 weeks
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
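
The last item is easy to miss in review, so here is a small illustration.
The flag names and bit values are hypothetical, not the real CTLFLAG_*
definitions.

```c
#include <assert.h>

/* Hypothetical flag bits for the sketch. */
#define	FLAG_RD		0x80000000u
#define	FLAG_WR		0x40000000u
#define	FLAG_TUN	0x00080000u

/* Compile-time check, in the spirit of asserting specifiers statically. */
static_assert((FLAG_RD & FLAG_TUN) == 0, "flag bits must not overlap");

/* Bug: logical OR collapses any non-zero operands to 1. */
static unsigned int
combine_logical(unsigned int a, unsigned int b)
{
	return (a || b);
}

/* Intended: bitwise OR keeps the bits of both flags. */
static unsigned int
combine_bitwise(unsigned int a, unsigned int b)
{
	return (a | b);
}
```

With these values, combine_logical(FLAG_RD, FLAG_TUN) yields 1, so neither
flag survives, while combine_bitwise() yields 0x80080000.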
- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.
- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, since
there is no easy way to assert types in the C language outside a
C-function.
- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
MFC after: 3 days
Sponsored by: Mellanox Technologies
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, since some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection with removing unneeded
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() are
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Actually, text versions of generic commands are not used, since ngctl(8)
uses binary messages for them. And to request a text command one needs
a working ngctl(8). That's why the bug was never discovered. I'm
considering removing the text support for generic commands.
Found by: dim with clang 3.4
Submitted by: adrian, zec
Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.
(1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
device_attach(). This fixes multiple VIMAGE related kernel panics
when trying to attach Bluetooth or USB Ethernet devices because
curthread->td_vnet is NULL.
(2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking
interfaces, especially USB Ethernet devices.
(3) Use VNET_DOMAIN_SET() in ng_btsocket.c
(4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics
when detaching Netgraph nodes.
ngthread properly set the item's depth to 1. In particular, prior to this
change if ng_snd_item failed to acquire a lock on a node, the item's depth
would not be set at all. This fix ensures that the error code from rcvmsg/
rcvdata is properly passed back to the apply callback. For example, this
fixes a bug where an error from rcvmsg/rcvdata would not previously
propagate back to a libnetgraph consumer when the message was queued.
Reviewed by: mav
MFC after: 1 month
Sponsored by: Sandvine Incorporated
- Make hash sizes growable, to satisfy users running large mpd
installations, having thousands of nodes.
- NG_NAMEHASH() proved to give a very bad distribution in real life
name sets, while generic hash32_str(name, HASHINIT) proved to give
an even one, so use the latter for name hash.
- Do not store unnamed nodes in slot 0 of name hash, no reason for that.
- Use the ID hash in cases when we need to run through all nodes: the
NGM_LISTNODES command and in the vnet_netgraph_uninit().
- Implement NGM_LISTNODES and NGM_LISTNAMES as separate code, the former
iterates through the ID hash, and the latter through the name hash.
- Keep count of all nodes and of named nodes, so that we don't need
to count nodes in NGM_LISTNODES and NGM_LISTNAMES. The counters are
also used to estimate whether we need to grow hashes.
- Close a race between two threads running ng_name_node() assigning the
same name to different nodes.
mutex(9) to rwlock(9) based locks.
While here, stop dropping the lock when processing the NGM_LISTNODES
and NGM_LISTTYPES generic commands. We don't need to drop it
since memory allocation is done with M_NOWAIT.
- Make ng_unref_node() void, since caller shouldn't be
interested in whether node is valid after call or not,
since it can't be guaranteed to be valid. [1]
Ok from: julian [1]
the topology mutex in the following functions, that manipulate pointers
to peer nodes:
- ng_bypass()
- ng_path2noderef() when switching to the next node in sequence.
Rewrite the function a bit.
- ng_address_hook()
- ng_address_path()
This patch improves stability of large mpd5 installations.
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
from 2000 bytes to 20 Kbytes, which now matches the buffer size used for
NGM_BINARY2ASCII conversions.
The aim of this change is to allow for bigger binary structures to be
managed via netgraph ASCII messages, until we come up with an API
improvement which would get rid of such arbitrary hardcoded limits.
MFC after: 3 days