61 Commits

Author SHA1 Message Date
ngie
e40523722e Add missing braces around MCAST_EXCLUDE check when KTR support is
compiled into the kernel

This ensures that .iss_asm (the number of ASM listeners) isn't incorrectly
decremented for MLD-layer source datagrams when inspecting im*s_st[1]
(the second state in the structure).

MFC after:	2 months
PR:		217509 [1]
Reported by:	Coverity (Isilon)
Reviewed by:	ae ("This patch looks correct to me." [1])
Submitted by:	Miles Ohlrich <miles.ohlrich@isilon.com>
Sponsored by:	Dell EMC Isilon
2017-05-13 18:41:24 +00:00
vangyzen
d7b79e158f Add some ntohl() love to r315277
inet_ntoa() and inet_ntoa_r() take the address in network
byte-order.  When I removed those calls, I should have
replaced them with ntohl() to make the hex addresses slightly
less unreadable.  Here they are.

See r315277 regarding classic blunders.

vangyzen: you're deep in "no good deed" territory, it seems
    --badger

Reported by:	ian
MFC after:	3 days
MFC when:	I finally get it right
Sponsored by:	Dell EMC
2017-03-14 20:57:54 +00:00
vangyzen
fad1018fc7 KTR: log IPv4 addresses in hex rather than dotted-quad
When I made the changes in r313821, I fell victim to one of the
classic blunders, the most famous of which is: never get involved
in a land war in Asia.  But only slightly less well known is this:
Keep your brain turned on and engaged when making a tedious, sweeping,
mechanical change.  KTR can correctly log the immediate integral values
passed to it, as well as constant strings, but not non-constant strings,
since they might change by the time ktrdump retrieves them.

Reported by:	glebius
MFC after:	3 days
Sponsored by:	Dell EMC
2017-03-14 18:27:48 +00:00
vangyzen
c7c348accc Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by:	glebius, emaste
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9625
2017-02-16 20:47:41 +00:00
pfg
d9c9113377 sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
pfg
fe6332dab0 netinet: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.

Reviewed by:	ae. tuexen
2016-04-15 15:46:41 +00:00
melifaro
2bb0e924cc Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
  in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
  return IPv6 destination address with embedded scope. Currently
  rw_gateway has it scope embedded, do the same for non-gatewayed
  destinations.

Sponsored by:	Yandex LLC
2015-12-09 11:14:27 +00:00
cem
6ae0b4f9be in_getmulti: Fix recursion on if_addr_lock on malloc failure
When the M_NOWAIT allocation fails, we recurse the if_addr_lock trying
to clean up.  Reorder the cleanup after dropping the if_addr_lock.  The
obvious race is already possible between if_addmulti and IF_ADDR_WLOCK
above, so it must be ok.

Submitted by:	Ryan Libby <rlibby@gmail.com>
Reviewed by:	jhb
Found with:	M_NOWAIT failure injection testing
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4138
2015-11-18 23:53:13 +00:00
ae
75425458ac Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
kib
3422f5929d Fix build with KTR after r278978. 2015-02-19 15:41:23 +00:00
glebius
17f48f0414 Use new struct mbufq instead of struct ifqueue to manage packet queues in
IPv4 multicast code.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 01:21:02 +00:00
jhb
9d6dc35e9d Only define the full inm_print() if KTR_IGMPV3 is enabled at compile time. 2014-09-30 17:26:34 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
gnn
61b1174206 Fix various places where we don't properly release a lock
PR:		185043
Submitted by:	Michael Bentkofsky
MFC after:	2 weeks
2014-01-16 22:14:54 +00:00
dim
c0103af5cd In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used whenever
KTR is defined, so put it between #ifdef KTR guards.  This avoids a
warning about a unused function if KTR is not enabled.

MFC after:	 3 days
2013-12-24 20:25:18 +00:00
glebius
24a0040a5c Uninline inm_lookup_locked(). Now in_var.h doesn't dereference
fields of struct ifnet.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-29 11:21:31 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
delphij
d76e7522db Fix an integer overflow in computing the size of a temporary buffer
can result in a buffer which is too small for the requested
operation.

Security:	CVE-2013-3077
Security:	FreeBSD-SA-13:09.ip_multicast
2013-08-22 00:51:37 +00:00
hrs
b3bce89084 Fix a panic when leaving MC group in a kernel with VIMAGE enabled.
in_leavegroup() is called from an asynchronous task, and
igmp_change_state() requires that curvnet is set by the caller.
2013-07-02 16:39:12 +00:00
glebius
b7d04855ff Remove unused variable. 2012-01-24 14:27:14 +00:00
jhb
4ef366671a Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
jhb
419867b9b4 Defer the work of freeing IPv4 multicast options from a socket to an
asychronous task.  This avoids tearing down multicast state including
sending IGMP leave messages and reprogramming MAC filters while holding
the per-protocol global pcbinfo lock that is used in the receive path of
packet processing.

Reviewed by:	rwatson
MFC after:	1 month
2011-12-29 20:41:16 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
bms
a59efb5aef Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by
pfSense developers exposed an issue where OpenOSPFD was using the IPv4
address to leave the OSPF link-scope multicast groups on a dynamic
OpenVPN tun interface, rather than using RFC 3678 with the interface
index, which won't be raced when the interface's addresses change.

In inp_join_group():
 If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
 MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
 legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
 not violate POLA for apps using the old API.
 It also stops us falling through to kicking IGMP state transactions
 in what is otherwise a no-op case.
 [This has already been dealt with in HEAD, but make it explicit before
  we MFC the change to 8.]

In inp_leave_group():
 Fix a bogus conditional.
 Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
 where it actually belongs.
 If an interface was specified, by primary IPv4 address, for ioctl
 IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
 then and only then should we look up the ifp from the IPv4 address in
 mreqs.imr_interface.
 If not, we fall through to imo_match_group() as before, but only in
 the IP_DROP_MEMBERSHIP case.

With these changes, the legacy 4.4BSD multicast API idempotence should
be mostly preserved in the SSM enabled IPv4 stack.

Found by:	ermal (with pfSense)
MFC after:	3 days
2010-04-10 12:05:31 +00:00
bms
8c86c2baad Correct a comment.
MFC after:	1 day
2009-11-19 13:21:37 +00:00
bms
178f20a475 Return ENOBUFS consistently if user attempts to exceed
in_mcast_maxsocksrc resource limit.

Submitted by:	syrinx
MFC after:	3 days
2009-09-18 15:12:31 +00:00
bms
9dcdfc8226 Comment some flawed assumptions in inp_join_group() about
mixing SSM full-state and delta-based APIs.

ENOTIME to fix right now.  No functional changes.

MFC after:	5 days
2009-09-12 20:37:44 +00:00
bms
ebe8706174 Don't allow joins w/o source on an existing group.
This is almost always pilot error.

We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IP_MSFILTER
is also disallowed.

Deals with the case described in PR 137164 upfront, cumulative
with the fix in svn rev 197132 which only calls imo_match_source()
if the source address family was not unspecified.

PR:		137164
MFC after:	5 days
2009-09-12 20:18:23 +00:00
bms
e3b721990b Tighten input checking in inp_join_group():
* Don't try to use the source address, when its family is unspecified.
 * If we get a join without a source, on an existing inclusive
   mode group, this is an error, as it would change the filter mode.

Fix a problem with the handling of in_mfilter for new memberships:
 * Do not rely on imf being NULL; it is explicitly initialized to a
   non-NULL pointer when constructing a membership.
 * Explicitly initialize *imf to EX mode when the source address
   is unspecified.

This fixes a problem with in_mfilter slot recycling in the join path.

PR:		138690
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 19:45:55 +00:00
bms
fe0bf7a268 Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.

PR:		138691
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 19:07:03 +00:00
bms
3031a90b58 Fix an API issue in leave processing for IPv4 multicast groups.
* Do not assume that the group lookup performed by imo_match_group()
   is valid when ifp is NULL in this case.
 * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
   membership we are being asked to leave.

Caveat user:
 * The way IPv4 multicast memberships are implemented in the inpcb layer
   at the moment, has the side-effect that struct ip_moptions will
   still hold the membership, under the old ifp, until ip_freemoptions()
   is called for the parent inpcb.
 * The underlying issue is: the inpcb layer does not get notification
   of ifp being detached going away in a thread-safe manner.
   This is non-trivial to fix.

But hey, at least the kernel should't panic when you unplug a card.

PR:		138689
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 18:55:15 +00:00
syrinx
805d7d8fea When joining a multicast group, the inp_lookup_mcast_ifp call
does a KASSERT that the group address is multicast, so the
check if this is indeed true and eventually return a EINVAL if not,
should be done before calling inp_lookup_mcast_ifp. This fixes a kernel
crash when calling setsockopt (sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,...)
with invalid group address.

Reviewed by:	bms
Approved by:	bms

MFC after:	3 days
2009-09-07 16:00:33 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
rwatson
ea70a3542d Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
rwatson
c9ef486fe1 Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references.  The following routines now return references:

  ifaddr_byindex
  ifa_ifwithaddr
  ifa_ifwithbroadaddr
  ifa_ifwithdstaddr
  ifa_ifwithnet
  ifaof_ifpforaddr
  ifa_ifwithroute
  ifa_ifwithroute_fib
  rt_getifa
  rt_getifa_fib
  IFP_TO_IA
  ip_rtaddr
  in6_ifawithifp
  in6ifa_ifpforlinklocal
  in6ifa_ifpwithaddr
  in6_ifadd
  carp_iamatch6
  ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

  IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc).  In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit.  Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by:	bz
Obtained from:	Apple, Inc. (portions)
MFC after:	6 weeks (portions)
2009-06-23 20:19:09 +00:00
bz
b7ff2bdc20 After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
bms
e34d0b3005 Cut over to KTR_INET for CTR.
For clarity, put pointer incremement/size decrement on own line
when copying out in-mode source filters to userland.
2009-04-29 10:14:16 +00:00
zec
8d976eab5c In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by:	bz (an older version of the patch)
Approved by:	julian (mentor)
2009-04-26 22:06:42 +00:00
bms
424b3895c7 remove IFF_ASSERTGIANT 2009-04-21 09:43:51 +00:00
kan
6f5bda3790 If KTR_SUBSYS is compiled in, it does not necessarily mean that user
is interested in being spammed by mcast-related printfs.

Use proper check against ktr_mask instead KTR_COMPILE.
2009-04-05 23:25:06 +00:00
bms
f2006dd38e Deal with the case where ifma_protospec may be NULL, during
any IPv4 multicast operations which reference it.

There is a potential race because ifma_protospec is set to NULL
when we discover the underlying ifnet has gone away. This write
is not covered by the IF_ADDR_LOCK, and it's difficult to widen
its scope without making it a recursive lock. It isn't clear why
this manifests more quickly with 802.11 interfaces, but does not
seem to manifest at all with wired interfaces.

With this change, the 802.11 related panics reported by sam@
and cokane@ should go away. It is not the right fix, that requires
more thought before 8.0.

Idea from:	sam
Tested by:	cokane
2009-03-17 14:41:54 +00:00
rwatson
70b6a8119c Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free.  This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile.  They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

        if_ar
        if_axe
        if_aue
        if_cdce
        if_cue
        if_kue
        if_ray
        if_rue
        if_rum
        if_sr
        if_udav
        if_ural
        if_zyd

Drivers that were already disabled because of tty changes:

        if_ppp
        if_sl

Discussed on:	arch@
2009-03-15 14:21:05 +00:00
bms
b00c7265e0 Don't print inm_print() chatter when KTR_IGMPV3 is not enabled
in the KTR_COMPILE mask.

Found by:	gnn
2009-03-10 17:48:49 +00:00
bms
71233409ea Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.
2009-03-09 17:53:05 +00:00
bms
ea92a55057 Add sysctl net.inet.ip.mcast.loop. This controls whether or not
IPv4 multicast sends are looped back to senders by default
on a stack-wide basis, rather than relying on the socket option.
Note that the sysctl only applies to newly created multicast sockets.
2009-03-04 03:40:02 +00:00
bz
df2be82cec For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
qingli
ec826ad5c7 This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00