Commit Graph

1927 Commits

Author SHA1 Message Date
tuexen
a26060a087 When an IPv6 packet is received for a raw socket which has the
IPPROTO_IPV6 level socket option IPV6_CHECKSUM enabled and the
checksum check fails, drop the message. Without this fix, an
ICMP6 message was sent indicating a parameter problem.

Thanks to bz@ for suggesting a way to simplify this fix.

Reviewed by:		bz@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19969
2019-04-19 18:09:37 +00:00
tuexen
36983a7b31 When a checksum has to be computed for a received IPv6 packet because it
is requested by the application using the IPPROTO_IPV6 level socket option
IPV6_CHECKSUM on a raw socket, ensure that the packet contains enough
bytes to contain the checksum at the specified offset.

Reported by:		syzbot+6295fcc5a8aced81d599@syzkaller.appspotmail.com
Reviewed by:		bz@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19968
2019-04-19 17:28:28 +00:00
tuexen
ac889cf71b Avoid a buffer overwrite in rip6_output() when computing the checksum
as requested by the user via the IPPROTO_IPV6 level socket option
IPV6_CHECKSUM. The check if there are enough bytes in the packet to
store the checksum at the requested offset was wrong by 1.

Reviewed by:		bz@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19967
2019-04-19 17:21:35 +00:00
tuexen
3e538ee797 Improve input validation for the socket option IPV6_CHECKSUM.
When using the IPPROTO_IPV6 level socket option IPV6_CHECKSUM on a raw
IPv6 socket, ensure that the value is either -1 or a non-negative even
number.

Reviewed by:		bz@, thj@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19966
2019-04-19 17:17:41 +00:00
thj
56e95f6462 Add stat counter for ipv6 atomic fragments
Add a stat counter to track ipv6 atomic fragments. Atomic fragments can be
generated in response to invalid path MTU values, but are also a potential
attack vector and considered harmful (see RFC6946 and RFC8021).

While here add tracking of the atomic fragment counter to netstat and systat.

Reviewed by:    tuexen, jtl, bz
Approved by:    jtl (mentor), bz (mentor)
Event:  Aberdeen hackathon 2019
Differential Revision:  https://reviews.freebsd.org/D17511
2019-04-19 17:06:43 +00:00
markj
9abf4945e6 Reinitialize multicast source filter structures after invalidation.
When leaving a multicast group, a hole may be created in the inpcb's
source filter and group membership arrays.  To remove the hole, the
succeeding array elements are copied over by one entry.  The multicast
code expects that a newly allocated array element is initialized, but
the code which shifts a tail of the array was leaving stale data
in the final entry.  Fix this by explicitly reinitializing the last
entry following such a copy.

Reported by:	syzbot+f8c3c564ee21d650475e@syzkaller.appspotmail.com
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19872
2019-04-11 08:00:59 +00:00
markj
f0215c1068 Do not perform DAD on stf(4) interfaces.
stf(4) interfaces are not multicast-capable so they can't perform DAD.
They also did not set IFF_DRV_RUNNING when an address was assigned, so
the logic in nd6_timer() would periodically flag such an address as
tentative, resulting in interface flapping.

Fix the problem by setting IFF_DRV_RUNNING when an address is assigned,
and do some related cleanup:
- In in6if_do_dad(), remove a redundant check for !UP || !RUNNING.
  There is only one caller in the tree, and it only looks at whether
  the return value is non-zero.
- Have in6if_do_dad() return false if the interface is not
  multicast-capable.
- Set ND6_IFF_NO_DAD when an address is assigned to an stf(4) interface
  and the interface goes UP as a result. Note that this is not
  sufficient to fix the problem because the new address is marked as
  tentative and DAD is started before in6_ifattach() is called.
  However, setting no_dad is formally correct.
- Change nd6_timer() to not flag addresses as tentative if no_dad is
  set.

This is based on a patch from Viktor Dukhovni.

Reported by:	Viktor Dukhovni <ietf-dane@dukhovni.org>
Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19751
2019-03-30 18:00:44 +00:00
ae
d763427450 Reapply r345274 with build fixes for 32-bit architectures.
Update NAT64LSN implementation:

  o most of data structures and relations were modified to be able support
    large number of translation states. Now each supported protocol can
    use full ports range. Ports groups now are belongs to IPv4 alias
    addresses, not hosts. Each ports group can keep several states chunks.
    This is controlled with new `states_chunks` config option. States
    chunks allow to have several translation states for single alias address
    and port, but for different destination addresses.
  o by default all hash tables now use jenkins hash.
  o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
  o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
    special prefix "::" value should be used for this purpose when instance
    is created.
  o due to modified internal data structures relations, the socket opcode
    that does states listing was changed.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-19 10:57:03 +00:00
ae
e171491f01 Revert r345274. It appears that not all 32-bit architectures have
necessary CK primitives.
2019-03-18 14:00:19 +00:00
ae
f13ac20eb6 Update NAT64LSN implementation:
o most of data structures and relations were modified to be able support
  large number of translation states. Now each supported protocol can
  use full ports range. Ports groups now are belongs to IPv4 alias
  addresses, not hosts. Each ports group can keep several states chunks.
  This is controlled with new `states_chunks` config option. States
  chunks allow to have several translation states for single alias address
  and port, but for different destination addresses.
o by default all hash tables now use jenkins hash.
o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
  special prefix "::" value should be used for this purpose when instance
  is created.
o due to modified internal data structures relations, the socket opcode
  that does states listing was changed.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-18 12:59:08 +00:00
ae
93a7173b74 Add NAT64 CLAT implementation as defined in RFC6877.
CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.

  # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
  # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
  # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in

Obtained from:	Yandex LLC
Submitted by:	Boris N. Lytochkin
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
2019-03-18 11:44:53 +00:00
ae
2770fa04e1 Add SPDX-License-Identifier and update year in copyright.
MFC after:	1 month
2019-03-18 10:50:32 +00:00
ae
6b7a62da46 Modify struct nat64_config.
Add second IPv6 prefix to generic config structure and rename another
fields to conform to RFC6877. Now it contains two prefixes and length:
PLAT is provider-side translator that translates N:1 global IPv6 addresses
to global IPv4 addresses. CLAT is customer-side translator (XLAT) that
algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses.
Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn)
translators.

Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept
prefix length and use plat_plen to specify prefix length.

Retire net.inet.ip.fw.nat64_allow_private sysctl variable.
Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to
configure this ability separately for each NAT64 instance.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-18 10:39:14 +00:00
bz
9597307388 Update for IETF draft-ietf-6man-ipv6only-flag.
When we roam between networks and our link-state goes down, automatically remove
the IPv6-Only flag from the interface.  Otherwise we might switch from an
IPv6-only to and IPv4-only network and the flag would stay and we would prevent
IPv4 from working.

While the actual function call to clear the flag is under EXPERIMENTAL,
the eventhandler is not as we might want to re-use it for other
functionality on link-down event (such was re-calculate default routers
for example if there is more than one).

Reviewed by:	hrs
Differential Revision:	https://reviews.freebsd.org/D19487
2019-03-07 23:03:39 +00:00
bz
d6a3300a0b Update for IETF draft-ietf-6man-ipv6only-flag.
All changes are hidden behind the EXPERIMENTAL option and are not compiled
in by default.

Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode
manually without router advertisement options.  This will allow developers to
test software for the appropriate behaviour even on dual-stack networks or
IPv6-Only networks without the option being set in RA messages.
Update ifconfig to allow setting and displaying the flag.

Update the checks for the filters to check for either the automatic or the manual
flag to be set.  Add REVARP to the list of filtered IPv4-related protocols and add
an input filter similar to the output filter.

Add a check, when receiving the IPv6-Only RA flag to see if the receiving
interface has any IPv4 configured.  If it does, ignore the IPv6-Only flag.

Add a per-VNET global sysctl, which is on by default, to not process the automatic
RA IPv6-Only flag.  This way an administrator (if this is compiled in) has control
over the behaviour in case the node still relies on IPv4.
2019-03-06 23:31:42 +00:00
thj
79ac8c17fb When dropping a fragment queue count the number of fragments in the queue
When dropping a fragment queue, account for the number of fragments in the
queue. This improves accounting between the number of fragments received and
the number of fragments dropped.

Reviewed by:	jtl, bz, transport
Approved by:	jtl (mentor), bz (mentor)
Differential Revision:	https://review.freebsd.org/D17521
2019-02-19 19:57:55 +00:00
glebius
9978a7d924 New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented.  The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision:	https://reviews.freebsd.org/D18951
2019-01-31 23:01:03 +00:00
hselasky
312adcd942 Fix refcounting leaks in IPv6 MLD code leading to loss of IPv6
connectivity.

Looking at past changes in this area like r337866, some refcounting
bugs have been introduced, one by one. For example like calling
in6m_disconnect() and in6m_rele_locked() in mld_v1_process_group_timer()
where previously no disconnect nor refcount decrement was done.
Calling in6m_disconnect() when it shouldn't causes IPv6 solitation to no
longer work, because all the multicast addresses receiving the solitation
messages are now deleted from the network interface.

This patch reverts some recent changes while improving the MLD
refcounting and concurrency model after the MLD code was converted
to using EPOCH(9).

List changes:
- All CK_STAILQ_FOREACH() macros are now properly enclosed into
  EPOCH(9) sections. This simplifies assertion of locking inside
  in6m_ifmultiaddr_get_inm().
- Corrected bad use of in6m_disconnect() leading to loss of IPv6
  connectivity for MLD v1.
- Factored out checks for valid inm structure into
  in6m_ifmultiaddr_get_inm().

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:34:13 +00:00
hselasky
b9fedbd75b When detaching a network interface drain the workqueue freeing the inm's
because the destructor will access the if_ioctl() callback in the ifnet
pointer which is about to be freed. This prevents use-after-free.

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:25:02 +00:00
hselasky
c9ba3bd6b3 Add debugging sysctl to disable incoming MLD v2 messages similar to the
existing sysctl for MLD v1 messages.

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:18:02 +00:00
hselasky
e42cfab759 Fix duplicate acquiring of refcount when joining IPv6 multicast groups.
This was observed by starting and stopping rpcbind(8) multiple times.

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:15:41 +00:00
markj
acda75d443 Style.
Reviewed by:	bz
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-01-23 22:19:49 +00:00
markj
af69719726 Fix an LLE lookup race.
After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE.  The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer.  By the point the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.

Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock.  If it's not set, the lookup fails.

PR:		234296
Reviewed by:	bz
Tested by:	sbruno, Victor <chernov_victor@list.ru>,
		Mike Andrews <mandrews@bit0.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18906
2019-01-23 22:18:23 +00:00
glebius
c49c728afe Remove unnecessary ifdef. With INVARIANTS all KASSERTs are empty statements,
so won't be compiled in.
2019-01-10 00:52:06 +00:00
hselasky
efc3b98861 Fix loopback traffic when using non-lo0 link local IPv6 addresses.
The loopback interface can only receive packets with a single scope ID,
namely the scope ID of the loopback interface itself. To mitigate this
packets which use the scope ID are appearing as received by the real
network interface, see "origifp" in the patch. The current code would
drop packets which are designated for loopback which use a link-local
scope ID in the destination address or source address, because they
won't match the lo0's scope ID. To fix this restore the network
interface pointer from the scope ID in the destination address for
the problematic cases. See comments added in patch for a more detailed
description.

This issue was introduced with route caching (ae@).

Reviewed by:		bz (network)
Differential Revision:	https://reviews.freebsd.org/D18769
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-09 14:28:08 +00:00
glebius
6d8cc191f9 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
mjg
7e31d1de7e Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
markj
a74ba1d431 Clamp the INPCB port hash tables to IPPORT_MAX + 1 chains.
Memory beyond that limit was previously unused, wasting roughly 1MB per
8GB of RAM.  Also retire INP_PCBLBGROUP_PORTHASH, which was identical to
INP_PCBPORTHASH.

Reviewed by:	glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17803
2018-12-05 17:06:00 +00:00
markj
5c563658ea Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
ae
1382ea4ffb Add ability to use dynamic external prefix in ipfw_nptv6 module.
Now an interface name can be specified for nptv6 instance instead of
ext_prefix. The module will track if_addr_ext events and when suitable
IPv6 address will be added to specified interface, it will be configured
as external prefix. When address disappears instance becomes unusable,
i.e. it doesn't match any packets.

Reviewed by:	0mp (manpages)
Tested by:	Dries Michiels <driesm dot michiels gmail com>
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D17765
2018-11-12 11:20:59 +00:00
vangyzen
490f74685f in6_ifattach_linklocal: handle immediate removal of the new LLA
If another thread immediately removes the link-local address
added by in6_update_ifa(), in6ifa_ifpforlinklocal() can return NULL,
so the following assertion (or dereference) is wrong.
Remove the assertion, and handle NULL somewhat better than panicking.
This matches all of the other callers of in6_update_ifa().

PR:		219250
Reviewed by:	bz, dab (both an earlier version)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17898
2018-11-08 19:50:23 +00:00
markj
96786af957 Remove redundant checks for a NULL lbgroup table.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17108
2018-11-01 15:52:49 +00:00
bz
3431d451a5 Initial implementation of draft-ietf-6man-ipv6only-flag.
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.

If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.

The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.

Further changes to tcpdump (contrib code) are availble and will
be upstreamed.

Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).

We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set.  Also we might want to start
IPv6 before IPv4 in the future.

All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.

Dear 6man, you have running code.

Discussed with:	Bob Hinden, Brian E Carpenter
2018-10-30 20:08:48 +00:00
bz
00fd7d4a48 rip6_input() inp validation after epoch(9)
After r335924 rip6_input() needs inp validation to avoid
working on FREED inps.

Apply the relevant bits from r335497,r335501 (rip_input() change)
to the IPv6 counterpart.

PR:			232194
Reviewed by:		rgrimes, ae (,hps)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D17594
2018-10-24 10:42:35 +00:00
ae
91cf1d92ac Add the check that current VNET is ready and access to srchash is allowed.
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.

MFC after:	20 days
2018-10-23 13:11:45 +00:00
markj
741a850d6c Fix style bugs in in6_pcblookup_lbgroup().
This should have been a part of r338470.  No functional changes
intended.

Reported by:	gallatin
Reviewed by:	gallatin, Johannes Lundberg <johalun0@gmail.com>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17109
2018-10-22 16:09:01 +00:00
ae
b620bf12c6 Add handling for appearing/disappearing of ingress addresses to if_gre(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
  and set it otherwise;

MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17214
2018-10-21 18:13:45 +00:00
ae
802ce6d2c8 Add handling for appearing/disappearing of ingress addresses to if_gif(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
  and set it otherwise;
* remove the note about ingress address from BUGS section.

MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17134
2018-10-21 18:06:15 +00:00
ae
8d3e25d418 Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.

MFC after:	3 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17100
2018-10-21 15:02:06 +00:00
jtl
1d041dd4b1 There are three places where we return from a function which entered an
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section won't exit, and we will leave the epoch tracker
from the stack on the epoch list.

Fix the epoch leak by making sure we exit epoch sections before returning.

Reviewed by:	ae, gallatin, mmacy
Approved by:	re (gjb, kib)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17450
2018-10-09 13:26:06 +00:00
bz
d0aa68f4fe After r338257 is was possible to trigger a KASSERT() in ud6_output()
using an application trying to use a v4mapped destination address on a
kernel without INET support or on a v6only socket.
Catch this case and prevent the packet from going anywhere;
else, without the KASSERT() armed, a v4mapped destination
address might go out on the wire or other undefined behaviour
might happen, while with the KASSERT() we panic.

PR:		231728
Reported by:	Jeremy Faulkner (gldisater gmail.com)
Approved by:	re (kib)
2018-10-02 17:29:56 +00:00
bz
3555315153 In in6_pcbpurgeif0() called, e.g., from if_clone_destroy(),
once we have a lock, make sure the inp is not marked freed.
This can happen since the list traversal and locking was
converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic later on.

Reported by:	slavash (Mellanox)
Tested by:	slavash (Mellanox)
Reviewed by:	markj (no formal review but caught my unlock mistake)
Approved by:	re (kib)
2018-09-27 15:32:37 +00:00
bz
5ca96117e6 In icmp6_rip6_input(), once we have a lock, make sure the inp is
not freed.  This can happen since the list traversal and locking
was converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic in ip6_savecontrol_v4()
trying to access the socket hanging off the inp, which was gone
by the time we got there.

Reported by:	andrew
Tested by:	andrew
Approved by:	re (gjb)
2018-09-20 15:45:53 +00:00
bz
84630aa64f Update udp6_output() inp locking to avoid concurrency issues with
route cache updates.

Bring over locking changes applied to udp_output() for the route cache
in r297225 and fixed in r306559 which achieve multiple things:
(1) acquire an exclusive inp lock earlier depending on the expected
    conditions; we add a comment explaining this in udp6,
(2) having acquired the exclusive lock earlier eliminates a slight
    possible chance for a race condition which was present in v4 for
    multiple years as well and is now gone, and
(3) only pass the inp_route6 to ip6_output() if we are holding an
    exclusive inp lock, so that possible route cache updates in case
    of routing table generation number changes can happen safely.
In addition this change (as the legacy IP counterpart) decomposes the
tracking of inp and pcbinfo lock and adds extra assertions, that the
two together are acquired correctly.

PR:		230950
Reviewed by:	karels, markj
Approved by:	re (gjb)
Pointyhat to:	bz (for completely missing this bit)
Differential Revision:	https://reviews.freebsd.org/D17230
2018-09-19 18:49:37 +00:00
markj
6131d60f6a Fix synchronization of LB group access.
Lookups are protected by an epoch section, so the LB group linkage must
be a CK_LIST rather than a plain LIST.  Furthermore, we were not
deferring LB group frees, so in_pcbremlbgrouphash() could race with
readers and cause a use-after-free.

Reviewed by:	sbruno, Johannes Lundberg <johalun0@gmail.com>
Tested by:	gallatin
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17031
2018-09-10 19:00:29 +00:00
bz
209555fd34 Replicate r328271 from legacy IP to IPv6 using a single macro
to clear L2 and L3 route caches.
Also mark one function argument as __unused.

Reviewed by:	karels, ae
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17007
2018-09-03 22:27:27 +00:00
bz
78d4f16823 Replicate r307234 from legacy IP to IPv6 code, using the RO_RTFREE()
macro rather than hand crafted code.
No functional changes.

Reviewed by:	karels
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17006
2018-09-03 22:14:37 +00:00
bz
a789306797 As discussed in D6262 post-commit review, change inp_route to
inp_route6 for IPv6 code after r301217.
This was most likely a c&p error from the legacy IP code, which
did not matter as it is a union and both structures have the same
layout at the beginning.
No functional changes.

Reviewed by:	karels, ae
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17005
2018-09-03 22:12:48 +00:00
kp
82b4e78e14 frag6: Fix fragment reassembly
r337776 started hashing the fragments into buckets for faster lookup.

The hashkey is larger than intended. This results in random stack data being
included in the hashed data, which in turn means that fragments of the same
packet might end up in different buckets, causing the reassembly to fail.

Set the correct size for hashkey.

PR:		231045
Approved by:	re (kib)
MFC after:	3 days
2018-08-31 08:37:15 +00:00
gallatin
c5fa7c7210 Reject IPv4 SO_REUSEPORT_LB groups when looking up an IPv6 listening socket
Similar to how the IPv4 code will reject an IPv6 LB group,
we must ignore IPv4 LB groups when looking up an IPv6
listening socket.   If this is not done, a port only match
may return an IPv4 socket, which causes problems (like
sending IPv6 packets with a hopcount of 0, making them unrouteable).

Thanks to rrs for all the work to diagnose this.

Approved by:	re (rgrimes)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16899
2018-08-27 18:13:20 +00:00
bz
be242dfa60 Unbreak RSS builds after r338257. Folding both RSS blocks together
I missed the closing } of the new combined block.

Pointyhat to:	bz
Reported by:	np
Approved by:	re (kib)
2018-08-24 21:49:21 +00:00
bz
415411ca28 MFp4 bz_ipv6_fast:
Migrate udp6_send() v4mapped code to udp6_output() saving us a re-lock and
further simplifying the address-family handling code by eliminating
AF_INET checks and almost all v4mapped handling right after the start
as cases could actually not happen anymore.

Rework output path locking similar to UDP4 allowing for better
parallelism (see r222488, and later versions).

Sponsored by: The FreeBSD Foundation (2012)
Sponsored by: iXsystems (2012)
Differential Revision:	https://reviews.freebsd.org/D3721
2018-08-23 16:54:22 +00:00
mmacy
47fa74161c in_mcast: fix copy paste error when clearing flag 2018-08-22 04:09:55 +00:00
mmacy
d41a2ca02d Fix null deref in mld_v1_transmit_report
After r337866 it is possible for an in_multi6 to be referenced while
mid teardown. Handle case of cleared ifnet pointer.

Reported by:	ae
2018-08-21 23:03:02 +00:00
ae
93218b657a Properly initialize IP version in IPv6 header. This was missed in r334673.
Reported by:	Lars Schotte <lars at gustik dot eu>
2018-08-16 09:19:06 +00:00
mmacy
99cec0a00c Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
  - add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
  goes to zero OR when the interface goes away
  - decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
  - add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
  door wider still to races
  - call inp_gcmoptions synchronously after dropping the the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
2018-08-15 20:23:08 +00:00
jtl
bd4f87b859 Lower the default limits on the IPv6 reassembly queue.
Currently, the limits are quite high. On machines with millions of
mbuf clusters, the reassembly queue limits can also run into
the millions. Lower these values.

Also, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.

Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:32:07 +00:00
jtl
55789af7ee Drop 0-byte IPv6 fragments.
Currently, we process IPv6 fragments with 0 bytes of payload, add them
to the reassembly queue, and do not recognize them as duplicating or
overlapping with adjacent 0-byte fragments. An attacker can exploit this
to create long fragment queues.

There is no legitimate reason for a fragment with no payload. However,
because IPv6 packets with an empty payload are acceptable, allow an
"atomic" fragment with no payload.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:29:22 +00:00
jtl
e5f23fbf44 Implement a limit on on the number of IPv6 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv6 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv6 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.

Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet6.ip6.maxfragbucketsize).

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:27:41 +00:00
jtl
a7668fa529 Add a limit of the number of fragments per IPv6 packet.
The IPv4 fragment reassembly code supports a limit on the number of
fragments per packet. The default limit is currently 17 fragments.
Among other things, this limit serves to limit the number of fragments
the code must parse when trying to reassembly a packet.

Add a limit to the IPv6 reassembly code. By default, limit a packet
to 65 fragments (64 on the queue, plus one final fragment to complete
the packet). This allows an average fragment size of 1,008 bytes, which
should be sufficient to hold a fragment. (Recall that the IPv6 minimum
MTU is 1280 bytes. Therefore, this configuration allows a full-size
IPv6 packet to be fragmented on a link with the minimum MTU and still
carry approximately 272 bytes of headers before the fragmented portion
of the packet.)

Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket
sysctl.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:26:07 +00:00
jtl
1f361945df Make the IPv6 fragment limits be global, rather than per-VNET, limits.
The IPv6 reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
on enough VNETs), it is possible that the sum of all the VNET fragment
limits will exceed the number of mbuf clusters available in the system.

Given the fact that the fragment limits are intended (at least in part) to
regulate access to a global resource, the IPv6 fragment limit should
be applied on a global basis.

Note that it is still possible to disable fragmentation for a particular
VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that
VNET. In addition, it is now possible to disable fragmentation globally
by setting the net.inet6.ip6.maxfrags sysctl to 0.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:24:26 +00:00
jtl
dca433c72f Improve IPv6 reassembly performance by hashing fragments into buckets.
Currently, all IPv6 fragment reassembly queues are kept in a flat
linked list. This has a number of implications. Two significant
implications are: all reassembly operations share a common lock,
and it is possible for the linked list to grow quite large.

Improve IPv6 reassembly performance by hashing fragments into buckets,
each of which has its own lock. Calculate the hash key using a Jenkins
hash with a random seed.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:17:37 +00:00
tuexen
f64223beb5 Use a macro to set the assoc state. I missed this in r337706. 2018-08-14 08:33:47 +00:00
ae
694891e438 Restore ability to send ICMP and ICMPv6 redirects.
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for corresponding IP version is disabled via
sysctl. Otherwise will be used default forwarding function.

PR:		221137
Submitted by:	mckay@
MFC after:	2 weeks
2018-08-14 07:54:14 +00:00
luporl
2f30606f2f [ppc] Fix kernel panic when using BOOTP_NFSROOT
On PowerPC (and possibly other architectures), that doesn't use
EARLY_AP_STARTUP, the config task queue may be used initialized.
This was observed while trying to mount the root fs from NFS, as
reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168.

This patch has 2 main changes:
1- Perform a basic initialization of qgroup_config, similar to
what is done in taskqgroup_adjust, but simpler.
This makes qgroup_config ready to be used during NFS root mount.

2- When EARLY_AP_STARTUP is not used, call inm_init() and
in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs
to send multicast packages to request an IP.

PR:		Bug 230168
Reported by:	sbruno
Reviewed by:	jhibbits, mmacy, sbruno
Approved by:	jhibbits
Differential Revision:	D16633
2018-08-09 14:04:51 +00:00
tuexen
17f71a271f Add a dtrace provider for UDP-Lite.
The dtrace provider for UDP-Lite is modeled after the UDP provider.
This fixes the bug that UDP-Lite packets were triggering the UDP
provider.
Thanks to dteske@ for providing the dwatch module.

Reviewed by:		dteske@, markj@, rrs@
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16377
2018-07-31 22:56:03 +00:00
tuexen
e122a5a1f6 Allow implicit TCP connection setup for TCP/IPv6.
TCP/IPv4 allows an implicit connection setup using sendto(), which
is used for TTCP and TCP fast open. This patch adds support for
TCP/IPv6.
While there, improve some tests for detecting multicast addresses,
which are mapped.

Reviewed by:		bz@, kbowling@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16458
2018-07-30 21:27:26 +00:00
asomers
b3776cb8de Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub.  NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions.  Solaris also defines a three-argument version, but
only in its kernel.  This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with:	cem, jilles, ian, bde
Differential Revision:	https://reviews.freebsd.org/D14725
2018-07-30 15:46:40 +00:00
andrew
a6605d2938 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
tuexen
ff46e28acc Add missing dtrace probes for received UDP packets.
Fire UDP receive probes when a packet is received and there is no
endpoint consuming it. Fire the probe also if the TTL of the
received packet is smaller than the minimum required by the endpoint.

Clarify also in the man page, when the probe fires.

Reviewed by:		dteske@, markj@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16046
2018-07-20 15:32:20 +00:00
tuexen
9bf2bb1b21 Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
tuexen
14de4a3d5b Revert https://svnweb.freebsd.org/changeset/base/336503
since I also ran the export script with different parameters.
2018-07-19 20:11:14 +00:00
tuexen
5810243631 Whitespace changes due to change if ident. 2018-07-19 19:33:42 +00:00
ae
d94c744a40 Move invoking of callout_stop(&lle->lle_timer) into llentry_free().
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR:		209682, 225927
Submitted by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4605
2018-07-17 11:33:23 +00:00
mmacy
fd2ad050dc acquire inp lock around ip6_pcbopt to fix IPV6_TCLASS panic
Simple fix to address panics relating to setting IPV6_TCLASS
with setsockopt(). The premise of this change is that it is
ok to call malloc with M_NOWAIT while holding a lock on the
in6p.

If it later turns out that it is not ok, then major surgery
will be required, as ip6_setpktopt() will have to be fixed
(as it also calls malloc with M_NOWAIT) which pulls in the
ip6_pcbopts(), ip6_setpktopts(), ip6_setpktopt() call chain.

Submitted by:	Jason Eggnet
Reviewed by:	rrs, transport, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16201
2018-07-15 00:47:06 +00:00
mmacy
1d12a6fcd3 fix 335919 - check "last" not "inp" where appropriate
Submitted by:	ae
Reported by:	cy
2018-07-04 16:34:07 +00:00
mmacy
14de8a2820 epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
mmacy
17a2ff6226 udp6_input: validate inpcb before use
When traversing pcbinfo lists (rather than calling lookup) we need to
explicitly validate an inpcb before use.
2018-07-03 23:30:53 +00:00
mmacy
4dacdb5826 in6_pcblookup_hash: validate inp for liveness 2018-07-01 01:01:59 +00:00
ae
fd52110019 Add NULL pointer check.
encap_lookup_t method can be invoked by IP encap subsytem even if none
of gif/gre/me interfaces are exist. Hash tables are allocated on demand,
when first interface is created. So, make NULL pointer check before
doing access to hash table.

PR:		229378
2018-06-28 11:39:27 +00:00
ae
a58623ba71 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
ae
1879a98b2d Add NULL check like the rest of code has.
It is possible that ifma_protospec becomes NULL in this function for
some entry, but it is still referenced and thus it will not unlinked
from the list. Then "restart" condition triggers and this entry with
NULL ifma_protospec will lead to page fault.

PR:		228982
2018-06-14 09:36:25 +00:00
ae
5568626787 Remove stale comment. in6_ifdetach() can be called from places
where addresses are not removed yet.
2018-06-14 09:29:39 +00:00
mmacy
255aa2f16f Fix PCBGROUPS build post CK conversion of pcbinfo 2018-06-13 23:19:54 +00:00
ae
e6c79fbed1 Rework if_gre(4) to use encap_lookup_t method to speedup lookup
of needed interface when many gre interfaces are present.

Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
2018-06-13 11:11:33 +00:00
mmacy
1cbc14be82 mechanical CK macro conversion of inpcbinfo lists
This is a dependency for converting the inpcbinfo hash and info rlocks
to epoch.
2018-06-12 22:18:20 +00:00
sbruno
d0aeaa5af7 Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-06-06 15:45:57 +00:00
ae
f21fcbdd6f Use m_copyback() function to write delayed checksum when it isn't located
in the first mbuf of the chain.

MFC after:	1 week
2018-06-06 10:46:24 +00:00
ae
e06398dd48 Fix LINT-NOINET build.
Use known at build time size for min_length value. Also remove the
check from in6_gre_encapcheck(), now it is done in generic code.
2018-06-06 05:17:21 +00:00
ae
d1ee857bcf Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change allows dramatically improve performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
ae
8066b881af Constify argument of in6_getscope(). 2018-06-05 20:54:29 +00:00
ae
dfbd18b5fe Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
ae
a4e83add6a Remove empty encap_init() function.
MFC after:	2 weeks
2018-05-29 12:32:08 +00:00
mmacy
c937b516d8 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
mmacy
ecd6e9d307 UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
emaste
3de0b917bf Pair CURVNET_SET and CURVNET_RESTORE in a block
Per vnet(9), CURVNET_SET and CURVNET_RESTORE cannot be used as a single
statement for a conditional and CURVNET_RESTORE must be in the same
block as CURVNET_SET (or a subblock).

Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
2018-05-21 13:08:44 +00:00
emaste
326c8de9ed Revert r333968, it broke all archs but i386 and amd64 2018-05-21 11:56:07 +00:00
mmacy
649df389bf in(6)_mcast: Expand out vnet set / restore macro so that they work in a conditional block
Reported by:	zec at fer.hr
2018-05-21 08:34:10 +00:00
mmacy
117b59274f make sure vnet is set when freeing
Reported by:	pho
2018-05-20 20:48:26 +00:00
mmacy
ce91e745ec ip(6)_freemoptions: defer imo destruction to epoch callback task
Avoid the ugly unlock / lock of the inpcbinfo where we need to
figure out what kind of lock we hold by simply deferring the
operation to another context. (Also a small dependency for
converting the pcbinfo read lock to epoch)
2018-05-20 00:22:28 +00:00
mmacy
7aeac9ef18 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
brooks
f7adfa5151 Unwrap some not-so-long lines now that extra tabs been removed. 2018-05-15 17:59:46 +00:00
brooks
f5d8b3c62e Remove stray tabs in in6_lltable_dump_entry(). NFC. 2018-05-15 17:57:46 +00:00
shurd
210352aeb1 Fix LORs in in6?_leave_group()
r333175 updated the join_group functions, but not the leave_group ones.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15393
2018-05-11 21:42:27 +00:00
gallatin
21f42492ac Fix a panic in the IPv6 multicast code.
Use LIST_FOREACH_SAFE in in6m_disconnect() since we're
deleting and freeing item from the membership list
while traversing the list.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-10 16:19:41 +00:00
hselasky
95d5665495 Fix for missing network interface address event when adding the default IPv6
based link-local address.

The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.

Reviewed by:	network (ae), kib
Differential Revision:	https://reviews.freebsd.org/D13407
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-08 11:39:01 +00:00
mmacy
d3f138323c r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by:	lwhsu
Approved by:	sbruno
2018-05-06 20:34:13 +00:00
mmacy
d4a327c789 Currently in_pcbfree will unconditionally wunlock the pcbinfo lock
to avoid a LOR on the multicast list lock in the freemoptions routines.
As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly.
Trying to wunlock the pcbinfo lock in that context has caused a number
of reported crashes.

This change unclutters in_pcbfree and moves the handling of wunlock vs
runlock of pcbinfo to the freemoptions routine.

Reported by:	mjg@, bde@, o.hartmann at walstatt.org
Approved by:	sbruno
2018-05-05 22:40:40 +00:00
tuexen
10bc3d3127 Send an ICMPv6 PacketTooBig message in case of forwading a packet which
is too big for the outgoing interface and no firewall is involed.
This problem was introduced in
https://svnweb.freebsd.org/changeset/base/324996
Thanks to Irene Ruengeler for finding the bug and testing the fix.

Reviewed by:	kp@
MFC after:	3 days
2018-05-02 22:11:16 +00:00
shurd
7d4b8facc7 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
sbruno
257e6e5563 Revert r332894 at the request of the submitter.
Submitted by:	Johannes Lundberg <johalun0_gmail.com>
Sponsored by:	Limelight Networks
2018-04-24 19:55:12 +00:00
sbruno
bbf7d4dd03 Load balance sockets with new SO_REUSEPORT_LB option
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by:	Johannes Lundberg <johanlun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-04-23 19:51:00 +00:00
ae
ed1f56d3ff icmp6_reflect() sends ICMPv6 message with new IPv6 header. So, it is
considered as originated by our host packet. And thus rcvif should be
NULL, since it is used by ipfw(4) to determine that packet was originated
from this host. Some of icmp6_reflect() consumers reuse mbuf and m_pkthdr
without resetting rcvif pointer. To avoid this always reset m_pkthdr.rcvif
pointer to NULL in icmp6_reflect(). Also remove such line and comment
describing this from icmp6_error(), since it does not longer matters.

PR:		227674
Reported by:	eugen
MFC after:	1 week
2018-04-23 12:20:07 +00:00
brooks
26c165ead9 Remove support for the Arcnet protocol.
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR:		182297
Reviewed by:	jhibbits, vangyzen
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15057
2018-04-13 21:18:04 +00:00
ae
e6638f6749 Add check that mbuf had not multicast layer2 address.
Such packets should be handled by ip6_mforward().

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2018-04-13 16:13:59 +00:00
brooks
6dcf9514b3 Remove support for FDDI networks.
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15017
2018-04-11 17:28:24 +00:00
tuexen
c3e1813aee Fix a logical inversion bug.
Thanks to Irene Ruengeler for finding and reporting this bug.

MFC after:	3 days
2018-04-08 12:08:20 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
brooks
2b96daf50f Document and enforce assumptions about struct (in6_)ifreq.
- The two types must be type-punnable for shared members of ifr_ifru.
  This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru.  This is
  assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
  broadly portable.  This is true for all current architectures, but very
  large (256-bit) fat-pointers could violate this invariant.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14910
2018-03-30 21:38:53 +00:00
brooks
349ad8a8de Remove a comment that suggests checking that a non-pointer is non-NULL.
Reviewed by:	melifaro, markj, hrs, ume
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14904
2018-03-30 18:26:29 +00:00
brooks
a45d44647f Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
jtl
5ccb8e69d6 This change adds a flag to the DAD entry to indicate whether it is
currently on the queue. This prevents accidentally doubly-removing a DAD
entry from the queue, while also simplifying some of the logic in
nd6_dad_stop().

Reviewed by:	ae, hrs, vangyzen
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D10943
2018-03-24 13:18:09 +00:00
jtl
14225f2440 Remove some unneccessary variable sets in IPv6 code, as detected by
clang's static analyzer.

Reviewed by:	bz
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D10940
2018-03-24 12:43:34 +00:00
sbruno
5e726646d5 Revert r331379 as the "simple" lock changes have revealed a deeper problem
and need for a rethink.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
2018-03-23 18:34:38 +00:00
kp
109a7b5eec netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
sbruno
622ba6f45a Refactor ip6_getpcbopt() for better locking and memory management
Created GET_PKTOPT_EXT_HDR() and GET_PKTOPT_SOCKADDR() macros to
handle safely fetching options from in6p_outputopts, including
properly dealing with in6p locking and preparing memory for
sooptcopyout().

Changed the function signature of ip6_getpcbopt() to allow the
function to acquire and release locks on in6p as needed.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14619
2018-03-22 23:34:48 +00:00
sbruno
57df63d5af Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput.
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14624
2018-03-22 22:29:32 +00:00
sbruno
b74ecf8d2a Handle locking and memory safety for IPV6_PATHMTU in ip6_ctloutput().
Submitted by:	Jason Eggleston <jason@eggnet.com>
Reviewed by:	ae
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14622
2018-03-22 21:18:34 +00:00
sbruno
bcda73b875 Improve write locking in ip6_ctloutput() with macros.
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14620
2018-03-22 20:21:05 +00:00
jtl
ee029a5d0b If the INP lock is uncontested, avoid taking a reference and jumping
through the lock-switching hoops.

A few of the INP lookup operations that lock INPs after the lookup do
so using this mechanism (to maintain lock ordering):

1. Lock lookup structure.
2. Find INP.
3. Acquire reference on INP.
4. Drop lock on lookup structure.
5. Acquire INP lock.
6. Drop reference on INP.

This change provides a slightly shorter path for cases where the INP
lock is uncontested:

1. Lock lookup structure.
2. Find INP.
3. Try to acquire the INP lock.
4. If successful, drop lock on lookup structure.

Of course, if the INP lock is contested, the functions will need to
revert to the previous way of switching locks safely.

This saves a few atomic operations when the INP lock is uncontested.

Discussed with:	gallatin, rrs, rwatson
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D12911
2018-03-21 15:54:46 +00:00
melifaro
7bb5ee0db4 Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
 that the entry is still used. This mechanism is incorporated into the
 arpresolve()/nd6_resolve() routines. After the inpcb route cache
 introduction, the packet path for the locally-originated packets changed,
 passing cached lle pointer to the ether_output() directly. This resulted
 in the arp/ndp entry expire each time exactly after the configured max_age
 interval. During the small window between the ARP/NDP request and reply
 from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
 path used by route cache. Unify the notification code by using single
 inlined function with the per-AF callbacks.

Reported by:	sthaug at nethelp.no
Reviewed by:	ae
MFC after:	2 weeks
2018-03-17 17:05:48 +00:00
vangyzen
63337e8587 Update the MTU in affected routes when IPv6 RA changes the MTU
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU.  Update the routes so they do not provide
stale MTUs.

This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.

Reported and tested by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	dab, melifaro
Discussed with:	ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14257
2018-02-12 19:49:20 +00:00
vangyzen
192a6ce090 Fix ICMPv6 redirects
icmp6_redirect_input() validates that a redirect packet came from the
current gateway for the respective destination.  To do this, it compares
the source address, which has an embedded scope zone id, to the next-hop
address, which does not.  If the address is link-local, which should be
the case, the comparison fails and the redirect is ignored.

Insert the scope zone id into the next-hop address so the comparison
is accurate.

Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases.

Submitted by:	Farrell Woods <Farrell_Woods@Dell.com> (initial revision)
Reviewed by:	ae melifaro dab
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14254
2018-02-09 00:13:05 +00:00
ae
8d74fbedd3 Modify ip6_get_prevhdr() to be able use it safely.
Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.

In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.

Reported by:	Maxime Villard <max at m00nbsd dot net>
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14158
2018-02-05 09:22:07 +00:00
ae
ca864f3d25 Merge r1.120 from NetBSD:
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
  not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
  on an mbuf that was already freed.

Reported by:	Maxime Villard <max at m00nbsd dot net>
MFC after:	3 days
2018-02-02 07:39:34 +00:00
vangyzen
08bd8b9104 ND6: Set the correct state for new neighbor cache entries
Restore state 6.  Many of the UNH tests end up exercising this
state, where we have a new neighbor cache entry and a new link-layer
entry is being created for it.  The link-layer address is currently
unknown so the initial state of the "llentry" should remain initialized
to ND6_LLINFO_NOSTATE so that the ND code will send a solicitation.
Setting this to ND6_LLINFO_STALE implies that the link-level entry
is valid and can be used (but needs to be refreshed via the Neighbor
Unreachability state machine).

https://forums.freebsd.org/threads/64287/

Submitted by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	mjoras, dab, ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14059
2018-01-29 16:12:26 +00:00
ae
5c4c621097 Do not skip scope zone violation check, when mbuf has M_FASTFWD_OURS flag.
When mbuf has M_FASTFWD_OURS flag, this means that a destination address
is our local, but we still need to pass scope zone violation check,
because protocol level expects that IPv6 link-local addresses have
embedded scope zone indexes. This should fix the problem, when ipfw is
used to forward packets to local address and source address of a packet
is IPv6 LLA.

Reported by:	sbruno
MFC after:	3 weeks
2018-01-29 11:03:29 +00:00
ae
ffc9f88671 Assign IPv6 link-local address to loopback interfaces whith unit > 0.
When an interface has IFF_LOOPBACK flag in6_ifattach() tries to assing
IPv6 loopback address to this interface. It uses in6ifa_ifpwithaddr()
to check, that interface doesn't already have given address and then
uses in6_ifattach_loopback(). If in6_ifattach_loopback() fails, it just
exits and thus skips assignment of IPv6 LLA.
Fix this using in6ifa_ifwithaddr() function. If IPv6 loopback address is
already assigned in the system, do not call in6_ifattach_loopback().

PR:		138678
MFC after:	3 weeks
2018-01-29 10:33:55 +00:00
np
af35a0e296 Do not generate illegal mbuf chains during IP fragment reassembly. Only
the first mbuf of the reassembled datagram should have a pkthdr.

This was discovered with cxgbe(4) + IPSEC + ping with payload more than
interface MTU.  cxgbe can generate !M_WRITEABLE mbufs and this results
in m_unshare being called on the reassembled datagram, and it complains:

panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR

PR:		224922
Reviewed by:	ae@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14009
2018-01-24 05:09:21 +00:00
asomers
c9a63ac910 sys/netinet6: fix typos in comments. No functional change.
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2018-01-23 19:40:05 +00:00
pfg
ced875130d Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
pfg
bf156bc88c net*: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:21:51 +00:00
pfg
508a3c553c Fix some typos.
Obtained from:	OpenBSD (CVS v1.5)
2017-12-28 20:40:56 +00:00
pfg
653c431b71 netinet6/ip6_id.c: niels kindly dropped clause 3/4 from the license.
This bring back r327293 from OpenBSD, with the important difference that
we are now getting it from their ip6_id.c file.

Obtained from:	OpenBSD (CVS v1.3)
2017-12-28 20:35:21 +00:00
pfg
322d1838a4 Start syncing changes from OpenBSD's ip6_id.c instead of ip_id.c.
correct non-repetitive ID code, based on comments from niels provos.
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
  the repetition period to 30000)

Obtained from:	OpenBSD (CVS rev. 1.2)
MFC after:	1 week
2017-12-28 20:26:51 +00:00
pfg
ef54d93c46 Revert r327293
netinet6/ip6_id.c: niels kindly dropped clause 3/4 from the license.

I was looking at the wrong file. There is an important merge that must be
done before I can bring this change.
2017-12-28 20:10:10 +00:00
pfg
ae921d2f60 netinet6/ip6_id.c: niels kindly dropped clause 3/4 from the license.
This file is supposed to be based on the OpenBSD CVS v1.6 but checking
the OpenBSD repository the license had already dropped the 2&3 clasues by
then. Catch up with the licensing.

Obtained from:	OpenBSD (CVS 1.2)
2017-12-28 19:42:53 +00:00
kan
c8da6fae2c Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
kan
89a94cc4a4 Silence clang analyzer false positive.
clang does not know that two lookup calls will return the same
pointer, so it assumes correctly that using the old pointer
after dropping the reference to it is a bit risky.
2017-12-23 16:45:26 +00:00
ae
8ff989f0f9 Follow the RFC6980 and silently ignore following IPv6 NDP messages
that had the IPv6 fragmentation header:
 o  Neighbor Solicitation
 o  Neighbor Advertisement
 o  Router Solicitation
 o  Router Advertisement
 o  Redirect

Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.

PR:		224247
MFC after:	2 weeks
2017-12-15 12:37:32 +00:00