Commit Graph

1812 Commits

Author SHA1 Message Date
thj
79ac8c17fb When dropping a fragment queue count the number of fragments in the queue
When dropping a fragment queue, account for the number of fragments in the
queue. This improves accounting between the number of fragments received and
the number of fragments dropped.

Reviewed by:	jtl, bz, transport
Approved by:	jtl (mentor), bz (mentor)
Differential Revision:	https://review.freebsd.org/D17521
2019-02-19 19:57:55 +00:00
glebius
9978a7d924 New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented.  The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision:	https://reviews.freebsd.org/D18951
2019-01-31 23:01:03 +00:00
hselasky
312adcd942 Fix refcounting leaks in IPv6 MLD code leading to loss of IPv6
connectivity.

Looking at past changes in this area like r337866, some refcounting
bugs have been introduced, one by one. For example like calling
in6m_disconnect() and in6m_rele_locked() in mld_v1_process_group_timer()
where previously no disconnect nor refcount decrement was done.
Calling in6m_disconnect() when it shouldn't causes IPv6 solitation to no
longer work, because all the multicast addresses receiving the solitation
messages are now deleted from the network interface.

This patch reverts some recent changes while improving the MLD
refcounting and concurrency model after the MLD code was converted
to using EPOCH(9).

List changes:
- All CK_STAILQ_FOREACH() macros are now properly enclosed into
  EPOCH(9) sections. This simplifies assertion of locking inside
  in6m_ifmultiaddr_get_inm().
- Corrected bad use of in6m_disconnect() leading to loss of IPv6
  connectivity for MLD v1.
- Factored out checks for valid inm structure into
  in6m_ifmultiaddr_get_inm().

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:34:13 +00:00
hselasky
b9fedbd75b When detaching a network interface drain the workqueue freeing the inm's
because the destructor will access the if_ioctl() callback in the ifnet
pointer which is about to be freed. This prevents use-after-free.

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:25:02 +00:00
hselasky
c9ba3bd6b3 Add debugging sysctl to disable incoming MLD v2 messages similar to the
existing sysctl for MLD v1 messages.

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:18:02 +00:00
hselasky
e42cfab759 Fix duplicate acquiring of refcount when joining IPv6 multicast groups.
This was observed by starting and stopping rpcbind(8) multiple times.

PR:			233535
Differential Revision:	https://reviews.freebsd.org/D18887
Reviewed by:		bz (net)
Tested by:		ae
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-24 08:15:41 +00:00
markj
acda75d443 Style.
Reviewed by:	bz
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-01-23 22:19:49 +00:00
markj
af69719726 Fix an LLE lookup race.
After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE.  The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer.  By the point the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.

Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock.  If it's not set, the lookup fails.

PR:		234296
Reviewed by:	bz
Tested by:	sbruno, Victor <chernov_victor@list.ru>,
		Mike Andrews <mandrews@bit0.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18906
2019-01-23 22:18:23 +00:00
glebius
c49c728afe Remove unnecessary ifdef. With INVARIANTS all KASSERTs are empty statements,
so won't be compiled in.
2019-01-10 00:52:06 +00:00
hselasky
efc3b98861 Fix loopback traffic when using non-lo0 link local IPv6 addresses.
The loopback interface can only receive packets with a single scope ID,
namely the scope ID of the loopback interface itself. To mitigate this
packets which use the scope ID are appearing as received by the real
network interface, see "origifp" in the patch. The current code would
drop packets which are designated for loopback which use a link-local
scope ID in the destination address or source address, because they
won't match the lo0's scope ID. To fix this restore the network
interface pointer from the scope ID in the destination address for
the problematic cases. See comments added in patch for a more detailed
description.

This issue was introduced with route caching (ae@).

Reviewed by:		bz (network)
Differential Revision:	https://reviews.freebsd.org/D18769
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-09 14:28:08 +00:00
glebius
6d8cc191f9 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
mjg
7e31d1de7e Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
markj
a74ba1d431 Clamp the INPCB port hash tables to IPPORT_MAX + 1 chains.
Memory beyond that limit was previously unused, wasting roughly 1MB per
8GB of RAM.  Also retire INP_PCBLBGROUP_PORTHASH, which was identical to
INP_PCBPORTHASH.

Reviewed by:	glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17803
2018-12-05 17:06:00 +00:00
markj
5c563658ea Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
ae
1382ea4ffb Add ability to use dynamic external prefix in ipfw_nptv6 module.
Now an interface name can be specified for nptv6 instance instead of
ext_prefix. The module will track if_addr_ext events and when suitable
IPv6 address will be added to specified interface, it will be configured
as external prefix. When address disappears instance becomes unusable,
i.e. it doesn't match any packets.

Reviewed by:	0mp (manpages)
Tested by:	Dries Michiels <driesm dot michiels gmail com>
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D17765
2018-11-12 11:20:59 +00:00
vangyzen
490f74685f in6_ifattach_linklocal: handle immediate removal of the new LLA
If another thread immediately removes the link-local address
added by in6_update_ifa(), in6ifa_ifpforlinklocal() can return NULL,
so the following assertion (or dereference) is wrong.
Remove the assertion, and handle NULL somewhat better than panicking.
This matches all of the other callers of in6_update_ifa().

PR:		219250
Reviewed by:	bz, dab (both an earlier version)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17898
2018-11-08 19:50:23 +00:00
markj
96786af957 Remove redundant checks for a NULL lbgroup table.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17108
2018-11-01 15:52:49 +00:00
bz
3431d451a5 Initial implementation of draft-ietf-6man-ipv6only-flag.
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.

If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.

The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.

Further changes to tcpdump (contrib code) are availble and will
be upstreamed.

Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).

We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set.  Also we might want to start
IPv6 before IPv4 in the future.

All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.

Dear 6man, you have running code.

Discussed with:	Bob Hinden, Brian E Carpenter
2018-10-30 20:08:48 +00:00
bz
00fd7d4a48 rip6_input() inp validation after epoch(9)
After r335924 rip6_input() needs inp validation to avoid
working on FREED inps.

Apply the relevant bits from r335497,r335501 (rip_input() change)
to the IPv6 counterpart.

PR:			232194
Reviewed by:		rgrimes, ae (,hps)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D17594
2018-10-24 10:42:35 +00:00
ae
91cf1d92ac Add the check that current VNET is ready and access to srchash is allowed.
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.

MFC after:	20 days
2018-10-23 13:11:45 +00:00
markj
741a850d6c Fix style bugs in in6_pcblookup_lbgroup().
This should have been a part of r338470.  No functional changes
intended.

Reported by:	gallatin
Reviewed by:	gallatin, Johannes Lundberg <johalun0@gmail.com>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17109
2018-10-22 16:09:01 +00:00
ae
b620bf12c6 Add handling for appearing/disappearing of ingress addresses to if_gre(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
  and set it otherwise;

MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17214
2018-10-21 18:13:45 +00:00
ae
802ce6d2c8 Add handling for appearing/disappearing of ingress addresses to if_gif(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
  and set it otherwise;
* remove the note about ingress address from BUGS section.

MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17134
2018-10-21 18:06:15 +00:00
ae
8d3e25d418 Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.

MFC after:	3 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17100
2018-10-21 15:02:06 +00:00
jtl
1d041dd4b1 There are three places where we return from a function which entered an
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section won't exit, and we will leave the epoch tracker
from the stack on the epoch list.

Fix the epoch leak by making sure we exit epoch sections before returning.

Reviewed by:	ae, gallatin, mmacy
Approved by:	re (gjb, kib)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17450
2018-10-09 13:26:06 +00:00
bz
d0aa68f4fe After r338257 is was possible to trigger a KASSERT() in ud6_output()
using an application trying to use a v4mapped destination address on a
kernel without INET support or on a v6only socket.
Catch this case and prevent the packet from going anywhere;
else, without the KASSERT() armed, a v4mapped destination
address might go out on the wire or other undefined behaviour
might happen, while with the KASSERT() we panic.

PR:		231728
Reported by:	Jeremy Faulkner (gldisater gmail.com)
Approved by:	re (kib)
2018-10-02 17:29:56 +00:00
bz
3555315153 In in6_pcbpurgeif0() called, e.g., from if_clone_destroy(),
once we have a lock, make sure the inp is not marked freed.
This can happen since the list traversal and locking was
converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic later on.

Reported by:	slavash (Mellanox)
Tested by:	slavash (Mellanox)
Reviewed by:	markj (no formal review but caught my unlock mistake)
Approved by:	re (kib)
2018-09-27 15:32:37 +00:00
bz
5ca96117e6 In icmp6_rip6_input(), once we have a lock, make sure the inp is
not freed.  This can happen since the list traversal and locking
was converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic in ip6_savecontrol_v4()
trying to access the socket hanging off the inp, which was gone
by the time we got there.

Reported by:	andrew
Tested by:	andrew
Approved by:	re (gjb)
2018-09-20 15:45:53 +00:00
bz
84630aa64f Update udp6_output() inp locking to avoid concurrency issues with
route cache updates.

Bring over locking changes applied to udp_output() for the route cache
in r297225 and fixed in r306559 which achieve multiple things:
(1) acquire an exclusive inp lock earlier depending on the expected
    conditions; we add a comment explaining this in udp6,
(2) having acquired the exclusive lock earlier eliminates a slight
    possible chance for a race condition which was present in v4 for
    multiple years as well and is now gone, and
(3) only pass the inp_route6 to ip6_output() if we are holding an
    exclusive inp lock, so that possible route cache updates in case
    of routing table generation number changes can happen safely.
In addition this change (as the legacy IP counterpart) decomposes the
tracking of inp and pcbinfo lock and adds extra assertions, that the
two together are acquired correctly.

PR:		230950
Reviewed by:	karels, markj
Approved by:	re (gjb)
Pointyhat to:	bz (for completely missing this bit)
Differential Revision:	https://reviews.freebsd.org/D17230
2018-09-19 18:49:37 +00:00
markj
6131d60f6a Fix synchronization of LB group access.
Lookups are protected by an epoch section, so the LB group linkage must
be a CK_LIST rather than a plain LIST.  Furthermore, we were not
deferring LB group frees, so in_pcbremlbgrouphash() could race with
readers and cause a use-after-free.

Reviewed by:	sbruno, Johannes Lundberg <johalun0@gmail.com>
Tested by:	gallatin
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17031
2018-09-10 19:00:29 +00:00
bz
209555fd34 Replicate r328271 from legacy IP to IPv6 using a single macro
to clear L2 and L3 route caches.
Also mark one function argument as __unused.

Reviewed by:	karels, ae
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17007
2018-09-03 22:27:27 +00:00
bz
78d4f16823 Replicate r307234 from legacy IP to IPv6 code, using the RO_RTFREE()
macro rather than hand crafted code.
No functional changes.

Reviewed by:	karels
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17006
2018-09-03 22:14:37 +00:00
bz
a789306797 As discussed in D6262 post-commit review, change inp_route to
inp_route6 for IPv6 code after r301217.
This was most likely a c&p error from the legacy IP code, which
did not matter as it is a union and both structures have the same
layout at the beginning.
No functional changes.

Reviewed by:	karels, ae
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17005
2018-09-03 22:12:48 +00:00
kp
82b4e78e14 frag6: Fix fragment reassembly
r337776 started hashing the fragments into buckets for faster lookup.

The hashkey is larger than intended. This results in random stack data being
included in the hashed data, which in turn means that fragments of the same
packet might end up in different buckets, causing the reassembly to fail.

Set the correct size for hashkey.

PR:		231045
Approved by:	re (kib)
MFC after:	3 days
2018-08-31 08:37:15 +00:00
gallatin
c5fa7c7210 Reject IPv4 SO_REUSEPORT_LB groups when looking up an IPv6 listening socket
Similar to how the IPv4 code will reject an IPv6 LB group,
we must ignore IPv4 LB groups when looking up an IPv6
listening socket.   If this is not done, a port only match
may return an IPv4 socket, which causes problems (like
sending IPv6 packets with a hopcount of 0, making them unrouteable).

Thanks to rrs for all the work to diagnose this.

Approved by:	re (rgrimes)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16899
2018-08-27 18:13:20 +00:00
bz
be242dfa60 Unbreak RSS builds after r338257. Folding both RSS blocks together
I missed the closing } of the new combined block.

Pointyhat to:	bz
Reported by:	np
Approved by:	re (kib)
2018-08-24 21:49:21 +00:00
bz
415411ca28 MFp4 bz_ipv6_fast:
Migrate udp6_send() v4mapped code to udp6_output() saving us a re-lock and
further simplifying the address-family handling code by eliminating
AF_INET checks and almost all v4mapped handling right after the start
as cases could actually not happen anymore.

Rework output path locking similar to UDP4 allowing for better
parallelism (see r222488, and later versions).

Sponsored by: The FreeBSD Foundation (2012)
Sponsored by: iXsystems (2012)
Differential Revision:	https://reviews.freebsd.org/D3721
2018-08-23 16:54:22 +00:00
mmacy
47fa74161c in_mcast: fix copy paste error when clearing flag 2018-08-22 04:09:55 +00:00
mmacy
d41a2ca02d Fix null deref in mld_v1_transmit_report
After r337866 it is possible for an in_multi6 to be referenced while
mid teardown. Handle case of cleared ifnet pointer.

Reported by:	ae
2018-08-21 23:03:02 +00:00
ae
93218b657a Properly initialize IP version in IPv6 header. This was missed in r334673.
Reported by:	Lars Schotte <lars at gustik dot eu>
2018-08-16 09:19:06 +00:00
mmacy
99cec0a00c Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
  - add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
  goes to zero OR when the interface goes away
  - decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
  - add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
  door wider still to races
  - call inp_gcmoptions synchronously after dropping the the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
2018-08-15 20:23:08 +00:00
jtl
bd4f87b859 Lower the default limits on the IPv6 reassembly queue.
Currently, the limits are quite high. On machines with millions of
mbuf clusters, the reassembly queue limits can also run into
the millions. Lower these values.

Also, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.

Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:32:07 +00:00
jtl
55789af7ee Drop 0-byte IPv6 fragments.
Currently, we process IPv6 fragments with 0 bytes of payload, add them
to the reassembly queue, and do not recognize them as duplicating or
overlapping with adjacent 0-byte fragments. An attacker can exploit this
to create long fragment queues.

There is no legitimate reason for a fragment with no payload. However,
because IPv6 packets with an empty payload are acceptable, allow an
"atomic" fragment with no payload.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:29:22 +00:00
jtl
e5f23fbf44 Implement a limit on on the number of IPv6 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv6 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv6 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.

Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet6.ip6.maxfragbucketsize).

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:27:41 +00:00
jtl
a7668fa529 Add a limit of the number of fragments per IPv6 packet.
The IPv4 fragment reassembly code supports a limit on the number of
fragments per packet. The default limit is currently 17 fragments.
Among other things, this limit serves to limit the number of fragments
the code must parse when trying to reassembly a packet.

Add a limit to the IPv6 reassembly code. By default, limit a packet
to 65 fragments (64 on the queue, plus one final fragment to complete
the packet). This allows an average fragment size of 1,008 bytes, which
should be sufficient to hold a fragment. (Recall that the IPv6 minimum
MTU is 1280 bytes. Therefore, this configuration allows a full-size
IPv6 packet to be fragmented on a link with the minimum MTU and still
carry approximately 272 bytes of headers before the fragmented portion
of the packet.)

Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket
sysctl.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:26:07 +00:00
jtl
1f361945df Make the IPv6 fragment limits be global, rather than per-VNET, limits.
The IPv6 reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
on enough VNETs), it is possible that the sum of all the VNET fragment
limits will exceed the number of mbuf clusters available in the system.

Given the fact that the fragment limits are intended (at least in part) to
regulate access to a global resource, the IPv6 fragment limit should
be applied on a global basis.

Note that it is still possible to disable fragmentation for a particular
VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that
VNET. In addition, it is now possible to disable fragmentation globally
by setting the net.inet6.ip6.maxfrags sysctl to 0.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:24:26 +00:00
jtl
dca433c72f Improve IPv6 reassembly performance by hashing fragments into buckets.
Currently, all IPv6 fragment reassembly queues are kept in a flat
linked list. This has a number of implications. Two significant
implications are: all reassembly operations share a common lock,
and it is possible for the linked list to grow quite large.

Improve IPv6 reassembly performance by hashing fragments into buckets,
each of which has its own lock. Calculate the hash key using a Jenkins
hash with a random seed.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:17:37 +00:00
tuexen
f64223beb5 Use a macro to set the assoc state. I missed this in r337706. 2018-08-14 08:33:47 +00:00
ae
694891e438 Restore ability to send ICMP and ICMPv6 redirects.
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for corresponding IP version is disabled via
sysctl. Otherwise will be used default forwarding function.

PR:		221137
Submitted by:	mckay@
MFC after:	2 weeks
2018-08-14 07:54:14 +00:00
luporl
2f30606f2f [ppc] Fix kernel panic when using BOOTP_NFSROOT
On PowerPC (and possibly other architectures), that doesn't use
EARLY_AP_STARTUP, the config task queue may be used initialized.
This was observed while trying to mount the root fs from NFS, as
reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168.

This patch has 2 main changes:
1- Perform a basic initialization of qgroup_config, similar to
what is done in taskqgroup_adjust, but simpler.
This makes qgroup_config ready to be used during NFS root mount.

2- When EARLY_AP_STARTUP is not used, call inm_init() and
in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs
to send multicast packages to request an IP.

PR:		Bug 230168
Reported by:	sbruno
Reviewed by:	jhibbits, mmacy, sbruno
Approved by:	jhibbits
Differential Revision:	D16633
2018-08-09 14:04:51 +00:00