Commit Graph

159 Commits

Author SHA1 Message Date
Alexander V. Chernikov
78546dad4e Eliminate last rtalloc_ign() caller.
Differential Revision:	https://reviews.freebsd.org/D3927
2015-10-27 21:25:40 +00:00
Kristof Provost
c110fc49da pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR:		154428, 193579, 198868
Reviewed by:	sbruno
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RootBSD
Differential Revision:	https://reviews.freebsd.org/D3779
2015-10-14 16:21:41 +00:00
Alexander V. Chernikov
1fe201c322 Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
Kristof Provost
2f6c345adf pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set
If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in
pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif the the bridge member on which
the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.

Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.

PR:	202351
Reviewed by:	eri
Differential Revision:	https://reviews.freebsd.org/D3534
2015-09-01 19:04:04 +00:00
Kristof Provost
64b3b4d611 pf: Remove support for 'scrub fragment crop|drop-ovl'
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by:	gnn, eri
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D3466
2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza
22932fc9be Reapply r196551 which was accidentally reverted by r223637 (update to
OpenBSD pf 4.5).

Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.

This fix the failure when you hit the self table size and force it to be
resized.

MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-24 21:41:05 +00:00
Luiz Otavio O Souza
0a70aaf8f5 Add ALTQ(9) support for the CoDel algorithm.
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Differential Revision:	https://reviews.freebsd.org/D3272
Reviewd by:	rpaulo, gnn (previous version)
Obtained from:	pfSense
Sponsored by:	Rubicon Communications (Netgate)
2015-08-21 22:02:22 +00:00
Luiz Otavio O Souza
f2fc809dcd Fix the copy of addresses passed from userland in table replace command.
The size2 is the maximum userland buffer size (used when the addresses are
copied back to userland).

Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-17 23:03:54 +00:00
Mariusz Zaborski
643ef281cd Use correct src/dst ports when removing states.
Submitted by:	Milosz Kaniewski <m.kaniewski@wheelsystems.com>,
		UMEZAWA Takeshi <umezawa@iij.ad.jp> (orginal)
Reviewed by:	glebius
Approved by:	pjd (mentor)
Obtained from:	OpenBSD
MFC after:	3 days
2015-08-11 17:24:34 +00:00
Kristof Provost
48c29b118e pf: Always initialise pf_fragment.fr_flags
When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to
initialise the fr_flags field. As a result we sometimes mistakenly thought the
fragment to not be a buffered fragment. This resulted in panics because we'd end
up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it
to be part of V_pf_cachequeue).
The next time we iterated V_pf_fragqueue we'd use a freed object and panic.

While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.

PR:		201879, 201932
MFC after:	5 days
2015-07-29 06:35:36 +00:00
Renato Botelho
299c819a75 Simplify logic added in r285945 as suggested by glebius
Approved by:	glebius
MFC after:	3 days
Sponsored by:	Netgate
2015-07-28 14:59:29 +00:00
Renato Botelho
b1b98a2db7 Respect pf rule log option before log dropped packets with IP options or
dangerous v6 headers

Reviewed by:	gnn, eri
Approved by:	gnn
Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Netgate
Differential Revision:	https://reviews.freebsd.org/D3222
2015-07-28 10:31:34 +00:00
Gleb Smirnoff
3e437fd2c6 Fix a typo in r280169. Of course we are interested in deleting nsn only
if we have just created it and we were the last reference.

Submitted by:	dhartmei
2015-07-28 09:36:26 +00:00
Ermal Luçi
a5b789f65a ALTQ FAIRQ discipline import from DragonFLY
Differential Revision:  https://reviews.freebsd.org/D2847
Reviewed by:    glebius, wblock(manpage)
Approved by:    gnn(mentor)
Obtained from:  pfSense
Sponsored by:   Netgate
2015-06-24 19:16:41 +00:00
Kristof Provost
06ba348d27 pf: Remove frc_direction
We don't use the direction of the fragments for anything. The frc_direction
field is assigned, but never read.
Just remove it.

Differential Revision:	https://reviews.freebsd.org/D2773
Approved by:	philip (mentor)
2015-06-11 17:57:47 +00:00
Kristof Provost
837b925aba pf: Save the protocol number in the pf_fragment
When we try to look up a pf_fragment with pf_find_fragment() we compare (see
pf_frag_compare()) addresses (and family), id but also protocol.  We failed to
save the protocol to the pf_fragment in pf_fragcache(), resulting in failing
reassembly.

Differential Revision:	https://reviews.freebsd.org/D2772
2015-06-11 13:26:16 +00:00
Kristof Provost
0b7eba6ad4 pf: address family must be set when creating a pf_fragment
Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set.
In that scenario we take a different branch in pf_normalize_ip(), taking us to
pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a
pf_fragment, but do not set the address family. This leads to a panic when we
try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to
compare the pf_fragments doesn't know what to do if the address family is not
set.

Simply ensure that the address family is set correctly (always AF_INET in this
path).

PR:			200330
Differential Revision:	https://reviews.freebsd.org/D2769
Approved by:		philip (mentor), gnn (mentor)
2015-06-10 13:44:04 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Gleb Smirnoff
3dd01a884c Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization
from associated structures initialization.  The mutexes are global, while
the structures are per-vnet.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
2015-05-19 14:04:21 +00:00
Gleb Smirnoff
30fe681e44 During module unload unlock rules before destroying UMA zones, which
may sleep in uma_drain(). It is safe to unlock here, since we are already
dehooked from pfil(9) and all pf threads had quit.

Sponsored by:	Nginx, Inc.
2015-05-19 14:02:40 +00:00
Gleb Smirnoff
78680d05d1 A miss from r283061: don't dereference NULL is pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:51:27 +00:00
Gleb Smirnoff
b7f69c506d Don't dereference NULL is pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:05:12 +00:00
Gleb Smirnoff
772e66a6fc Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
Kristof Provost
3d1bbe5fa0 pf: Fix forwarding detection
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision:	https://reviews.freebsd.org/D2286
Approved by:		gnn(mentor)
2015-04-14 19:07:37 +00:00
George V. Neville-Neil
916e17fd56 I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision:  https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)
2015-04-14 14:43:42 +00:00
Kristof Provost
1873dcc8c9 pf: Skip firewall for refragmented ip6 packets
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input(). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments.  Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision:	https://reviews.freebsd.org/D2197
Reviewed by:	glebius, gnn
Approved by:	gnn (mentor)
2015-04-06 19:05:00 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Kristof Provost
7dce9b515b pf: Deal with runt packets
On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision:	https://reviews.freebsd.org/D2189
Approved by:		gnn (mentor)
2015-04-01 12:16:56 +00:00
Kristof Provost
798318490e Preserve IPv6 fragment IDs accross reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
Sergey Kandaurov
a4879be402 Static'ize pf_fillup_fragment body to match its declaration.
Missed in 278925.
2015-03-26 13:31:04 +00:00
Gleb Smirnoff
3e8c6d74bb Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
Andrey V. Elsukov
998fbd14b8 Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

Submitted by:	Kristof Provost
MFC after:	1 week
2015-03-12 08:57:24 +00:00
Gleb Smirnoff
4ac6485cc6 Even more fixes to !INET and !INET6 kernels.
In collaboration with:	pluknet
2015-02-17 22:33:22 +00:00
Gleb Smirnoff
0324938a0f - Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
2015-02-16 23:50:53 +00:00
Gleb Smirnoff
8dc98c2a36 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
Gleb Smirnoff
39a58828ef In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by:		Kristof Provost
Differential Revision:	D1767
2015-02-16 07:01:02 +00:00
Gleb Smirnoff
f5ceb22b78 Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by:		Kristof Provost
Tested by:		peter
Differential Revision:	D1765
2015-02-16 03:38:27 +00:00
Gleb Smirnoff
efc6c51ffa Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
Craig Rodrigues
7259906eb0 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
Craig Rodrigues
8d665c6ba8 Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
Craig Rodrigues
4de985af0b Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
Craig Rodrigues
c75820c756 Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
Ermal Luçi
7b56cc430a pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path when needed before
delving into pf(4) code as required.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
Alexander V. Chernikov
5b07fc31cc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
Alexander V. Chernikov
a458ad86ee Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Dag-Erling Smørgrav
99e9de871a Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
George V. Neville-Neil
1d2baefc13 Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% incrase in PPS.

Differential Revision:	https://reviews.freebsd.org/D461
Submitted by:	des
Reviewed by:	emaste
MFC after:	1 month
2014-10-10 19:26:26 +00:00
Alexander V. Chernikov
31f0d081d8 Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00