Commit Graph

137 Commits

Author SHA1 Message Date
Gleb Smirnoff
772e66a6fc Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
Kristof Provost
3d1bbe5fa0 pf: Fix forwarding detection
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision:	https://reviews.freebsd.org/D2286
Approved by:		gnn(mentor)
2015-04-14 19:07:37 +00:00
George V. Neville-Neil
916e17fd56 I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision:  https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)
2015-04-14 14:43:42 +00:00
Kristof Provost
1873dcc8c9 pf: Skip firewall for refragmented ip6 packets
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input(). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments.  Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision:	https://reviews.freebsd.org/D2197
Reviewed by:	glebius, gnn
Approved by:	gnn (mentor)
2015-04-06 19:05:00 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Kristof Provost
7dce9b515b pf: Deal with runt packets
On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision:	https://reviews.freebsd.org/D2189
Approved by:		gnn (mentor)
2015-04-01 12:16:56 +00:00
Kristof Provost
798318490e Preserve IPv6 fragment IDs accross reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
Sergey Kandaurov
a4879be402 Static'ize pf_fillup_fragment body to match its declaration.
Missed in 278925.
2015-03-26 13:31:04 +00:00
Gleb Smirnoff
3e8c6d74bb Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
Andrey V. Elsukov
998fbd14b8 Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

Submitted by:	Kristof Provost
MFC after:	1 week
2015-03-12 08:57:24 +00:00
Gleb Smirnoff
4ac6485cc6 Even more fixes to !INET and !INET6 kernels.
In collaboration with:	pluknet
2015-02-17 22:33:22 +00:00
Gleb Smirnoff
0324938a0f - Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
2015-02-16 23:50:53 +00:00
Gleb Smirnoff
8dc98c2a36 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
Gleb Smirnoff
39a58828ef In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by:		Kristof Provost
Differential Revision:	D1767
2015-02-16 07:01:02 +00:00
Gleb Smirnoff
f5ceb22b78 Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by:		Kristof Provost
Tested by:		peter
Differential Revision:	D1765
2015-02-16 03:38:27 +00:00
Gleb Smirnoff
efc6c51ffa Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
Craig Rodrigues
7259906eb0 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
Craig Rodrigues
8d665c6ba8 Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
Craig Rodrigues
4de985af0b Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
Craig Rodrigues
c75820c756 Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
Ermal Luçi
7b56cc430a pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path when needed before
delving into pf(4) code as required.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
Alexander V. Chernikov
5b07fc31cc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
Alexander V. Chernikov
a458ad86ee Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Dag-Erling Smørgrav
99e9de871a Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
George V. Neville-Neil
1d2baefc13 Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% incrase in PPS.

Differential Revision:	https://reviews.freebsd.org/D461
Submitted by:	des
Reviewed by:	emaste
MFC after:	1 month
2014-10-10 19:26:26 +00:00
Alexander V. Chernikov
31f0d081d8 Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
Gleb Smirnoff
495a22b595 Use rn_detachhead() instead of direct free(9) for radix tables.
Sponsored by:	Nginx, Inc.
2014-10-01 13:35:41 +00:00
Gleb Smirnoff
2a6009bfa6 Mechanically convert to if_inc_counter(). 2014-09-19 09:19:29 +00:00
Gleb Smirnoff
56b61ca27a Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
Gleb Smirnoff
450cecf0a0 - Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
  DIOCSTOP ioctls.

Reported & tested by:	jmallett
Sponsored by:		Nginx, Inc.
2014-09-12 08:39:15 +00:00
Gleb Smirnoff
bf7dcda366 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
Gleb Smirnoff
b616ae250c Explicitly free packet on PF_DROP, otherwise a "quick" rule with
"route-to" may still forward it.

PR:		177808
Submitted by:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by:	InnoGames GmbH
2014-09-01 13:00:45 +00:00
Gleb Smirnoff
e85343b1a5 Do not lookup source node twice when pf_map_addr() is used.
PR:		184003
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:16:08 +00:00
Gleb Smirnoff
afab0f7e01 pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR:		183997
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:02:24 +00:00
Gleb Smirnoff
11341cf97e Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.
PR:		127920
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 04:35:34 +00:00
Kevin Lo
73d76e77b6 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
Gleb Smirnoff
a9572d8f02 - Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
  use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
  and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-14 18:57:46 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Gleb Smirnoff
8ff2bd98d6 On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
John Baldwin
b437b06c79 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
Gleb Smirnoff
0e4f18aa68 o In pf_normalize_ip() we don't need mtag in
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
  if we don't find any.

Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2014-05-17 12:30:27 +00:00
Gleb Smirnoff
53f4b0cf9b The current API for adding rules with pool addresses is the following:
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
  them into rule.

The ugly part is that if address is a table, then it is initialized
in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't
supply ruleset. But if address is a dynaddr, we need address family,
and address family could be different for different addresses in one
rule, so dynaddr is initialized in DIOCADDADDR.

This leads to the entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.

Since proper fix requires API/ABI change, for now simply plug the panic
in pf_empty_pool().

Reported by:	danger
2014-04-25 11:36:11 +00:00
Martin Matuska
ecb47cf9c5 Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
Gleb Smirnoff
79bde95f51 Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR:		188253
2014-04-16 09:25:20 +00:00
Martin Matuska
42311ccc0b Merge from projects/pf r264198:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by:	trociny
MFC after:	1 week
2014-04-07 07:06:13 +00:00