127 Commits

Author SHA1 Message Date
glebius
a29f5e7ca8 Move ALTQ from contrib to net/altq. The ALTQ code has been
discontinued by its original authors for many years. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in projects/ifnet. Moving out of contrib also allows removing
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
kp
859bfca800 pf: Fix forwarding detection
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which led us to call ip6_forward() on
inbound packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision:	https://reviews.freebsd.org/D2286
Approved by:		gnn(mentor)
2015-04-14 19:07:37 +00:00
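The forwarding test described above can be sketched as a small predicate. This is a minimal illustration, not pf's code: the enum, `struct sim_ifnet`, `struct sim_pkt` and `sim_is_forwarding()` are hypothetical stand-ins for pf's direction constants, `struct ifnet` and the mbuf packet header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: pf's real direction constants and struct ifnet /
 * mbuf packet header are richer than this. */
enum sim_dir { SIM_IN = 1, SIM_OUT = 2 };

struct sim_ifnet { int index; };

struct sim_pkt {
	struct sim_ifnet *rcvif;	/* NULL for locally generated packets */
};

/*
 * Treat a packet as forwarded only when it is leaving (SIM_OUT), was really
 * received on some interface (rcvif != NULL), and that interface differs
 * from the one it is leaving on. Looped-back input packets can have
 * rcvif != ifp, so the direction test must come first.
 */
static bool
sim_is_forwarding(enum sim_dir dir, const struct sim_pkt *m,
    const struct sim_ifnet *ifp)
{
	if (dir != SIM_OUT)
		return (false);
	if (m->rcvif == NULL)
		return (false);
	return (m->rcvif != ifp);
}
```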
gnn
be303b042b I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision:  https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)
2015-04-14 14:43:42 +00:00
kp
e192a810c5 pf: Skip firewall for refragmented ip6 packets
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input()). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments.  Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision:	https://reviews.freebsd.org/D2197
Reviewed by:	glebius, gnn
Approved by:	gnn (mentor)
2015-04-06 19:05:00 +00:00
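The loop-breaking idea above can be modelled in a few lines: once the whole packet has been filtered, every fragment produced by refragmentation is flagged so the outbound hook passes it instead of collecting it for reassembly again. `SIM_SKIP_FIREWALL` is a hypothetical stand-in for `M_SKIP_FIREWALL`, and the hook and refragment functions are simplified userland models, not the kernel's pfil or `pf_refragment6()` code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_SKIP_FIREWALL	0x1	/* hypothetical stand-in for M_SKIP_FIREWALL */

struct sim_frag {
	int flags;
};

static int sim_queued_for_reassembly;	/* fragments the hook would collect */

/* Outbound pfil hook: fragments marked to skip the firewall pass through
 * untouched instead of being collected for another reassembly round. */
static bool
sim_out_hook(struct sim_frag *f)
{
	if (f->flags & SIM_SKIP_FIREWALL)
		return (true);
	sim_queued_for_reassembly++;
	return (false);
}

/* Refragment: the full packet was already filtered, so mark every
 * generated fragment before handing it to the output path. */
static size_t
sim_refragment_and_forward(struct sim_frag *frags, size_t n)
{
	size_t passed = 0;

	for (size_t i = 0; i < n; i++) {
		frags[i].flags |= SIM_SKIP_FIREWALL;
		if (sim_out_hook(&frags[i]))
			passed++;
	}
	return (passed);
}
```

Without the flag, each fragment would be queued again and the reassembled packet would re-enter the forwarding path, which is the recursion the commit describes.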
glebius
7c22152af0 o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows the IP ID of atomic IP
  datagrams to be set to any value, to improve performance. The behaviour is
  controlled by the net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o When we do generate an IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
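RFC 6864 calls a datagram "atomic" when it is not fragmented and cannot be fragmented further: DF set, MF clear and fragment offset zero. Only such datagrams may carry an arbitrary IP ID. The predicate below is a minimal sketch of that test; the `SIM_` constants are redefined locally (with the usual IP_DF / IP_MF / IP_OFFMASK values) so the block stands alone, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the IPv4 flags/fragment-offset field (host byte order),
 * matching the usual IP_DF / IP_MF / IP_OFFMASK values. */
#define SIM_IP_DF	0x4000
#define SIM_IP_MF	0x2000
#define SIM_IP_OFFMASK	0x1fff

/* RFC 6864 "atomic" datagram: DF set, MF clear, fragment offset zero. */
static bool
sim_ip_is_atomic(uint16_t ip_off)
{
	return ((ip_off & SIM_IP_DF) != 0 &&
	    (ip_off & (SIM_IP_MF | SIM_IP_OFFMASK)) == 0);
}
```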
kp
67c45e2f58 pf: Deal with runt packets
On Ethernet, packets have a minimum length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets not to
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision:	https://reviews.freebsd.org/D2189
Approved by:		gnn (mentor)
2015-04-01 12:16:56 +00:00
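The runt problem can be made concrete: Ethernet carries at least 46 bytes of payload, so an IPv6 packet with a 40-byte header and a 2-byte payload arrives with 4 bytes of padding. The sketch below derives the real packet length from the Payload Length field; it is a hypothetical helper, not pf's reassembly code, and it deliberately ignores jumbograms.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_IP6_HDRLEN	40	/* fixed IPv6 header size */

/*
 * Return how many bytes of buf belong to the IPv6 packet according to its
 * Payload Length field, or 0 if buf is shorter than that. Bytes past the
 * returned length are link-layer padding (e.g. Ethernet minimum-frame
 * padding) and must not be copied into a reassembled packet.
 * Jumbograms (Payload Length 0 plus a Jumbo option) are not handled here.
 */
static size_t
sim_ip6_wire_len(const uint8_t *buf, size_t buflen)
{
	size_t plen, want;

	if (buflen < SIM_IP6_HDRLEN)
		return (0);
	plen = ((size_t)buf[4] << 8) | buf[5];	/* network byte order */
	want = SIM_IP6_HDRLEN + plen;
	return (buflen < want ? 0 : want);
}
```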
kp
86dedea3cb Preserve IPv6 fragment IDs across reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs to do the reassembly, so it's
straightforward to apply the incoming fragment ID to the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
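A minimal model of refragmentation that reuses the saved Identification rather than drawing a new one. `struct sim_frag6` and `sim_refragment6()` are hypothetical: they only compute fragment descriptors (ID, offset in 8-octet units, M flag, length) and do not build mbufs or headers the way the kernel does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one outgoing IPv6 fragment. */
struct sim_frag6 {
	uint32_t ident;		/* Identification, copied from the arriving fragments */
	uint16_t off8;		/* fragment offset in 8-octet units */
	int	 more;		/* M flag: more fragments follow */
	size_t	 len;		/* fragmentable-part bytes carried */
};

/*
 * Cut payload_len bytes into pieces of at most maxfrag bytes (rounded down
 * to a multiple of 8, as required for all but the last fragment) and stamp
 * every piece with the saved Identification instead of a fresh one.
 * Returns the number of fragments written, or 0 if they do not fit in out.
 */
static size_t
sim_refragment6(size_t payload_len, size_t maxfrag, uint32_t saved_ident,
    struct sim_frag6 *out, size_t outcap)
{
	size_t chunk = maxfrag & ~(size_t)7;
	size_t n = 0, off = 0;

	if (chunk == 0)
		return (0);
	while (off < payload_len) {
		size_t len = payload_len - off;

		if (n == outcap)
			return (0);
		if (len > chunk)
			len = chunk;
		out[n].ident = saved_ident;
		out[n].off8 = (uint16_t)(off / 8);
		out[n].len = len;
		off += len;
		out[n].more = off < payload_len;
		n++;
	}
	return (n);
}
```

For example, 3000 payload bytes with a 1232-byte limit yield fragments of 1232, 1232 and 536 bytes at offsets 0, 154 and 308, all carrying the same ID.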
pluknet
1dcc5ccab3 Static'ize pf_fillup_fragment body to match its declaration.
Missed in 278925.
2015-03-26 13:31:04 +00:00
glebius
d0d9f03f17 Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
ae
cc29b99b5c Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics caused by accessing the mbuf
after it was freed.

Submitted by:	Kristof Provost
MFC after:	1 week
2015-03-12 08:57:24 +00:00
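The ownership convention behind this fix can be sketched in userland: the filter takes a pointer to the caller's packet pointer, and whenever it consumes the packet it also clears that pointer. `struct sim_mbuf` and `sim_filter()` are hypothetical, and `free()` merely stands in for transmitting and releasing an mbuf.

```c
#include <stdlib.h>

struct sim_mbuf { int len; };	/* hypothetical stand-in for struct mbuf */

enum sim_verdict { SIM_PASS, SIM_CONSUMED };

/*
 * When the filter transmits the packet itself (the fastroute case), it
 * owns and releases the mbuf, so it must also clear the caller's pointer;
 * a NULL *m0 tells the caller not to touch the packet again.
 */
static enum sim_verdict
sim_filter(struct sim_mbuf **m0, int fastroute)
{
	if (fastroute) {
		free(*m0);	/* stands in for sending and freeing the mbuf */
		*m0 = NULL;
		return (SIM_CONSUMED);
	}
	return (SIM_PASS);
}
```

A caller then checks `if (m == NULL)` after the filter returns and stops processing, rather than dereferencing freed memory.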
glebius
f9f2edcf7b Even more fixes to !INET and !INET6 kernels.
In collaboration with:	pluknet
2015-02-17 22:33:22 +00:00
glebius
534401756a - Improve INET/INET6 scope.
- style(9) declarations.
- Make a couple of local functions static.
2015-02-16 23:50:53 +00:00
glebius
16f1b2f354 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
glebius
15b1e688ce In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by:		Kristof Provost
Differential Revision:	D1767
2015-02-16 07:01:02 +00:00
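The "same size as they arrived in" rule only requires remembering the largest fragment seen while reassembling. The sketch below is a hypothetical reassembly record, not the OpenBSD or FreeBSD structure; rounding down to a multiple of 8 is an assumption added here because non-final fragments must carry a multiple of 8 bytes.

```c
#include <stddef.h>

/* Hypothetical per-packet reassembly record. */
struct sim_reass {
	size_t maxfraglen;	/* largest fragment payload seen so far */
};

static void
sim_reass_add(struct sim_reass *r, size_t fraglen)
{
	if (fraglen > r->maxfraglen)
		r->maxfraglen = fraglen;
}

/* Size to use when refragmenting for forwarding: no fragment larger than
 * the largest one that arrived, rounded down to a multiple of 8 bytes as
 * non-final fragments require. */
static size_t
sim_refrag_size(const struct sim_reass *r)
{
	return (r->maxfraglen & ~(size_t)7);
}
```

Because the forwarded fragments are never larger than what the sender emitted, an ICMPv6 Packet Too Big further down the path reaches the sender with a size it can act on.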
glebius
9faacbf76a Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by:		Kristof Provost
Tested by:		peter
Differential Revision:	D1765
2015-02-16 03:38:27 +00:00
glebius
12e7b30255 Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now the kernel immediately panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. It is better to replay it
cleanly from scratch.
2015-01-22 01:23:16 +00:00
rodrigc
400655a4d3 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
rodrigc
89bede2eff Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
rodrigc
b15d5b05bd Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
rodrigc
58319f89ed Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
eri
0d0f5282c7 pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path, when needed, before
delving into pf(4) code.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
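pf rewrites addresses and ports by adjusting the existing checksum incrementally, which only gives a valid result if the checksum was correct to begin with; a delayed (not yet computed) checksum breaks that. For reference, the sketch below is the plain RFC 1071 Internet checksum over a byte buffer. It is a standalone illustration, not the kernel's in_cksum or in6_cksum code, and it omits the IPv6 pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's-complement sum of big-endian 16-bit
 * words, with an odd trailing byte padded by a zero byte. */
static uint16_t
sim_inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Using the RFC 1071 example bytes `00 01 f2 03 f4 f5 f6 f7`, the folded sum is 0xddf2 and the checksum is 0x220d; appending those two bytes makes the whole buffer verify to 0.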
melifaro
5edf5a79dc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
melifaro
6b3c0c962e Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
glebius
99f4ec50e8 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, since
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
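Two of the bugs listed above can be shown side by side: a logical OR collapses an access mask to 1, and a CTASSERT-style compile-time check on the access argument catches type bits leaking into it. The `SIM_` values are redefined here to mirror FreeBSD's CTLFLAG_RD, CTLFLAG_WR and CTLTYPE mask; treat them as illustrative rather than authoritative.

```c
#include <stdint.h>

/* Values mirror FreeBSD's CTLFLAG_RD, CTLFLAG_WR and CTLTYPE mask, but are
 * redefined here so the sketch stands alone; treat them as illustrative. */
#define SIM_CTLFLAG_RD	0x80000000u
#define SIM_CTLFLAG_WR	0x40000000u
#define SIM_CTLTYPE	0xfu	/* low bits reserved for the data type */

#define SIM_ACCESS_RW	(SIM_CTLFLAG_RD | SIM_CTLFLAG_WR)

/* A CTASSERT-style compile-time check: the access argument must not
 * smuggle in type bits. */
_Static_assert((SIM_ACCESS_RW & SIM_CTLTYPE) == 0, "access includes type bits");

static uint32_t
sim_wrong_access(void)
{
	return (SIM_CTLFLAG_RD || SIM_CTLFLAG_WR);	/* logical OR: evaluates to 1 */
}

static uint32_t
sim_right_access(void)
{
	return (SIM_ACCESS_RW);			/* bitwise OR: both bits set */
}
```

Note that the buggy value 1 lands inside the type mask, so the same static assertion applied to `(SIM_CTLFLAG_RD || SIM_CTLFLAG_WR)` would fail to compile.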
des
325f19fddb Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
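For readers unfamiliar with MurmurHash3, here is a standalone sketch of the 32-bit x86 variant: 4-byte blocks mixed into the state, a 1-3 byte tail, then a finalization mix. The name `sim_murmur3_32` is hypothetical; the kernel's own version is documented in hash(9) and should be preferred in kernel code.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t
sim_rotl32(uint32_t x, int r)
{
	return ((x << r) | (x >> (32 - r)));
}

/* MurmurHash3 x86_32 over len bytes of key with the given seed. */
static uint32_t
sim_murmur3_32(const void *key, size_t len, uint32_t seed)
{
	const uint8_t *p = key;
	const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
	uint32_t h = seed, k;
	size_t nblocks = len / 4;

	for (size_t i = 0; i < nblocks; i++, p += 4) {
		/* Little-endian load, independent of host byte order. */
		k = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
		k *= c1; k = sim_rotl32(k, 15); k *= c2;
		h ^= k; h = sim_rotl32(h, 13); h = h * 5 + 0xe6546b64;
	}
	k = 0;			/* p now points at the 0-3 byte tail */
	switch (len & 3) {
	case 3: k ^= (uint32_t)p[2] << 16;	/* FALLTHROUGH */
	case 2: k ^= (uint32_t)p[1] << 8;	/* FALLTHROUGH */
	case 1: k ^= p[0];
		k *= c1; k = sim_rotl32(k, 15); k *= c2; h ^= k;
	}
	h ^= (uint32_t)len;	/* finalization mix (fmix32) */
	h ^= h >> 16; h *= 0x85ebca6b;
	h ^= h >> 13; h *= 0xc2b2ae35;
	h ^= h >> 16;
	return (h);
}
```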
gnn
23f601a6ca Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% increase in PPS.

Differential Revision:	https://reviews.freebsd.org/D461
Submitted by:	des
Reviewed by:	emaste
MFC after:	1 month
2014-10-10 19:26:26 +00:00
melifaro
d8b683d70f Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with an embedded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
glebius
713d87864c Use rn_detachhead() instead of direct free(9) for radix tables.
Sponsored by:	Nginx, Inc.
2014-10-01 13:35:41 +00:00
glebius
16745af543 Mechanically convert to if_inc_counter(). 2014-09-19 09:19:29 +00:00
glebius
72f04611ec Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There was no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
glebius
759eeea220 - Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
  DIOCSTOP ioctls.

Reported & tested by:	jmallett
Sponsored by:		Nginx, Inc.
2014-09-12 08:39:15 +00:00
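The race being closed is the classic check-then-act: two concurrent DIOCSTART requests can both see "not running" and both start. Below is a userland analogue using a pthread mutex in place of a kernel sleepable lock such as sx(9); `sim_start()`, `sim_stop()` and the EEXIST/ENOENT return values are illustrative assumptions, not a copy of pf's ioctl handler.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Userland analogue: one lock serialises all start/stop requests.
 * In the kernel this would be a sleepable lock such as sx(9). */
static pthread_mutex_t sim_ioctl_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sim_running;

static int
sim_start(void)
{
	int error = 0;

	pthread_mutex_lock(&sim_ioctl_lock);
	if (sim_running)
		error = EEXIST;		/* already started */
	else
		sim_running = true;	/* real code may sleep during setup */
	pthread_mutex_unlock(&sim_ioctl_lock);
	return (error);
}

static int
sim_stop(void)
{
	int error = 0;

	pthread_mutex_lock(&sim_ioctl_lock);
	if (!sim_running)
		error = ENOENT;		/* not running */
	else
		sim_running = false;
	pthread_mutex_unlock(&sim_ioctl_lock);
	return (error);
}
```

A sleepable lock is needed in the kernel because start and stop may block (for example while allocating or waiting for threads), which a non-sleepable mutex forbids.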
glebius
2e01608625 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
glebius
0cbf499e97 Explicitly free packet on PF_DROP, otherwise a "quick" rule with
"route-to" may still forward it.

PR:		177808
Submitted by:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by:	InnoGames GmbH
2014-09-01 13:00:45 +00:00
glebius
4242d9acba Do not lookup source node twice when pf_map_addr() is used.
PR:		184003
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:16:08 +00:00
glebius
9227a25906 pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR:		183997
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:02:24 +00:00
glebius
45bdeab3db Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.
PR:		127920
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 04:35:34 +00:00
kevlo
dd40fa7e62 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
glebius
7d0b571895 - Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
  use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
  and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-14 18:57:46 +00:00
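The idea behind counter(9) is that each CPU updates only its own slot, so increments need neither a lock nor a shared atomic, and a reader sums all slots. The single-threaded sketch below models that layout; `SIM_NCPU`, `struct sim_counter` and the helpers are hypothetical, whereas the real API provides counter_u64_add() and counter_u64_fetch() and updates the slot of the current CPU itself.

```c
#include <stdint.h>

#define SIM_NCPU	4	/* hypothetical CPU count */

/* Per-CPU counter in the spirit of counter(9): each CPU bumps only its own
 * slot, so updates need no lock and no shared atomic; readers sum slots. */
struct sim_counter {
	uint64_t slot[SIM_NCPU];
};

static void
sim_counter_add(struct sim_counter *c, int cpu, uint64_t n)
{
	c->slot[cpu] += n;	/* real code updates the current CPU's slot */
}

static uint64_t
sim_counter_fetch(const struct sim_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < SIM_NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```

The trade-off is that a fetch is not an atomic snapshot across CPUs, which is acceptable for statistics such as pf's global counters.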
kevlo
7727a3c215 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
glebius
98615618b9 On machines with strict alignment, copy pfsync_state_key from the packet
onto the stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
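The copy-to-stack technique looks like this in general form: instead of casting a pointer into the packet buffer, which may be misaligned, `memcpy` the record into a properly aligned local. `struct sim_state_key` is a hypothetical layout and does not match the real `pfsync_state_key`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire-format record; the real pfsync_state_key differs. */
struct sim_state_key {
	uint32_t addr[2];
	uint16_t port[2];
};

/* The record may start at any byte offset inside the packet buffer, so
 * dereferencing a cast pointer could trap on strict-alignment CPUs.
 * memcpy into an aligned local works on every architecture. */
static struct sim_state_key
sim_read_state_key(const uint8_t *pkt, size_t off)
{
	struct sim_state_key k;

	memcpy(&k, pkt + off, sizeof(k));
	return (k);
}
```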
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used when a tunable sysctl has a custom initialisation
function, allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating the children of a static/extern SYSCTL
node. This operation should probably be made into a factored-out
common macro, since some device drivers use it. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing unneeded
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, because malloc()/free() is
not yet ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
jhb
91a569ad69 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
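The count-versus-ID distinction matters for compile-time checks on bit fields that store a CPU ID: CPU IDs run from 0 to MAXCPU - 1, so an N-bit field suffices when 2^N >= MAXCPU, and requiring 2^N > MAXCPU wrongly rejects MAXCPU = 256 with an 8-bit field. The sketch below assumes a hypothetical 64-bit ID whose top 8 bits hold the CPU; the names are illustrative, not pf's.

```c
#include <stdint.h>

#define SIM_CPU_BITS	8	/* hypothetical width of a CPU-ID field */
#define SIM_MAXCPU	256	/* like MAXCPU: a count of CPUs, not a max ID */

/* IDs run 0..SIM_MAXCPU-1, so the field needs 2^bits >= count.
 * Writing '>' here would wrongly reject SIM_MAXCPU == 256. */
_Static_assert((1 << SIM_CPU_BITS) >= SIM_MAXCPU, "CPU ID field too narrow");

static uint64_t
sim_make_id(unsigned cpu, uint64_t seq)
{
	/* Top SIM_CPU_BITS bits hold the CPU ID, the rest a per-CPU sequence. */
	return (((uint64_t)cpu << (64 - SIM_CPU_BITS)) |
	    (seq & ((UINT64_C(1) << (64 - SIM_CPU_BITS)) - 1)));
}
```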
glebius
9412c23d6c o In pf_normalize_ip() we don't need mtag in
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
  if we don't find any.

Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2014-05-17 12:30:27 +00:00
glebius
597bcfe53d The current API for adding rules with pool addresses is the following:
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
  them into rule.

The ugly part is that if an address is a table, then it is initialized
in DIOCADDRULE, because we need the ruleset, and DIOCADDADDR doesn't
supply a ruleset. But if an address is a dynaddr, we need the address family,
and the address family could be different for different addresses in one
rule, so a dynaddr is initialized in DIOCADDADDR.

This leads to an entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.

Since a proper fix requires an API/ABI change, for now simply plug the panic
in pf_empty_pool().

Reported by:	danger
2014-04-25 11:36:11 +00:00
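Plugging the panic amounts to tolerating partially initialised entries while emptying the pool: a table-type address whose table pointer is still NULL is simply not released. The list and helpers below are hypothetical userland stand-ins for the pool address list and `pf_empty_pool()`, meant only to show the guard.

```c
#include <stddef.h>
#include <stdlib.h>

enum sim_addr_type { SIM_ADDR_PLAIN, SIM_ADDR_TABLE, SIM_ADDR_DYNIFTL };

struct sim_pool_addr {
	enum sim_addr_type type;
	void *tbl;			/* NULL until DIOCADDRULE initialises it */
	struct sim_pool_addr *next;
};

static int sim_tables_released;

static void
sim_release_table(void *t)
{
	(void)t;
	sim_tables_released++;		/* stands in for detaching the table */
}

/* Empty a pool whose entries may be only partially initialised. */
static void
sim_empty_pool(struct sim_pool_addr *head)
{
	while (head != NULL) {
		struct sim_pool_addr *next = head->next;

		if (head->type == SIM_ADDR_TABLE && head->tbl != NULL)
			sim_release_table(head->tbl);	/* skip unset tables */
		free(head);
		head = next;
	}
}
```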
mm
532d55ab5f Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
glebius
97ee1da70b Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now Baptiste and I
are out of time for analyzing them.

PR:		188253
2014-04-16 09:25:20 +00:00
mm
257cccbfaa Merge from projects/pf r264198:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by:	trociny
MFC after:	1 week
2014-04-07 07:06:13 +00:00