Commit Graph

312 Commits

Author SHA1 Message Date
Randall Stewart
7c4676ddee This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
Kristof Provost
5a505b317a pf: Fix broken rule skip calculation
r289932 accidentally broke the rule skip calculation. The address family
argument to PF_ANEQ() is now important, and because it was set to 0 the macro
always evaluated to false.
This resulted in incorrect skip values, which in turn broke the rule
evaluations.
2015-11-07 23:51:42 +00:00
Kristof Provost
679e3c77b7 pf: Fix IPv6 checksums with route-to.
When using route-to (or reply-to) pf sends the packet directly to the output
interface. If that interface doesn't support checksum offloading the checksum
has to be calculated in software.
That was already done in the IPv4 case, but not for the IPv6 case. As a result
we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).

This issue was exposed by the changes in r289316 when pf stopped performing full
checksum calculations for all packets.

Submitted by:	Luoqi Chen
MFC after:	1 week
2015-10-29 20:45:53 +00:00
Alexander V. Chernikov
78546dad4e Eliminate last rtalloc_ign() caller.
Differential Revision:	https://reviews.freebsd.org/D3927
2015-10-27 21:25:40 +00:00
Kristof Provost
c110fc49da pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR:		154428, 193579, 198868
Reviewed by:	sbruno
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RootBSD
Differential Revision:	https://reviews.freebsd.org/D3779
2015-10-14 16:21:41 +00:00
Alexander V. Chernikov
1fe201c322 Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
Kristof Provost
2f6c345adf pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set
If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in
pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif the the bridge member on which
the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.

Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.

PR:	202351
Reviewed by:	eri
Differential Revision:	https://reviews.freebsd.org/D3534
2015-09-01 19:04:04 +00:00
Kristof Provost
64b3b4d611 pf: Remove support for 'scrub fragment crop|drop-ovl'
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by:	gnn, eri
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D3466
2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza
22932fc9be Reapply r196551 which was accidentally reverted by r223637 (update to
OpenBSD pf 4.5).

Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.

This fix the failure when you hit the self table size and force it to be
resized.

MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-24 21:41:05 +00:00
Luiz Otavio O Souza
0a70aaf8f5 Add ALTQ(9) support for the CoDel algorithm.
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Differential Revision:	https://reviews.freebsd.org/D3272
Reviewd by:	rpaulo, gnn (previous version)
Obtained from:	pfSense
Sponsored by:	Rubicon Communications (Netgate)
2015-08-21 22:02:22 +00:00
Luiz Otavio O Souza
f2fc809dcd Fix the copy of addresses passed from userland in table replace command.
The size2 is the maximum userland buffer size (used when the addresses are
copied back to userland).

Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-17 23:03:54 +00:00
Mariusz Zaborski
643ef281cd Use correct src/dst ports when removing states.
Submitted by:	Milosz Kaniewski <m.kaniewski@wheelsystems.com>,
		UMEZAWA Takeshi <umezawa@iij.ad.jp> (orginal)
Reviewed by:	glebius
Approved by:	pjd (mentor)
Obtained from:	OpenBSD
MFC after:	3 days
2015-08-11 17:24:34 +00:00
Kristof Provost
48c29b118e pf: Always initialise pf_fragment.fr_flags
When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to
initialise the fr_flags field. As a result we sometimes mistakenly thought the
fragment to not be a buffered fragment. This resulted in panics because we'd end
up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it
to be part of V_pf_cachequeue).
The next time we iterated V_pf_fragqueue we'd use a freed object and panic.

While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.

PR:		201879, 201932
MFC after:	5 days
2015-07-29 06:35:36 +00:00
Renato Botelho
299c819a75 Simplify logic added in r285945 as suggested by glebius
Approved by:	glebius
MFC after:	3 days
Sponsored by:	Netgate
2015-07-28 14:59:29 +00:00
Renato Botelho
b1b98a2db7 Respect pf rule log option before log dropped packets with IP options or
dangerous v6 headers

Reviewed by:	gnn, eri
Approved by:	gnn
Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Netgate
Differential Revision:	https://reviews.freebsd.org/D3222
2015-07-28 10:31:34 +00:00
Gleb Smirnoff
3e437fd2c6 Fix a typo in r280169. Of course we are interested in deleting nsn only
if we have just created it and we were the last reference.

Submitted by:	dhartmei
2015-07-28 09:36:26 +00:00
Ermal Luçi
a5b789f65a ALTQ FAIRQ discipline import from DragonFLY
Differential Revision:  https://reviews.freebsd.org/D2847
Reviewed by:    glebius, wblock(manpage)
Approved by:    gnn(mentor)
Obtained from:  pfSense
Sponsored by:   Netgate
2015-06-24 19:16:41 +00:00
Kristof Provost
06ba348d27 pf: Remove frc_direction
We don't use the direction of the fragments for anything. The frc_direction
field is assigned, but never read.
Just remove it.

Differential Revision:	https://reviews.freebsd.org/D2773
Approved by:	philip (mentor)
2015-06-11 17:57:47 +00:00
Kristof Provost
837b925aba pf: Save the protocol number in the pf_fragment
When we try to look up a pf_fragment with pf_find_fragment() we compare (see
pf_frag_compare()) addresses (and family), id but also protocol.  We failed to
save the protocol to the pf_fragment in pf_fragcache(), resulting in failing
reassembly.

Differential Revision:	https://reviews.freebsd.org/D2772
2015-06-11 13:26:16 +00:00
Kristof Provost
0b7eba6ad4 pf: address family must be set when creating a pf_fragment
Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set.
In that scenario we take a different branch in pf_normalize_ip(), taking us to
pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a
pf_fragment, but do not set the address family. This leads to a panic when we
try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to
compare the pf_fragments doesn't know what to do if the address family is not
set.

Simply ensure that the address family is set correctly (always AF_INET in this
path).

PR:			200330
Differential Revision:	https://reviews.freebsd.org/D2769
Approved by:		philip (mentor), gnn (mentor)
2015-06-10 13:44:04 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Gleb Smirnoff
3dd01a884c Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization
from associated structures initialization.  The mutexes are global, while
the structures are per-vnet.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
2015-05-19 14:04:21 +00:00
Gleb Smirnoff
30fe681e44 During module unload unlock rules before destroying UMA zones, which
may sleep in uma_drain(). It is safe to unlock here, since we are already
dehooked from pfil(9) and all pf threads had quit.

Sponsored by:	Nginx, Inc.
2015-05-19 14:02:40 +00:00
Gleb Smirnoff
78680d05d1 A miss from r283061: don't dereference NULL is pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:51:27 +00:00
Gleb Smirnoff
b7f69c506d Don't dereference NULL is pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:05:12 +00:00
Gleb Smirnoff
772e66a6fc Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
Kristof Provost
3d1bbe5fa0 pf: Fix forwarding detection
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision:	https://reviews.freebsd.org/D2286
Approved by:		gnn(mentor)
2015-04-14 19:07:37 +00:00
George V. Neville-Neil
916e17fd56 I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision:  https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)
2015-04-14 14:43:42 +00:00
Kristof Provost
1873dcc8c9 pf: Skip firewall for refragmented ip6 packets
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input(). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments.  Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision:	https://reviews.freebsd.org/D2197
Reviewed by:	glebius, gnn
Approved by:	gnn (mentor)
2015-04-06 19:05:00 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Kristof Provost
7dce9b515b pf: Deal with runt packets
On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision:	https://reviews.freebsd.org/D2189
Approved by:		gnn (mentor)
2015-04-01 12:16:56 +00:00
Kristof Provost
798318490e Preserve IPv6 fragment IDs accross reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
Sergey Kandaurov
a4879be402 Static'ize pf_fillup_fragment body to match its declaration.
Missed in 278925.
2015-03-26 13:31:04 +00:00
Gleb Smirnoff
3e8c6d74bb Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
Andrey V. Elsukov
998fbd14b8 Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

Submitted by:	Kristof Provost
MFC after:	1 week
2015-03-12 08:57:24 +00:00
Gleb Smirnoff
4ac6485cc6 Even more fixes to !INET and !INET6 kernels.
In collaboration with:	pluknet
2015-02-17 22:33:22 +00:00
Gleb Smirnoff
0324938a0f - Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
2015-02-16 23:50:53 +00:00
Gleb Smirnoff
8dc98c2a36 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
Gleb Smirnoff
39a58828ef In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by:		Kristof Provost
Differential Revision:	D1767
2015-02-16 07:01:02 +00:00
Gleb Smirnoff
f5ceb22b78 Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by:		Kristof Provost
Tested by:		peter
Differential Revision:	D1765
2015-02-16 03:38:27 +00:00
Gleb Smirnoff
efc6c51ffa Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
Craig Rodrigues
7259906eb0 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
Craig Rodrigues
8d665c6ba8 Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
Craig Rodrigues
4de985af0b Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
Craig Rodrigues
c75820c756 Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
Ermal Luçi
7b56cc430a pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path when needed before
delving into pf(4) code as required.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
Alexander V. Chernikov
5b07fc31cc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
Alexander V. Chernikov
a458ad86ee Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Dag-Erling Smørgrav
99e9de871a Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
George V. Neville-Neil
1d2baefc13 Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% incrase in PPS.

Differential Revision:	https://reviews.freebsd.org/D461
Submitted by:	des
Reviewed by:	emaste
MFC after:	1 month
2014-10-10 19:26:26 +00:00
Alexander V. Chernikov
31f0d081d8 Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
Gleb Smirnoff
495a22b595 Use rn_detachhead() instead of direct free(9) for radix tables.
Sponsored by:	Nginx, Inc.
2014-10-01 13:35:41 +00:00
Gleb Smirnoff
2a6009bfa6 Mechanically convert to if_inc_counter(). 2014-09-19 09:19:29 +00:00
Gleb Smirnoff
56b61ca27a Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
Gleb Smirnoff
450cecf0a0 - Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
  DIOCSTOP ioctls.

Reported & tested by:	jmallett
Sponsored by:		Nginx, Inc.
2014-09-12 08:39:15 +00:00
Gleb Smirnoff
bf7dcda366 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
Gleb Smirnoff
b616ae250c Explicitly free packet on PF_DROP, otherwise a "quick" rule with
"route-to" may still forward it.

PR:		177808
Submitted by:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by:	InnoGames GmbH
2014-09-01 13:00:45 +00:00
Gleb Smirnoff
e85343b1a5 Do not lookup source node twice when pf_map_addr() is used.
PR:		184003
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:16:08 +00:00
Gleb Smirnoff
afab0f7e01 pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR:		183997
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:02:24 +00:00
Gleb Smirnoff
11341cf97e Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.
PR:		127920
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 04:35:34 +00:00
Kevin Lo
73d76e77b6 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
Gleb Smirnoff
a9572d8f02 - Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
  use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
  and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-14 18:57:46 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Gleb Smirnoff
8ff2bd98d6 On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
John Baldwin
b437b06c79 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
Gleb Smirnoff
0e4f18aa68 o In pf_normalize_ip() we don't need mtag in
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
  if we don't find any.

Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2014-05-17 12:30:27 +00:00
Gleb Smirnoff
53f4b0cf9b The current API for adding rules with pool addresses is the following:
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
  them into rule.

The ugly part is that if address is a table, then it is initialized
in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't
supply ruleset. But if address is a dynaddr, we need address family,
and address family could be different for different addresses in one
rule, so dynaddr is initialized in DIOCADDADDR.

This leads to the entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.

Since proper fix requires API/ABI change, for now simply plug the panic
in pf_empty_pool().

Reported by:	danger
2014-04-25 11:36:11 +00:00
Martin Matuska
ecb47cf9c5 Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
Gleb Smirnoff
79bde95f51 Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR:		188253
2014-04-16 09:25:20 +00:00
Martin Matuska
42311ccc0b Merge from projects/pf r264198:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by:	trociny
MFC after:	1 week
2014-04-07 07:06:13 +00:00
Martin Matuska
0a7c583acc Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.
Reviewed by:	trociny
2014-04-06 19:19:25 +00:00
Martin Matuska
7e92ce7380 De-virtualize UMA zone pf_mtag_z and move to global initialization part.
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

Reviewed by:	Nikos Vassiliadis, trociny@
2014-03-29 09:05:25 +00:00
Martin Matuska
1709ccf9d3 Merge head up to r263906. 2014-03-29 08:39:53 +00:00
Martin Matuska
d318d97fb5 Merge from projects/pf r251993 (glebius@):
De-vnet hash sizes and hash masks.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny

MFC after:	1 month
2014-03-25 06:55:53 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
fb3541ad15 Instead of playing games with casts simply add 3 more members to the
structure pf_rule, that are used when the structure is passed via
ioctl().

PR:		187074
2014-03-05 00:40:03 +00:00
Martin Matuska
5748b897da Merge head up to r262222 (last merge was incomplete). 2014-02-19 22:02:15 +00:00
Martin Matuska
dc64d6b7e1 Revert r262196
I am going to split this into two individual patches and test it with
the projects/pf branch that may get merged later.
2014-02-19 17:06:04 +00:00
Martin Matuska
a93b9a64fe De-virtualize pf_mtag_z [1]
Process V_pf_overloadqueue in vnet context [2]

This fixes two VIMAGE kernel panics and allows to simultaneously run host-pf
and vnet jails. pf inside jails remains broken.

PR:		kern/182964
Submitted by:	glebius@FreeBSD.org [2], myself [1]
Tested by:	rodrigc@FreeBSD.org, myself
MFC after:	2 weeks
2014-02-18 22:17:12 +00:00
Gleb Smirnoff
48278b8846 Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to:		Dennis Yusupoff <dyr smartspb.net>
Also reported by:	dumbbell, pgj, Rambler
Sponsored by:		Nginx, Inc.
2014-02-14 10:05:21 +00:00
Gleb Smirnoff
be3d21a2cf Remove NULL pointer dereference.
CID:	1009118
2014-01-22 15:58:43 +00:00
Gleb Smirnoff
d26bbeb948 Fix resource leak and simplify code for DIOCCHANGEADDR.
CID:	1007035
2014-01-22 15:44:38 +00:00
Gleb Smirnoff
a830c4524d When pf_get_translation() fails, it should leave *sn pointer pristine,
otherwise we will panic in pf_test_rule().

PR:		182557
2014-01-06 19:05:04 +00:00
Dimitry Andric
85838e48bd Fix incorrect header guard define in sys/netpfil/pf/pf.h, which snuck in
in r257186.  Found by clang 3.4.
2013-12-22 19:47:22 +00:00
Gleb Smirnoff
0b5d46ce4d Fix fallout from r258479: in pf_free_src_node() the node must already
be unlinked.

Reported by:	Konstantin Kukushkin <dark rambler-co.ru>
Sponsored by:	Nginx, Inc.
2013-12-22 12:10:36 +00:00
Gleb Smirnoff
19acaecac3 The DIOCKILLSRCNODES operation was implemented with O(m*n) complexity,
where "m" is number of source nodes and "n" is number of states. Thus,
on heavy loaded router its processing consumed a lot of CPU time.

Reimplement it with O(m+n) complexity. We first scan through source
nodes and disconnect matching ones, putting them on the freelist and
marking with a cookie value in their expire field. Then we scan through
the states, detecting references to source nodes with a cookie, and
disconnect them as well. Then the freelist is passed to pf_free_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
PR:		kern/176763
Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:22:26 +00:00
Gleb Smirnoff
d77c1b3269 To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
  These function do not proceed with freeing of a node, just disconnect
  it from storage.
- New function pf_free_src_nodes() works on a list of previously
  disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:16:34 +00:00
Gleb Smirnoff
1320f8c0d5 Fix off by ones when scanning source nodes hash.
Sponsored by:	Nginx, Inc.
2013-11-22 18:57:27 +00:00
Gleb Smirnoff
4280d14d2b Style: don't compare unsigned <= 0.
Sponsored by:	Nginx, Inc.
2013-11-22 18:54:06 +00:00
Gleb Smirnoff
654957c2c8 Merge head up to r258343. 2013-11-19 12:21:47 +00:00
Gleb Smirnoff
f053058cee - Split functions that initialize various pf parts into their vimage
parts and global parts.
- Since global parts appeared to be only mutex initializations, just
  abandon them and use MTX_SYSINIT() instead.
- Kill my incorrect VNET_FOREACH() iterator and instead use correct
  approach with VNET_SYSINIT().

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-11-18 22:18:07 +00:00
Gleb Smirnoff
e4e01d9cec Some fixups to pf_get_sport after r257223:
- Do not return blindly if proto isn't ICMP.
- The dport is in network order, so fix comparisons.
- Remove ridiculous htonl(arc4random()).
- Push local variable to a narrower block.
2013-11-14 14:20:35 +00:00
Gleb Smirnoff
50d3286d9d Merge head r232040 through r258006. 2013-11-11 20:33:25 +00:00
Gleb Smirnoff
6c71335c62 Fix fallout from r257223. Since pf_test_state_icmp() can call
pf_icmp_state_lookup() twice, we need to unlock previously found state.

Reported & tested by:	gavin
2013-11-05 16:54:25 +00:00
Gleb Smirnoff
e1b58d2cff Code logic of handling PFTM_PURGE into pf_find_state(). 2013-11-04 08:20:06 +00:00
Gleb Smirnoff
7710f9f14a Remove unused PFTM_UNTIL_PACKET const. 2013-11-04 08:15:59 +00:00
Gleb Smirnoff
f9b2a21c9e Merge head r232040 through r257457.
M    usr.sbin/portsnap/portsnap/portsnap.8
M    usr.sbin/portsnap/portsnap/portsnap.sh
M    usr.sbin/tcpdump/tcpdump/Makefile
2013-10-31 17:33:29 +00:00
Gleb Smirnoff
1ce5620d32 - Fix VIMAGE build.
- Fix build with gcc.
2013-10-28 10:12:19 +00:00
Baptiste Daroussin
0664b03c16 Import pf.c 1.638 from OpenBSD
Original log:
Some ICMP types that also have icmp_id, pointed out by markus@

Obtained from:	OpenBSD
2013-10-27 20:56:23 +00:00
Baptiste Daroussin
5fff3f1010 Improt pf.c 1.636 from OpenBSD
Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some some icmp types in icmp error message payloads.

Obtained from:	OpenBSD
2013-10-27 20:52:09 +00:00
Baptiste Daroussin
44df0d9356 Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD
Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type

in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
  addresses but recieve unicast replies, by doing fallthrough lookups
  against the correct multicast address.  - Clear up some mistaken
  assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
  one based on other data if we can, otherwise set it to 0.
  - Don't modify the icmp id field in NAT unless it's echo
  - Use the full range of possible id's when NATing icmp6 echoy

Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe

Reviewed by:	glebius
Obtained from:	OpenBSD
2013-10-27 20:44:42 +00:00
Gleb Smirnoff
75bf2db380 Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with:		bz
2013-10-27 16:25:57 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Gleb Smirnoff
0bfd163f52 Merge head r233826 through r256722. 2013-10-18 09:32:02 +00:00
Gleb Smirnoff
8fc6e19c2c Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit:
date: 2010/02/04 14:10:12;  author: sthen;  state: Exp;  lines: +24 -19;
  pf_get_sport() picks a random port from the port range specified in a
  nat rule. It should check to see if it's in-use (i.e. matches an existing
  PF state), if it is, it cycles sequentially through other ports until
  it finds a free one. However the check was being done with the state
  keys the wrong way round so it was never actually finding the state
  to be in-use.

  - switch the keys to correct this, avoiding random state collisions
  with nat. Fixes PR 6300 and problems reported by robert@ and viq.

  - check pf_get_sport() return code in pf_test(); if port allocation
  fails the packet should be dropped rather than sent out untranslated.

  Help/ok claudio@.

Some additional changes to 1.12:

- We also need to bzero() the key to zero padding, otherwise key
  won't match.
- Collapse two if blocks into one with ||, since both conditions
  lead to the same processing.
- Only naddr changes in the cycle, so move initialization of other
  fields above the cycle.
- s/u_intXX_t/uintXX_t/g

PR:		kern/181690
Submitted by:	Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by:	Nginx, Inc.
2013-09-02 10:14:25 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Andrey V. Elsukov
415077bad9 Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration.
Reported by:	Eugene M. Zheganin
2013-07-29 13:17:18 +00:00
Gleb Smirnoff
6828cc99e1 De-vnet hash sizes and hash masks.
Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-06-19 13:37:29 +00:00
Gleb Smirnoff
93ecffe50b Improve locking strategy between keys hash and ID hash.
Before this change state creating sequence was:

1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash

What could happen here is that other thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating state unblocks, it finds the state
it was inserting already non-virgin.

Now we perform proper interlocking between key hash locks and ID hash
lock:

1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash

To achieve that, the following hacking was performed in pf_state_key_attach():

- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
  by their address value.
- pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked
  the conflicting state synchronously. In theory this could require locking
  a third key hash, which we can't do now.
  Now we do not remove the state immediately, instead we leave this task to
  the purge thread. To avoid conflicts in a short period before state is
  purged, we push to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
  ID hash and returns.

Tested by:	Ian FREISLICH <ianf clue.co.za>
2013-06-13 06:07:19 +00:00
Gleb Smirnoff
5af77b3ebd Return meaningful error code from pf_state_key_attach() and
pf_state_insert().
2013-05-11 18:06:51 +00:00
Gleb Smirnoff
03911dec5b Better debug message. 2013-05-11 18:03:36 +00:00
Gleb Smirnoff
048c95417d Fix DIOCADDSTATE operation. 2013-05-11 17:58:26 +00:00
Gleb Smirnoff
b69d74e834 Invalid creatorid is always EINVAL, not only when we are in verbose mode. 2013-05-11 17:57:52 +00:00
Gleb Smirnoff
f8aa444783 Improve KASSERT() message. 2013-05-06 21:44:06 +00:00
Gleb Smirnoff
7a954bbbce Simplify printf(). 2013-05-06 21:43:15 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
41a7572b26 Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
Gleb Smirnoff
e2a55a0021 Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00
Gleb Smirnoff
d8aa10cc35 In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
  - Add dash in front of all copyright notices according to style(9).
  - Move $OpenBSD$ down below copyright notices.
  - Remove extra line between cdefs.h and __FBSDID.
2012-12-28 09:19:49 +00:00
Pawel Jakub Dawidek
f5002be657 Warn about reaching various PF limits.
Reviewed by:	glebius
Obtained from:	WHEEL Systems
2012-12-17 10:10:13 +00:00
Mikolaj Golub
bf1e95a21c In pfioctl, if the permission checks failed we returned with vnet context
set.

As the checks don't require vnet context, this is fixed by setting
vnet after the checks.

PR:		kern/160541
Submitted by:	Nikos Vassiliadis (slightly different approach)
2012-12-15 17:19:36 +00:00
Gleb Smirnoff
f094f811fb Fix error in r235991. No-sleep version of IFNET_RLOCK() should
be used here, since we may hold the main pf rulesets rwlock.

Reported by:	Fleuriot Damien <ml my.gd>
2012-12-14 13:01:16 +00:00
Gleb Smirnoff
4c794f5c06 Fix VIMAGE build broken in r244185.
Submitted by:	Nikolai Lifanov <lifanov mail.lifanov.com>
2012-12-14 08:02:35 +00:00
Gleb Smirnoff
9ff7e6e922 Merge rev. 1.119 from OpenBSD:
date: 2009/03/31 01:21:29;  author: dlg;  state: Exp;  lines: +9 -16
  ...

  this also firms up some of the input parsing so it handles short frames a
  bit better.

This actually fixes reading beyond mbuf data area in pfsync_input(), that
may happen at certain pfsync datagrams.
2012-12-13 12:51:22 +00:00
Gleb Smirnoff
feaa4dd2d0 Initialize state id prior to attaching state to key hash. Otherwise a
race can happen, when pf_find_state() finds state via key hash, and locks
id hash slot 0 instead of appropriate to state id slot.
2012-12-13 12:48:57 +00:00
Gleb Smirnoff
fed7635002 Merge 1.127 from OpenBSD, that closes a regression from 1.125 (merged
as r242694):
  do better detection of when we have a better version of the tcp sequence
  windows than our peer.

  this resolves the last of the pfsync traffic storm issues ive been able to
  produce, and therefore makes it possible to do usable active-active
  statuful firewalls with pf.
2012-12-11 08:37:08 +00:00
Gleb Smirnoff
59cc9fde4f Rule memory garbage collecting in new pf scans only states that are on
id hash. If a state has been disconnected from id hash, its rule pointers
can no longer be dereferenced, and referenced memory can't be modified.
Thus, move rule statistics from pf_free_rule() to pf_unlink_rule() and
update them prior to releasing id hash slot lock.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:38:14 +00:00
Gleb Smirnoff
38cc0bfa26 Close possible races between state deletion and sent being sent out
from pfsync:
- Call into pfsync_delete_state() holding the state lock.
- Set the state timeout to PFTM_UNLINKED after state has been moved
  to the PFSYNC_S_DEL queue in pfsync.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:32:28 +00:00
Gleb Smirnoff
8db7e13f1d Remove extra PFSYNC_LOCK() in pfsync_bulk_update() which lead to lock
recursion.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:22:08 +00:00
Gleb Smirnoff
5da39c565b Revert erroneous r242693. A state may have PFTM_UNLINKED being on the
PFSYNC_S_DEL queue of pfsync.
2012-12-06 08:15:06 +00:00
Gleb Smirnoff
f18ab0ffa3 Merge rev. 1.125 from OpenBSD:
date: 2009/06/12 02:03:51;  author: dlg;  state: Exp;  lines: +59 -69
  rewrite the way states from pfsync are merged into the local state tree
  and the conditions on which pfsync will notify its peers on a stale update.

  each side (ie, the sending and receiving side) of the state update is
  compared separately. any side that is further along than the local state
  tree is merged. if any side is further along in the local state table, an
  update is sent out telling the peers about it.
2012-11-07 07:35:05 +00:00
Gleb Smirnoff
d75efebeab It may happen that pfsync holds the last reference on a state. In this
case keys had already been freed. If encountering such state, then
just release last reference.

Not sure this can happen as a runtime race, but can be reproduced by
the following scenario:

- enable pfsync
- disable pfsync
- wait some time
- enable pfsync
2012-11-07 07:30:40 +00:00
Gleb Smirnoff
078468ede4 o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
  although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
  After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by:	Sebastian Kuzminsky <seb lineratesystems.com>
2012-10-26 21:06:33 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Gleb Smirnoff
42a58907c3 Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

  if_clone_simple()
  if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with:		brooks, bz, 1 year ago
2012-10-16 13:37:54 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Gleb Smirnoff
b833c0d990 Any pfil(9) hooks should be called with already set VNET context.
Reviewed by:	bz
2012-10-08 23:02:32 +00:00
Gleb Smirnoff
23e9c6dc1e After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
  - ip_output(), ipfilter(4) not changed, since already call
    in_delayed_cksum() with header in net byte order.
  - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
    there and back.
  - mrouting code and IPv6 ipsec now need to switch byte order there and
    back, but I hope, this is temporary solution.
  - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
  - pf_route() catches up on r241245 changes to ip_output().
2012-10-08 08:03:58 +00:00
Gleb Smirnoff
21d172a3f1 A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
  - ip_input() is entered in net byte order and converts packet
    to host byte order right _after_ processing pfil(9) hooks.
  - ip_output() is entered in host byte order and converts packet
    to net byte order right _before_ processing pfil(9) hooks.
  - ip_fragment() accepts and emits packet in net byte order.
  - ip_forward(), ip_mloopback() use host byte order (untouched actually).
  - ip_fastforward() no longer modifies packet at all (except ip_ttl).
  - Swapping of byte order there and back removed from the following modules:
    pf(4), ipfw(4), enc(4), if_bridge(4).
  - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
  - __FreeBSD_version bumped.
  - pfil(9) manual page updated.

Reviewed by:	ray, luigi, eri, melifaro
Tested by:	glebius (LE), ray (BE)
2012-10-06 10:02:11 +00:00
Gleb Smirnoff
ea2951beed The pfil(9) layer guarantees us presence of the protocol header,
so remove extra check, that is always false.

P.S. Also, goto there lead to unlocking a not locked rwlock.
2012-10-06 07:06:57 +00:00
Gleb Smirnoff
aa955cb5b8 To reduce volume of pfsync traffic:
- Scan request update queue to prevent doubles.
- Do not push undersized daragram in pfsync_update_request().
2012-10-02 12:44:46 +00:00
Gleb Smirnoff
7b6fbb7367 Clear and re-setup all function pointers that glue pf(4) and pfsync(4)
together whenever the pfsync0 is brought down or up respectively.
2012-09-29 20:11:00 +00:00
Gleb Smirnoff
0fa4aaa7e6 Simplify send out queue code:
- Write method of a queue now is void,length of item is taken
  as queue property.
- Write methods don't need to know about mbud, supply just buf
  to them.
- No need for safe queue iterator in pfsync_sendout().

Obtained from:	OpenBSD
2012-09-29 20:02:26 +00:00
Gleb Smirnoff
e2cfe42430 Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().

Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.

The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorythm that saves references
rules. A referenced rule was freed, and this later lead to a crash.
2012-09-28 20:43:03 +00:00
Gleb Smirnoff
51e02a31d0 EBUSY is a better reply for refusing to unload pf(4) or pfsync(4).
Submitted by:	pluknet
2012-09-22 19:03:11 +00:00
Gleb Smirnoff
29bdd62c85 When connection rate hits and we overload a source to a table,
we are actually editing table, which means editing rules,
thus we need writer access to 'em.

Fix this by offloading the update of table to the same taskqueue,
we already use for flushing. Since taskqueues major task is now
overloading, and flushing is optional, do mechanical rename
s/flush/overload/ in the code related to the taskqueue.

Since overloading tasks do unsafe referencing of rules, provide
a bandaid in pf_purge_unlinked_rules(). If the latter sees any
queued tasks, then it skips purging for this run.

In table code:
- Assert any lock in pfr_lookup_addr().
- Assert writer lock in pfr_route_kentry().
2012-09-22 10:14:47 +00:00
Gleb Smirnoff
e706fd3a3a In pfr_insert_kentry() return ENOMEM if memory allocation failed. 2012-09-22 10:04:48 +00:00
Gleb Smirnoff
7348c5240d Fix fallout from r236397 in pfr_update_stats(), that was missed
later in r237155. We need to zero sockaddr before lookup. While
here, make pfr_update_stats() panic on unknown af.
2012-09-22 10:02:44 +00:00
Gleb Smirnoff
b7340ded6e Reduce copy/paste when freeing an source node. 2012-09-20 07:04:08 +00:00
Gleb Smirnoff
22c914789e Utilize Jenkins hash with random seed for source nodes storage. 2012-09-20 06:52:05 +00:00
Gleb Smirnoff
7f7ef494f1 Provide kernel compile time option to make pf(4) default rule to drop.
This is important to secure a small timeframe at boot time, when
network is already configured, but pf(4) is not yet.

PR:		kern/171622
Submitted by:	Olivier Cochard-LabbИ <olivier cochard.me>
2012-09-18 11:07:19 +00:00
Gleb Smirnoff
1d6139c0e4 Make ruleset anchors in pf(4) reentrant. We've got two problems here:
1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
  as small as possible, following measures taken:
  - Maximum stack size reduced from 64 to 32.
  - The struct pf_anchor_stackframe trimmed by one pointer - parent.
    We can always obtain the parent via the rule pointer.
  - When pf_test_rule() calls pf_get_translation(), the former lends
    its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with:		dhartmei
2012-09-18 10:54:56 +00:00
Gleb Smirnoff
effbcf3842 Fix DIOCNATLOOK: zero key padding before performing lookup. 2012-09-18 09:15:32 +00:00
Gleb Smirnoff
3b3a8eb937 o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c		-> sys/netpfil/pf/
sys/contrib/pf/net/*.h		-> sys/net/
contrib/pf/pfctl/*.c		-> sbin/pfctl
contrib/pf/pfctl/*.h		-> sbin/pfctl
contrib/pf/pfctl/pfctl.8	-> sbin/pfctl
contrib/pf/pfctl/*.4		-> share/man/man4
contrib/pf/pfctl/*.5		-> share/man/man5

sys/netinet/ipfw		-> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with:		bz, luigi
2012-09-14 11:51:49 +00:00