52 Commits

Author SHA1 Message Date
John Baldwin
b437b06c79 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
Martin Matuska
ecb47cf9c5 Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
Gleb Smirnoff
79bde95f51 Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR:		188253
2014-04-16 09:25:20 +00:00
Martin Matuska
42311ccc0b Merge from projects/pf r264198:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by:	trociny
MFC after:	1 week
2014-04-07 07:06:13 +00:00
Martin Matuska
0a7c583acc Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.
Reviewed by:	trociny
2014-04-06 19:19:25 +00:00
Martin Matuska
7e92ce7380 De-virtualize UMA zone pf_mtag_z and move to global initialization part.
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

Reviewed by:	Nikos Vassiliadis, trociny@
2014-03-29 09:05:25 +00:00
Martin Matuska
1709ccf9d3 Merge head up to r263906. 2014-03-29 08:39:53 +00:00
Martin Matuska
d318d97fb5 Merge from projects/pf r251993 (glebius@):
De-vnet hash sizes and hash masks.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny

MFC after:	1 month
2014-03-25 06:55:53 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Martin Matuska
5748b897da Merge head up to r262222 (last merge was incomplete). 2014-02-19 22:02:15 +00:00
Martin Matuska
dc64d6b7e1 Revert r262196
I am going to split this into two individual patches and test it with
the projects/pf branch that may get merged later.
2014-02-19 17:06:04 +00:00
Martin Matuska
a93b9a64fe De-virtualize pf_mtag_z [1]
Process V_pf_overloadqueue in vnet context [2]

This fixes two VIMAGE kernel panics and allows to simultaneously run host-pf
and vnet jails. pf inside jails remains broken.

PR:		kern/182964
Submitted by:	glebius@FreeBSD.org [2], myself [1]
Tested by:	rodrigc@FreeBSD.org, myself
MFC after:	2 weeks
2014-02-18 22:17:12 +00:00
Gleb Smirnoff
48278b8846 Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to:		Dennis Yusupoff <dyr smartspb.net>
Also reported by:	dumbbell, pgj, Rambler
Sponsored by:		Nginx, Inc.
2014-02-14 10:05:21 +00:00
Gleb Smirnoff
be3d21a2cf Remove NULL pointer dereference.
CID:	1009118
2014-01-22 15:58:43 +00:00
Gleb Smirnoff
0b5d46ce4d Fix fallout from r258479: in pf_free_src_node() the node must already
be unlinked.

Reported by:	Konstantin Kukushkin <dark rambler-co.ru>
Sponsored by:	Nginx, Inc.
2013-12-22 12:10:36 +00:00
Gleb Smirnoff
d77c1b3269 To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
  These function do not proceed with freeing of a node, just disconnect
  it from storage.
- New function pf_free_src_nodes() works on a list of previously
  disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:16:34 +00:00
Gleb Smirnoff
4280d14d2b Style: don't compare unsigned <= 0.
Sponsored by:	Nginx, Inc.
2013-11-22 18:54:06 +00:00
Gleb Smirnoff
f053058cee - Split functions that initialize various pf parts into their vimage
parts and global parts.
- Since global parts appeared to be only mutex initializations, just
  abandon them and use MTX_SYSINIT() instead.
- Kill my incorrect VNET_FOREACH() iterator and instead use correct
  approach with VNET_SYSINIT().

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-11-18 22:18:07 +00:00
Gleb Smirnoff
50d3286d9d Merge head r232040 through r258006. 2013-11-11 20:33:25 +00:00
Gleb Smirnoff
6c71335c62 Fix fallout from r257223. Since pf_test_state_icmp() can call
pf_icmp_state_lookup() twice, we need to unlock previously found state.

Reported & tested by:	gavin
2013-11-05 16:54:25 +00:00
Gleb Smirnoff
e1b58d2cff Code logic of handling PFTM_PURGE into pf_find_state(). 2013-11-04 08:20:06 +00:00
Gleb Smirnoff
7710f9f14a Remove unused PFTM_UNTIL_PACKET const. 2013-11-04 08:15:59 +00:00
Gleb Smirnoff
f9b2a21c9e Merge head r232040 through r257457.
M    usr.sbin/portsnap/portsnap/portsnap.8
M    usr.sbin/portsnap/portsnap/portsnap.sh
M    usr.sbin/tcpdump/tcpdump/Makefile
2013-10-31 17:33:29 +00:00
Gleb Smirnoff
1ce5620d32 - Fix VIMAGE build.
- Fix build with gcc.
2013-10-28 10:12:19 +00:00
Baptiste Daroussin
0664b03c16 Import pf.c 1.638 from OpenBSD
Original log:
Some ICMP types that also have icmp_id, pointed out by markus@

Obtained from:	OpenBSD
2013-10-27 20:56:23 +00:00
Baptiste Daroussin
5fff3f1010 Improt pf.c 1.636 from OpenBSD
Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some some icmp types in icmp error message payloads.

Obtained from:	OpenBSD
2013-10-27 20:52:09 +00:00
Baptiste Daroussin
44df0d9356 Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD
Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type

in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
  addresses but recieve unicast replies, by doing fallthrough lookups
  against the correct multicast address.  - Clear up some mistaken
  assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
  one based on other data if we can, otherwise set it to 0.
  - Don't modify the icmp id field in NAT unless it's echo
  - Use the full range of possible id's when NATing icmp6 echoy

Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe

Reviewed by:	glebius
Obtained from:	OpenBSD
2013-10-27 20:44:42 +00:00
Gleb Smirnoff
75bf2db380 Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with:		bz
2013-10-27 16:25:57 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Gleb Smirnoff
0bfd163f52 Merge head r233826 through r256722. 2013-10-18 09:32:02 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Gleb Smirnoff
6828cc99e1 De-vnet hash sizes and hash masks.
Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-06-19 13:37:29 +00:00
Gleb Smirnoff
93ecffe50b Improve locking strategy between keys hash and ID hash.
Before this change state creating sequence was:

1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash

What could happen here is that other thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating state unblocks, it finds the state
it was inserting already non-virgin.

Now we perform proper interlocking between key hash locks and ID hash
lock:

1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash

To achieve that, the following hacking was performed in pf_state_key_attach():

- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
  by their address value.
- pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked
  the conflicting state synchronously. In theory this could require locking
  a third key hash, which we can't do now.
  Now we do not remove the state immediately, instead we leave this task to
  the purge thread. To avoid conflicts in a short period before state is
  purged, we push to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
  ID hash and returns.

Tested by:	Ian FREISLICH <ianf clue.co.za>
2013-06-13 06:07:19 +00:00
Gleb Smirnoff
5af77b3ebd Return meaningful error code from pf_state_key_attach() and
pf_state_insert().
2013-05-11 18:06:51 +00:00
Gleb Smirnoff
03911dec5b Better debug message. 2013-05-11 18:03:36 +00:00
Gleb Smirnoff
7a954bbbce Simplify printf(). 2013-05-06 21:43:15 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
d8aa10cc35 In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
  - Add dash in front of all copyright notices according to style(9).
  - Move $OpenBSD$ down below copyright notices.
  - Remove extra line between cdefs.h and __FBSDID.
2012-12-28 09:19:49 +00:00
Pawel Jakub Dawidek
f5002be657 Warn about reaching various PF limits.
Reviewed by:	glebius
Obtained from:	WHEEL Systems
2012-12-17 10:10:13 +00:00
Gleb Smirnoff
feaa4dd2d0 Initialize state id prior to attaching state to key hash. Otherwise a
race can happen, when pf_find_state() finds state via key hash, and locks
id hash slot 0 instead of appropriate to state id slot.
2012-12-13 12:48:57 +00:00
Gleb Smirnoff
59cc9fde4f Rule memory garbage collecting in new pf scans only states that are on
id hash. If a state has been disconnected from id hash, its rule pointers
can no longer be dereferenced, and referenced memory can't be modified.
Thus, move rule statistics from pf_free_rule() to pf_unlink_rule() and
update them prior to releasing id hash slot lock.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:38:14 +00:00
Gleb Smirnoff
38cc0bfa26 Close possible races between state deletion and sent being sent out
from pfsync:
- Call into pfsync_delete_state() holding the state lock.
- Set the state timeout to PFTM_UNLINKED after state has been moved
  to the PFSYNC_S_DEL queue in pfsync.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:32:28 +00:00
Gleb Smirnoff
078468ede4 o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
  although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
  After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by:	Sebastian Kuzminsky <seb lineratesystems.com>
2012-10-26 21:06:33 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Gleb Smirnoff
23e9c6dc1e After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
  - ip_output(), ipfilter(4) not changed, since already call
    in_delayed_cksum() with header in net byte order.
  - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
    there and back.
  - mrouting code and IPv6 ipsec now need to switch byte order there and
    back, but I hope, this is temporary solution.
  - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
  - pf_route() catches up on r241245 changes to ip_output().
2012-10-08 08:03:58 +00:00
Gleb Smirnoff
ea2951beed The pfil(9) layer guarantees us presence of the protocol header,
so remove extra check, that is always false.

P.S. Also, goto there lead to unlocking a not locked rwlock.
2012-10-06 07:06:57 +00:00
Gleb Smirnoff
e2cfe42430 Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().

Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.

The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorythm that saves references
rules. A referenced rule was freed, and this later lead to a crash.
2012-09-28 20:43:03 +00:00
Gleb Smirnoff
29bdd62c85 When connection rate hits and we overload a source to a table,
we are actually editing table, which means editing rules,
thus we need writer access to 'em.

Fix this by offloading the update of table to the same taskqueue,
we already use for flushing. Since taskqueues major task is now
overloading, and flushing is optional, do mechanical rename
s/flush/overload/ in the code related to the taskqueue.

Since overloading tasks do unsafe referencing of rules, provide
a bandaid in pf_purge_unlinked_rules(). If the latter sees any
queued tasks, then it skips purging for this run.

In table code:
- Assert any lock in pfr_lookup_addr().
- Assert writer lock in pfr_route_kentry().
2012-09-22 10:14:47 +00:00
Gleb Smirnoff
b7340ded6e Reduce copy/paste when freeing an source node. 2012-09-20 07:04:08 +00:00
Gleb Smirnoff
22c914789e Utilize Jenkins hash with random seed for source nodes storage. 2012-09-20 06:52:05 +00:00