Commit Graph

782 Commits

Author SHA1 Message Date
Baptiste Daroussin
5fff3f1010 Import pf.c 1.636 from OpenBSD
Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some icmp types in icmp error message payloads.

Obtained from:	OpenBSD
2013-10-27 20:52:09 +00:00
Baptiste Daroussin
44df0d9356 Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD
Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type
in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
  addresses but receive unicast replies, by doing fallthrough lookups
  against the correct multicast address.
- Clear up some mistaken assumptions in the PF code:
  - Not all ICMP packets have an icmp_id, so simulate
    one based on other data if we can, otherwise set it to 0.
  - Don't modify the icmp id field in NAT unless it's echo
  - Use the full range of possible id's when NATing icmp6 echo

Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe

Reviewed by:	glebius
Obtained from:	OpenBSD
2013-10-27 20:44:42 +00:00
Gleb Smirnoff
75bf2db380 Move new pf includes to the pf directory. The pfvar.h remains
in net, to avoid needless compatibility breakage.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with:		bz
2013-10-27 16:25:57 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all headers that are currently pulled in only through implicit pollution via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Gleb Smirnoff
0bfd163f52 Merge head r233826 through r256722. 2013-10-18 09:32:02 +00:00
Philip Paeps
b49bf73f75 Use the correct EtherType for logging IPv6 packets.
Reviewed by:	melifaro
Approved by:	re (kib, glebius)
MFC after:	3 days
2013-09-28 15:49:36 +00:00
Gleb Smirnoff
8fc6e19c2c Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit:
date: 2010/02/04 14:10:12;  author: sthen;  state: Exp;  lines: +24 -19;
  pf_get_sport() picks a random port from the port range specified in a
  nat rule. It should check to see if it's in-use (i.e. matches an existing
  PF state), if it is, it cycles sequentially through other ports until
  it finds a free one. However the check was being done with the state
  keys the wrong way round so it was never actually finding the state
  to be in-use.

  - switch the keys to correct this, avoiding random state collisions
  with nat. Fixes PR 6300 and problems reported by robert@ and viq.

  - check pf_get_sport() return code in pf_test(); if port allocation
  fails the packet should be dropped rather than sent out untranslated.

  Help/ok claudio@.

Some additional changes to 1.12:

- We also need to bzero() the key to zero padding, otherwise key
  won't match.
- Collapse two if blocks into one with ||, since both conditions
  lead to the same processing.
- Only naddr changes in the cycle, so move initialization of other
  fields above the cycle.
- s/u_intXX_t/uintXX_t/g
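
The first of these changes matters because a key that is compared bytewise also includes any compiler-inserted padding. A minimal sketch of the idea, assuming a hypothetical `struct toy_key` and helper names that are not pf's real state key code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout; compilers typically pad after `proto`. */
struct toy_key {
    uint8_t  proto;
    uint32_t addr;
    uint16_t port;
};

/* Zero the whole key first so that padding bytes are deterministic. */
static void toy_key_init(struct toy_key *k, uint8_t proto, uint32_t addr,
    uint16_t port)
{
    assert(k != NULL);
    memset(k, 0, sizeof(*k));   /* userland stand-in for bzero() */
    k->proto = proto;
    k->addr = addr;
    k->port = port;
}

/* Bytewise comparison, as a lookup over raw key bytes would do. */
static int toy_key_equal(const struct toy_key *a, const struct toy_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Without the `memset()`, two keys with identical fields could still differ in their padding bytes and fail to match.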

PR:		kern/181690
Submitted by:	Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by:	Nginx, Inc.
2013-09-02 10:14:25 +00:00
Alexander Motin
5f4fc3dbcb Make dummynet use new direct callout(9) execution mechanism. Since the only
thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't
need extra switch to the clock SWI context.

On an idle system this change halves the number of active CPU cycles and
wakes up only one CPU from sleep instead of two.

I was going to make this change much earlier as part of the calloutng project,
but waited for a better solution with skipping idle ticks to be implemented.
Unfortunately, with the 10.0 release coming it is better to get at least this.
2013-08-24 13:34:36 +00:00
Mikolaj Golub
8856400bcb Make ipfw nat init/uninit work correctly for VIMAGE:
* Do per vnet instance cleanup (previously it was only for vnet0 on
  module unload, and led to libalias leaks and possible panics due to
  stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
  vnet0 lock (which does not prevent pointer access from other
  vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
  set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
  all vnets.

* It is supposed that ifaddr_change event handler is called in the
  interface vnet context, so add an assertion.

Reviewed by:	zec
MFC after:	2 weeks
2013-08-24 11:59:51 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Andrey V. Elsukov
415077bad9 Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration.
Reported by:	Eugene M. Zheganin
2013-07-29 13:17:18 +00:00
Gleb Smirnoff
6828cc99e1 De-vnet hash sizes and hash masks.
Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-06-19 13:37:29 +00:00
Gleb Smirnoff
93ecffe50b Improve locking strategy between keys hash and ID hash.
Before this change state creating sequence was:

1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash

What could happen here is that another thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating the state unblocks, it finds the state
it was inserting already non-virgin.

Now we perform proper interlocking between key hash locks and ID hash
lock:

1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash

To achieve that, the following hacking was performed in pf_state_key_attach():

- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
  by their address value.
- pf_state_key_attach() had some magic to reuse a > FIN_WAIT_2 state. It unlinked
  the conflicting state synchronously. In theory this could require locking
  a third key hash, which we can't do now.
  Now we do not remove the state immediately, instead we leave this task to
  the purge thread. To avoid conflicts in the short period before the state is
  purged, we push it to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
  ID hash and returns.
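
The address-ordered locking of two key hash mutexes mentioned in the list above can be sketched in userland as follows. This is only an illustration under stated assumptions: `struct toy_lock` is a hypothetical spinlock standing in for a kernel mutex, and none of these names come from pf.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy spinlock standing in for a key hash mutex (hypothetical). */
struct toy_lock {
    atomic_bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    /* Spin until we are the one flipping held from false to true. */
    while (atomic_exchange(&l->held, true))
        ;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(atomic_load(&l->held));  /* releasing an unheld lock is a bug */
    atomic_store(&l->held, false);
}

/*
 * Lock two buckets in ascending address order, so that two threads
 * locking the same pair can never wait on each other in a cycle.
 * If both keys hash to the same bucket, lock it only once.
 */
static void toy_lock_pair(struct toy_lock *a, struct toy_lock *b)
{
    if (a == b) {
        toy_lock_acquire(a);
    } else if ((uintptr_t)a < (uintptr_t)b) {
        toy_lock_acquire(a);
        toy_lock_acquire(b);
    } else {
        toy_lock_acquire(b);
        toy_lock_acquire(a);
    }
}

static void toy_unlock_pair(struct toy_lock *a, struct toy_lock *b)
{
    toy_lock_release(a);
    if (a != b)
        toy_lock_release(b);
}
```

Callers may pass the two buckets in either order; the comparison inside `toy_lock_pair()` fixes the acquisition order.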

Tested by:	Ian FREISLICH <ianf clue.co.za>
2013-06-13 06:07:19 +00:00
Gleb Smirnoff
5af77b3ebd Return meaningful error code from pf_state_key_attach() and
pf_state_insert().
2013-05-11 18:06:51 +00:00
Gleb Smirnoff
03911dec5b Better debug message. 2013-05-11 18:03:36 +00:00
Gleb Smirnoff
048c95417d Fix DIOCADDSTATE operation. 2013-05-11 17:58:26 +00:00
Gleb Smirnoff
b69d74e834 Invalid creatorid is always EINVAL, not only when we are in verbose mode. 2013-05-11 17:57:52 +00:00
Gleb Smirnoff
f8aa444783 Improve KASSERT() message. 2013-05-06 21:44:06 +00:00
Gleb Smirnoff
7a954bbbce Simplify printf(). 2013-05-06 21:43:15 +00:00
Alexander V. Chernikov
454189c130 Use unified method for accessing / updating cached rule pointers.
MFC after:	2 weeks
2013-05-04 18:24:30 +00:00
Eitan Adler
578acad37e Correct a few sizeof()s
Submitted by:	swildner@DragonFlyBSD.org
Reviewed by:	alfred
2013-05-01 04:37:34 +00:00
Gleb Smirnoff
a642ae6825 Remove useless ifdef KLD_MODULE from dummynet module unload path. This
fixes panic on unload.

Reported by:	pho
2013-04-29 06:11:19 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Alexander V. Chernikov
4037b82802 Fix ipfw rule validation partially broken by r248552.
Pointed by:	avg
MFC with:	r248552
2013-04-01 11:28:52 +00:00
Andrey V. Elsukov
5b4661289d When we are removing a specific set, call ipfw_expire_dyn_rules only once.
Obtained from:	Yandex LLC
MFC after:	1 week
2013-03-25 07:43:46 +00:00
Alexander V. Chernikov
ae01d73c04 Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.
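
For reference, the incremental update from RFC 1624 (eqn. 3, HC' = ~(~HC + ~m + m')) can be sketched as below. The function names are illustrative assumptions, not the ipfw code; words are treated as plain 16-bit values without regard to byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full checksum over n 16-bit words, used here only as a reference. */
static uint16_t cksum_full(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    assert(n <= 65536);     /* keeps the 32-bit accumulator from overflowing */
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    return (uint16_t)~fold16(sum);
}

/* RFC 1624, eqn. 3: update checksum hc when one word changes. */
static uint16_t cksum_update(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}
```

When only the DSCP bits change, `old_word` and `new_word` are the 16-bit header word containing the TOS byte before and after the rewrite, so the rest of the header need not be summed again.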

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).
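
Storing a set of the 64 possible codepoints in two 32-bit words might look like the sketch below. `struct dscp_mask` and its helpers are hypothetical names chosen for illustration, not the O_DSCP opcode layout itself.

```c
#include <assert.h>
#include <stdint.h>

/* 64 possible codepoints fit into two 32-bit words. */
struct dscp_mask {
    uint32_t w[2];
};

/* Add one codepoint (0..63) to the set. */
static void dscp_mask_add(struct dscp_mask *m, unsigned dscp)
{
    assert(dscp < 64);
    m->w[dscp / 32] |= UINT32_C(1) << (dscp % 32);
}

/* Return 1 if the codepoint is in the set, 0 otherwise. */
static int dscp_mask_match(const struct dscp_mask *m, unsigned dscp)
{
    assert(dscp < 64);
    return (int)((m->w[dscp / 32] >> (dscp % 32)) & 1);
}
```

A rule listing af11, af22 and be would add codepoints 10, 20 and 0, and matching a packet then costs one shift and one mask.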

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR:		kern/102471, kern/121122
MFC after:	2 weeks
2013-03-20 10:35:33 +00:00
Andrey V. Elsukov
93bb4f9ed5 Separate the locking macros that are used in the packet flow path
from others. This eases a switch to the pfil(4) lock.
2013-03-19 06:04:17 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
41a7572b26 Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
Alexander V. Chernikov
39bddcde96 Fix callout expiring dynamic rules.
PR:		kern/175530
Submitted by:	Vladimir Spiridenkov <vs@gtn.ru>
MFC after:	2 weeks
2013-03-02 14:47:10 +00:00
Gleb Smirnoff
e2a55a0021 Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00
Gleb Smirnoff
d8aa10cc35 In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
- Add dash in front of all copyright notices according to style(9).
- Move $OpenBSD$ down below copyright notices.
- Remove extra line between cdefs.h and __FBSDID.
2012-12-28 09:19:49 +00:00
Alexander V. Chernikov
3abd4586a4 Add parentheses to IP_FW_ARG_TABLEARG() definition.
Suggested by:	glebius
MFC with:	r244633
2012-12-23 18:35:42 +00:00
Alexander V. Chernikov
f37de965cc Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by:	Vitaliy Tokarenko <rphone@ukr.net>
MFC after:	2 weeks
2012-12-23 16:28:18 +00:00
Pawel Jakub Dawidek
f5002be657 Warn about reaching various PF limits.
Reviewed by:	glebius
Obtained from:	WHEEL Systems
2012-12-17 10:10:13 +00:00
Mikolaj Golub
bf1e95a21c In pfioctl, if the permission checks failed we returned with vnet context
set.

As the checks don't require vnet context, this is fixed by setting
vnet after the checks.

PR:		kern/160541
Submitted by:	Nikos Vassiliadis (slightly different approach)
2012-12-15 17:19:36 +00:00
Gleb Smirnoff
f094f811fb Fix error in r235991. No-sleep version of IFNET_RLOCK() should
be used here, since we may hold the main pf rulesets rwlock.

Reported by:	Fleuriot Damien <ml my.gd>
2012-12-14 13:01:16 +00:00
Gleb Smirnoff
4c794f5c06 Fix VIMAGE build broken in r244185.
Submitted by:	Nikolai Lifanov <lifanov mail.lifanov.com>
2012-12-14 08:02:35 +00:00
Gleb Smirnoff
9ff7e6e922 Merge rev. 1.119 from OpenBSD:
date: 2009/03/31 01:21:29;  author: dlg;  state: Exp;  lines: +9 -16
  ...

  this also firms up some of the input parsing so it handles short frames a
  bit better.

This actually fixes reading beyond the mbuf data area in pfsync_input(), which
may happen with certain pfsync datagrams.
2012-12-13 12:51:22 +00:00
Gleb Smirnoff
feaa4dd2d0 Initialize state id prior to attaching state to key hash. Otherwise a
race can happen, when pf_find_state() finds state via key hash, and locks
id hash slot 0 instead of appropriate to state id slot.
2012-12-13 12:48:57 +00:00
Gleb Smirnoff
fed7635002 Merge 1.127 from OpenBSD, that closes a regression from 1.125 (merged
as r242694):
  do better detection of when we have a better version of the tcp sequence
  windows than our peer.

  this resolves the last of the pfsync traffic storm issues ive been able to
  produce, and therefore makes it possible to do usable active-active
  stateful firewalls with pf.
2012-12-11 08:37:08 +00:00
Gleb Smirnoff
59cc9fde4f Rule memory garbage collecting in new pf scans only states that are on
id hash. If a state has been disconnected from id hash, its rule pointers
can no longer be dereferenced, and referenced memory can't be modified.
Thus, move rule statistics from pf_free_rule() to pf_unlink_rule() and
update them prior to releasing id hash slot lock.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:38:14 +00:00
Gleb Smirnoff
38cc0bfa26 Close possible races between state deletion and state being sent out
from pfsync:
- Call into pfsync_delete_state() holding the state lock.
- Set the state timeout to PFTM_UNLINKED after state has been moved
  to the PFSYNC_S_DEL queue in pfsync.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:32:28 +00:00
Gleb Smirnoff
8db7e13f1d Remove extra PFSYNC_LOCK() in pfsync_bulk_update() which led to lock
recursion.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:22:08 +00:00
Gleb Smirnoff
5da39c565b Revert erroneous r242693. A state may have PFTM_UNLINKED being on the
PFSYNC_S_DEL queue of pfsync.
2012-12-06 08:15:06 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Alexander V. Chernikov
c187c1fbf8 Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.

MFC after:	3 weeks
2012-11-30 19:36:55 +00:00
Alexander V. Chernikov
2e089d5c04 Make ipfw dynamic states operations SMP-ready.
* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.

Discussed with:	ipfw
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2012-11-30 16:33:22 +00:00
Alexander V. Chernikov
73332e7c82 Simplify sending keepalives.
Prepare ipfw_tick() to be used by other consumers.

Reviewed by:	ae(basically)
MFC after:	2 weeks
2012-11-09 18:23:38 +00:00
Gleb Smirnoff
f18ab0ffa3 Merge rev. 1.125 from OpenBSD:
date: 2009/06/12 02:03:51;  author: dlg;  state: Exp;  lines: +59 -69
  rewrite the way states from pfsync are merged into the local state tree
  and the conditions on which pfsync will notify its peers on a stale update.

  each side (ie, the sending and receiving side) of the state update is
  compared separately. any side that is further along than the local state
  tree is merged. if any side is further along in the local state table, an
  update is sent out telling the peers about it.
2012-11-07 07:35:05 +00:00
Gleb Smirnoff
d75efebeab It may happen that pfsync holds the last reference on a state. In this
case keys had already been freed. If encountering such state, then
just release last reference.

Not sure this can happen as a runtime race, but can be reproduced by
the following scenario:

- enable pfsync
- disable pfsync
- wait some time
- enable pfsync
2012-11-07 07:30:40 +00:00
Alexander V. Chernikov
5d0cd92651 Add assertion to enforce 'nat global' locking requirements changed by r241908.
Suggested by:	adrian, glebius
MFC after:	3 days
2012-11-05 22:54:00 +00:00
Alexander V. Chernikov
a730ff05c1 Use unified print_dyn_rule_flags() function for debugging messages
instead of hand-made printfs in every place.

MFC after:	1 week
2012-11-05 22:30:56 +00:00
Andrey V. Elsukov
ffdbf9da3b Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by:	andre
2012-11-02 01:20:55 +00:00
Gleb Smirnoff
078468ede4 o Remove last argument to ip_fragment(), and obtain all needed information
  on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some drivers may not announce CSUM_IP in their if_hwassist,
  yet try to do checksums if CSUM_IP is set on the mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
  After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by:	Sebastian Kuzminsky <seb lineratesystems.com>
2012-10-26 21:06:33 +00:00
Andrey V. Elsukov
c1de64a495 Remove the IPFIREWALL_FORWARD kernel option and make it possible to turn
on the related functionality at runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by:	Yandex LLC
Discussed with:	net@
MFC after:	2 weeks
2012-10-25 09:39:14 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert to host
byte order before passing a packet to an application. Probably
this will remain for ages for compatibility.

[2] The ip_input() still subtracts header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Alexander V. Chernikov
10ab2de085 Remove unnecessary chain read lock in ipfw nat 'global' code.
Document case when ipfw chain lock must be held while calling ipfw_nat().

MFC after:	2 weeks
2012-10-22 19:22:31 +00:00
Gleb Smirnoff
42a58907c3 Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

  if_clone_simple()
  if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with:		brooks, bz, 1 year ago
2012-10-16 13:37:54 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Kevin Lo
10d66948a8 Fix typo: s/unknow/unknown 2012-10-09 06:15:16 +00:00
Gleb Smirnoff
b833c0d990 Any pfil(9) hooks should be called with already set VNET context.
Reviewed by:	bz
2012-10-08 23:02:32 +00:00
Gleb Smirnoff
8f35d5f3e9 Catch up with r241245 and do not return packet back in host byte order. 2012-10-08 22:58:28 +00:00
Gleb Smirnoff
23e9c6dc1e After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
  - ip_output(), ipfilter(4) not changed, since already call
    in_delayed_cksum() with header in net byte order.
  - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
    there and back.
  - mrouting code and IPv6 ipsec now need to switch byte order there and
    back, but I hope, this is temporary solution.
  - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
  - pf_route() catches up on r241245 changes to ip_output().
2012-10-08 08:03:58 +00:00
Gleb Smirnoff
21d172a3f1 A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet
  to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet
  to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules:
  pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.

Reviewed by:	ray, luigi, eri, melifaro
Tested by:	glebius (LE), ray (BE)
2012-10-06 10:02:11 +00:00
Gleb Smirnoff
ea2951beed The pfil(9) layer guarantees us presence of the protocol header,
so remove extra check, that is always false.

P.S. Also, the goto there led to unlocking a not locked rwlock.
2012-10-06 07:06:57 +00:00
Gleb Smirnoff
aa955cb5b8 To reduce volume of pfsync traffic:
- Scan request update queue to prevent doubles.
- Do not push undersized datagram in pfsync_update_request().
2012-10-02 12:44:46 +00:00
Gleb Smirnoff
7b6fbb7367 Clear and re-setup all function pointers that glue pf(4) and pfsync(4)
together whenever the pfsync0 is brought down or up respectively.
2012-09-29 20:11:00 +00:00
Gleb Smirnoff
0fa4aaa7e6 Simplify send out queue code:
- Write method of a queue now is void; length of item is taken
  as queue property.
- Write methods don't need to know about mbuf, supply just buf
  to them.
- No need for safe queue iterator in pfsync_sendout().

Obtained from:	OpenBSD
2012-09-29 20:02:26 +00:00
Gleb Smirnoff
e2cfe42430 Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().

Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.

The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorithm that saves referenced
rules. A referenced rule was freed, and this later led to a crash.
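
The index hand-off and the early return on wrap-around can be illustrated with the following sketch. It is a simplified model with hypothetical names (`toy_purge_scan`, `TOY_NBUCKETS`), not the actual pf_purge_expired_states() code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBUCKETS 8   /* hypothetical hash table size */

/*
 * Scan at most maxcheck buckets starting at idx and return the index
 * where the next call should resume. When the scan reaches the end of
 * the array it returns 0 immediately, so a return value of 0 always
 * means "a full pass has just completed".
 */
static size_t toy_purge_scan(size_t idx, size_t maxcheck,
    unsigned visits[TOY_NBUCKETS])
{
    assert(idx < TOY_NBUCKETS);
    while (maxcheck > 0) {
        visits[idx]++;      /* stand-in for expiring states in this bucket */
        maxcheck--;
        if (++idx == TOY_NBUCKETS)
            return 0;       /* wrapped: stop here, do not start a new pass */
    }
    return idx;
}
```

The caller keeps the returned index in a local variable and passes it to the next call, so no pass ever spans a wrap-around.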
2012-09-28 20:43:03 +00:00
Gleb Smirnoff
51e02a31d0 EBUSY is a better reply for refusing to unload pf(4) or pfsync(4).
Submitted by:	pluknet
2012-09-22 19:03:11 +00:00
Gleb Smirnoff
29bdd62c85 When connection rate hits and we overload a source to a table,
we are actually editing table, which means editing rules,
thus we need writer access to 'em.

Fix this by offloading the update of table to the same taskqueue,
we already use for flushing. Since the taskqueue's major task is now
overloading, and flushing is optional, do a mechanical rename
s/flush/overload/ in the code related to the taskqueue.

Since overloading tasks do unsafe referencing of rules, provide
a bandaid in pf_purge_unlinked_rules(). If the latter sees any
queued tasks, then it skips purging for this run.

In table code:
- Assert any lock in pfr_lookup_addr().
- Assert writer lock in pfr_route_kentry().
2012-09-22 10:14:47 +00:00
Gleb Smirnoff
e706fd3a3a In pfr_insert_kentry() return ENOMEM if memory allocation failed. 2012-09-22 10:04:48 +00:00
Gleb Smirnoff
7348c5240d Fix fallout from r236397 in pfr_update_stats(), that was missed
later in r237155. We need to zero sockaddr before lookup. While
here, make pfr_update_stats() panic on unknown af.
2012-09-22 10:02:44 +00:00
Gleb Smirnoff
b7340ded6e Reduce copy/paste when freeing a source node. 2012-09-20 07:04:08 +00:00
Gleb Smirnoff
22c914789e Utilize Jenkins hash with random seed for source nodes storage. 2012-09-20 06:52:05 +00:00
Gleb Smirnoff
7f7ef494f1 Provide kernel compile time option to make pf(4) default rule to drop.
This is important to secure a small timeframe at boot time, when
network is already configured, but pf(4) is not yet.

PR:		kern/171622
Submitted by:	Olivier Cochard-Labbé <olivier cochard.me>
2012-09-18 11:07:19 +00:00
Gleb Smirnoff
1d6139c0e4 Make ruleset anchors in pf(4) reentrant. We've got two problems here:
1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
  as small as possible, the following measures were taken:
  - Maximum stack size reduced from 64 to 32.
  - The struct pf_anchor_stackframe trimmed by one pointer - parent.
    We can always obtain the parent via the rule pointer.
  - When pf_test_rule() calls pf_get_translation(), the former lends
    its stack to the latter, to avoid recursive allocation of 32 entries.

The second one proved trickier. The code that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.
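
Keeping a one-bit mark in the least significant bit of an aligned pointer can be sketched like this. The names (`struct toy_rule`, `tag_set_match` and friends) are hypothetical and only illustrate the technique; they are not the pf stack frame code.

```c
#include <assert.h>
#include <stdint.h>

struct toy_rule {           /* hypothetical stand-in for a pf rule */
    int nr;
};

#define TOY_MATCH_BIT ((uintptr_t)1)

/* Return the pointer with the mark bit set. */
static struct toy_rule *tag_set_match(struct toy_rule *p)
{
    /* Relies on the allocator returning at least 2-byte aligned memory. */
    assert(((uintptr_t)p & TOY_MATCH_BIT) == 0);
    return (struct toy_rule *)((uintptr_t)p | TOY_MATCH_BIT);
}

/* Return 1 if the mark bit is set on this tagged pointer. */
static int tag_get_match(const struct toy_rule *p)
{
    return (int)((uintptr_t)p & TOY_MATCH_BIT);
}

/* Strip the mark bit to recover a pointer that is safe to dereference. */
static struct toy_rule *tag_get_rule(struct toy_rule *p)
{
    return (struct toy_rule *)((uintptr_t)p & ~TOY_MATCH_BIT);
}
```

A tagged pointer must always go through `tag_get_rule()` before it is dereferenced.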

Discussed with:		dhartmei
2012-09-18 10:54:56 +00:00
Gleb Smirnoff
effbcf3842 Fix DIOCNATLOOK: zero key padding before performing lookup. 2012-09-18 09:15:32 +00:00
Gleb Smirnoff
3b3a8eb937 o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c		-> sys/netpfil/pf/
sys/contrib/pf/net/*.h		-> sys/net/
contrib/pf/pfctl/*.c		-> sbin/pfctl
contrib/pf/pfctl/*.h		-> sbin/pfctl
contrib/pf/pfctl/pfctl.8	-> sbin/pfctl
contrib/pf/pfctl/*.4		-> share/man/man4
contrib/pf/pfctl/*.5		-> share/man/man5

sys/netinet/ipfw		-> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with:		bz, luigi
2012-09-14 11:51:49 +00:00