Commit Graph

100 Commits

Author SHA1 Message Date
Alexander V. Chernikov
a19b3f74af Simplify O_NAT opcode handling.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-11-28 15:28:51 +00:00
Alexander V. Chernikov
1058f17749 Check ipfw table numbers in both user and kernel space before rule addition.
Found by:	Saychik Pavel <umka@localka.net>
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-11-28 10:28:28 +00:00
Craig Rodrigues
d04dc94cac In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);".
This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit().

Therefore, vnet_ipfw_nat_uninit() *must* be called before vnet_ipfw_uninit(),
but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions.
In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c,
IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN
and
IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER

Consequently, if VIMAGE is enabled, and jails are created and destroyed,
the system sometimes crashes, because we are trying to use a deleted lock.

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL,
so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().
2013-11-25 20:20:34 +00:00
Gleb Smirnoff
19acaecac3 The DIOCKILLSRCNODES operation was implemented with O(m*n) complexity,
where "m" is number of source nodes and "n" is number of states. Thus,
on heavy loaded router its processing consumed a lot of CPU time.

Reimplement it with O(m+n) complexity. We first scan through source
nodes and disconnect matching ones, putting them on the freelist and
marking with a cookie value in their expire field. Then we scan through
the states, detecting references to source nodes with a cookie, and
disconnect them as well. Then the freelist is passed to pf_free_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
PR:		kern/176763
Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:22:26 +00:00
Gleb Smirnoff
d77c1b3269 To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
  These function do not proceed with freeing of a node, just disconnect
  it from storage.
- New function pf_free_src_nodes() works on a list of previously
  disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:16:34 +00:00
Gleb Smirnoff
1320f8c0d5 Fix off by ones when scanning source nodes hash.
Sponsored by:	Nginx, Inc.
2013-11-22 18:57:27 +00:00
Gleb Smirnoff
4280d14d2b Style: don't compare unsigned <= 0.
Sponsored by:	Nginx, Inc.
2013-11-22 18:54:06 +00:00
Luigi Rizzo
41d10f903d add a counter on the struct mq (a queue of mbufs),
and add a block for userspace compiling.
2013-11-22 05:02:37 +00:00
Luigi Rizzo
f783a35ced disable some ipfw match options when compiling in userspace 2013-11-22 05:01:38 +00:00
Luigi Rizzo
d0f65d47ec make this code compile in userspace on OSX 2013-11-22 05:00:18 +00:00
Luigi Rizzo
413c8aaa87 more support for userspace compiling of this code:
emulate the uma_zone for dynamic rules.
2013-11-22 04:59:17 +00:00
Luigi Rizzo
77024cbc3e make ipfw_check_packet() and ipfw_check_frame() public,
so they can be used in the userspace version of ipfw/dummynet
(normally using netmap for the I/O path).

This is the first of a few commits to ease compiling the
ipfw kernel code in userspace.
2013-11-22 04:57:50 +00:00
Gleb Smirnoff
e4e01d9cec Some fixups to pf_get_sport after r257223:
- Do not return blindly if proto isn't ICMP.
- The dport is in network order, so fix comparisons.
- Remove ridiculous htonl(arc4random()).
- Push local variable to a narrower block.
2013-11-14 14:20:35 +00:00
Gleb Smirnoff
6c71335c62 Fix fallout from r257223. Since pf_test_state_icmp() can call
pf_icmp_state_lookup() twice, we need to unlock previously found state.

Reported & tested by:	gavin
2013-11-05 16:54:25 +00:00
Gleb Smirnoff
b1b9dcae46 Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.
2013-11-05 07:32:09 +00:00
Gleb Smirnoff
e1b58d2cff Code logic of handling PFTM_PURGE into pf_find_state(). 2013-11-04 08:20:06 +00:00
Gleb Smirnoff
7710f9f14a Remove unused PFTM_UNTIL_PACKET const. 2013-11-04 08:15:59 +00:00
Gleb Smirnoff
1ce5620d32 - Fix VIMAGE build.
- Fix build with gcc.
2013-10-28 10:12:19 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Baptiste Daroussin
0664b03c16 Import pf.c 1.638 from OpenBSD
Original log:
Some ICMP types that also have icmp_id, pointed out by markus@

Obtained from:	OpenBSD
2013-10-27 20:56:23 +00:00
Baptiste Daroussin
5fff3f1010 Improt pf.c 1.636 from OpenBSD
Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some some icmp types in icmp error message payloads.

Obtained from:	OpenBSD
2013-10-27 20:52:09 +00:00
Baptiste Daroussin
44df0d9356 Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD
Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type

in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
  addresses but recieve unicast replies, by doing fallthrough lookups
  against the correct multicast address.  - Clear up some mistaken
  assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
  one based on other data if we can, otherwise set it to 0.
  - Don't modify the icmp id field in NAT unless it's echo
  - Use the full range of possible id's when NATing icmp6 echoy

Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe

Reviewed by:	glebius
Obtained from:	OpenBSD
2013-10-27 20:44:42 +00:00
Gleb Smirnoff
75bf2db380 Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with:		bz
2013-10-27 16:25:57 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Philip Paeps
b49bf73f75 Use the correct EtherType for logging IPv6 packets.
Reviewed by:	melifaro
Approved by:	re (kib, glebius)
MFC after:	3 days
2013-09-28 15:49:36 +00:00
Gleb Smirnoff
8fc6e19c2c Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit:
date: 2010/02/04 14:10:12;  author: sthen;  state: Exp;  lines: +24 -19;
  pf_get_sport() picks a random port from the port range specified in a
  nat rule. It should check to see if it's in-use (i.e. matches an existing
  PF state), if it is, it cycles sequentially through other ports until
  it finds a free one. However the check was being done with the state
  keys the wrong way round so it was never actually finding the state
  to be in-use.

  - switch the keys to correct this, avoiding random state collisions
  with nat. Fixes PR 6300 and problems reported by robert@ and viq.

  - check pf_get_sport() return code in pf_test(); if port allocation
  fails the packet should be dropped rather than sent out untranslated.

  Help/ok claudio@.

Some additional changes to 1.12:

- We also need to bzero() the key to zero padding, otherwise key
  won't match.
- Collapse two if blocks into one with ||, since both conditions
  lead to the same processing.
- Only naddr changes in the cycle, so move initialization of other
  fields above the cycle.
- s/u_intXX_t/uintXX_t/g

PR:		kern/181690
Submitted by:	Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by:	Nginx, Inc.
2013-09-02 10:14:25 +00:00
Alexander Motin
5f4fc3dbcb Make dummynet use new direct callout(9) execution mechanism. Since the only
thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't
need extra switch to the clock SWI context.

On idle system this change in half reduces number of active CPU cycles and
wakes up only one CPU from sleep instead of two.

I was going to make this change much earlier as part of calloutng project,
but waited for better solution with skipping idle ticks to be implemented.
Unfortunately with 10.0 release coming it is better get at least this.
2013-08-24 13:34:36 +00:00
Mikolaj Golub
8856400bcb Make ipfw nat init/unint work correctly for VIMAGE:
* Do per vnet instance cleanup (previously it was only for vnet0 on
  module unload, and led to libalias leaks and possible panics due to
  stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
  vnet0 lock (which does not prevent pointers access from another
  vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
  set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
  all vnets.

* It is supposed that ifaddr_change event handler is called in the
  interface vnet context, so add an assertion.

Reviewed by:	zec
MFC after:	2 weeks
2013-08-24 11:59:51 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Andrey V. Elsukov
415077bad9 Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration.
Reported by:	Eugene M. Zheganin
2013-07-29 13:17:18 +00:00
Gleb Smirnoff
93ecffe50b Improve locking strategy between keys hash and ID hash.
Before this change state creating sequence was:

1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash

What could happen here is that other thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating state unblocks, it finds the state
it was inserting already non-virgin.

Now we perform proper interlocking between key hash locks and ID hash
lock:

1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash

To achieve that, the following hacking was performed in pf_state_key_attach():

- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
  by their address value.
- pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked
  the conflicting state synchronously. In theory this could require locking
  a third key hash, which we can't do now.
  Now we do not remove the state immediately, instead we leave this task to
  the purge thread. To avoid conflicts in a short period before state is
  purged, we push to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
  ID hash and returns.

Tested by:	Ian FREISLICH <ianf clue.co.za>
2013-06-13 06:07:19 +00:00
Gleb Smirnoff
5af77b3ebd Return meaningful error code from pf_state_key_attach() and
pf_state_insert().
2013-05-11 18:06:51 +00:00
Gleb Smirnoff
03911dec5b Better debug message. 2013-05-11 18:03:36 +00:00
Gleb Smirnoff
048c95417d Fix DIOCADDSTATE operation. 2013-05-11 17:58:26 +00:00
Gleb Smirnoff
b69d74e834 Invalid creatorid is always EINVAL, not only when we are in verbose mode. 2013-05-11 17:57:52 +00:00
Gleb Smirnoff
f8aa444783 Improve KASSERT() message. 2013-05-06 21:44:06 +00:00
Gleb Smirnoff
7a954bbbce Simplify printf(). 2013-05-06 21:43:15 +00:00
Alexander V. Chernikov
454189c130 Use unified method for accessing / updating cached rule pointers.
MFC after:	2 weeks
2013-05-04 18:24:30 +00:00
Eitan Adler
578acad37e Correct a few sizeof()s
Submitted by:	swildner@DragonFlyBSD.org
Reviewed by:	alfred
2013-05-01 04:37:34 +00:00
Gleb Smirnoff
a642ae6825 Remove useless ifdef KLD_MODULE from dummynet module unload path. This
fixes panic on unload.

Reported by:	pho
2013-04-29 06:11:19 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Alexander V. Chernikov
4037b82802 Fix ipfw rule validation partially broken by r248552.
Pointed by:	avg
MFC with:	r248552
2013-04-01 11:28:52 +00:00
Andrey V. Elsukov
5b4661289d When we are removing a specific set, call ipfw_expire_dyn_rules only once.
Obtained from:	Yandex LLC
MFC after:	1 week
2013-03-25 07:43:46 +00:00
Alexander V. Chernikov
ae01d73c04 Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR:		kern/102471, kern/121122
MFC after:	2 weeks
2013-03-20 10:35:33 +00:00
Andrey V. Elsukov
93bb4f9ed5 Separate the locking macros that are used in the packet flow path
from others. This helps easy switch to use pfil(4) lock.
2013-03-19 06:04:17 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
41a7572b26 Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
Alexander V. Chernikov
39bddcde96 Fix callout expiring dynamic rules.
PR:		kern/175530
Submitted by:	Vladimir Spiridenkov <vs@gtn.ru>
MFC after:	2 weeks
2013-03-02 14:47:10 +00:00
Gleb Smirnoff
e2a55a0021 Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00