Commit Graph

141 Commits

Author SHA1 Message Date
Hiren Panchasara
fc5e1956d9 ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by:	Midori Kato (aoimidori27@gmail.com)
Worked with:	Lars Eggert (lars@netapp.com)
Reviewed by:	luigi, hiren
2014-06-01 07:28:24 +00:00
John Baldwin
b437b06c79 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
Andrey V. Elsukov
3a5db2d492 Since ipfw nat configures all options in one step, we should set all bits
in the mask when calling LibAliasSetMode() to properly clear unneeded
options.

PR:		189655
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-05-18 14:25:19 +00:00
Alexander V. Chernikov
c3015737f3 Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR:		bin/189471
Submitted by:	Dennis Yusupoff <dyr@smartspb.net>
MFC after:	2 weeks
2014-05-17 13:45:03 +00:00
Gleb Smirnoff
0e4f18aa68 o In pf_normalize_ip() we don't need mtag in
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
  if we don't find any.

Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2014-05-17 12:30:27 +00:00
Mikolaj Golub
8a477d48e0 Define startup order the same way as it is in dummynet. 2014-04-26 08:05:16 +00:00
Gleb Smirnoff
53f4b0cf9b The current API for adding rules with pool addresses is the following:
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
  them into rule.

The ugly part is that if address is a table, then it is initialized
in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't
supply ruleset. But if address is a dynaddr, we need address family,
and address family could be different for different addresses in one
rule, so dynaddr is initialized in DIOCADDADDR.

This leads to the entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.

Since proper fix requires API/ABI change, for now simply plug the panic
in pf_empty_pool().

Reported by:	danger
2014-04-25 11:36:11 +00:00
Martin Matuska
ecb47cf9c5 Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
Andrey V. Elsukov
69155b132d Set oif only for outgoing packets.
PR:		188543
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-04-16 14:37:11 +00:00
Gleb Smirnoff
79bde95f51 Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR:		188253
2014-04-16 09:25:20 +00:00
Christian Brueffer
7946c49fd8 Free resources and error cases; re-indent a curly brace while here.
CID:		1199366
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2014-04-13 21:13:33 +00:00
Martin Matuska
42311ccc0b Merge from projects/pf r264198:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by:	trociny
MFC after:	1 week
2014-04-07 07:06:13 +00:00
Martin Matuska
0a7c583acc Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.
Reviewed by:	trociny
2014-04-06 19:19:25 +00:00
Martin Matuska
7e92ce7380 De-virtualize UMA zone pf_mtag_z and move to global initialization part.
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

Reviewed by:	Nikos Vassiliadis, trociny@
2014-03-29 09:05:25 +00:00
Martin Matuska
1709ccf9d3 Merge head up to r263906. 2014-03-29 08:39:53 +00:00
Martin Matuska
d318d97fb5 Merge from projects/pf r251993 (glebius@):
De-vnet hash sizes and hash masks.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny

MFC after:	1 month
2014-03-25 06:55:53 +00:00
Gleb Smirnoff
620ee5d386 Fix breakage in ipfw+VIMAGE after r261590.
PR:		kern/187665
Sponsored by:	Nginx, Inc.
2014-03-21 17:07:18 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
fb3541ad15 Instead of playing games with casts simply add 3 more members to the
structure pf_rule, that are used when the structure is passed via
ioctl().

PR:		187074
2014-03-05 00:40:03 +00:00
Martin Matuska
5748b897da Merge head up to r262222 (last merge was incomplete). 2014-02-19 22:02:15 +00:00
Martin Matuska
dc64d6b7e1 Revert r262196
I am going to split this into two individual patches and test it with
the projects/pf branch that may get merged later.
2014-02-19 17:06:04 +00:00
Martin Matuska
a93b9a64fe De-virtualize pf_mtag_z [1]
Process V_pf_overloadqueue in vnet context [2]

This fixes two VIMAGE kernel panics and allows to simultaneously run host-pf
and vnet jails. pf inside jails remains broken.

PR:		kern/182964
Submitted by:	glebius@FreeBSD.org [2], myself [1]
Tested by:	rodrigc@FreeBSD.org, myself
MFC after:	2 weeks
2014-02-18 22:17:12 +00:00
George V. Neville-Neil
f850d57232 Summary: Two quick edits to the implementation notes as they're no
longer stored in netinet but in netpfil.
2014-02-15 18:36:31 +00:00
Dimitry Andric
a3043eeef2 Under sys/netpfil/ipfw, surround two IPv6-specific static functions with
#ifdef INET6, since they are unused when INET6 is disabled.

MFC after:	3 days
2014-02-15 12:25:01 +00:00
Gleb Smirnoff
48278b8846 Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to:		Dennis Yusupoff <dyr smartspb.net>
Also reported by:	dumbbell, pgj, Rambler
Sponsored by:		Nginx, Inc.
2014-02-14 10:05:21 +00:00
Alexander V. Chernikov
5fa3fdd3d9 Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some usused fields

Sponsored by:	Yandex LLC
2014-01-24 09:13:30 +00:00
Gleb Smirnoff
be3d21a2cf Remove NULL pointer dereference.
CID:	1009118
2014-01-22 15:58:43 +00:00
Gleb Smirnoff
d26bbeb948 Fix resource leak and simplify code for DIOCCHANGEADDR.
CID:	1007035
2014-01-22 15:44:38 +00:00
Alexander V. Chernikov
c254a2091c Revert r260548. We really should not use IPFW_WLOCK() here
but this requires some more playing with IPFW_UH_WLOCK(). Leave till later.
2014-01-11 18:27:34 +00:00
Alexander V. Chernikov
ac5863eed8 We don't need chain write lock since we're not modifying its contents.
LibAliasSetAddress() uses its own mutex to serialize changes.

While here, convert ifp->if_xname access to if_name() function.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-01-11 16:50:41 +00:00
Gleb Smirnoff
a830c4524d When pf_get_translation() fails, it should leave *sn pointer pristine,
otherwise we will panic in pf_test_rule().

PR:		182557
2014-01-06 19:05:04 +00:00
Alexander V. Chernikov
d28d2aa46d Use rnh_matchaddr instead of rnh_lookup for longest-prefix match.
rnh_lookup is effectively the same as rnh_matchaddr if called with
empy network mask.

MFC after:	2 weeks
2014-01-03 23:11:26 +00:00
Dimitry Andric
85838e48bd Fix incorrect header guard define in sys/netpfil/pf/pf.h, which snuck in
in r257186.  Found by clang 3.4.
2013-12-22 19:47:22 +00:00
Gleb Smirnoff
0b5d46ce4d Fix fallout from r258479: in pf_free_src_node() the node must already
be unlinked.

Reported by:	Konstantin Kukushkin <dark rambler-co.ru>
Sponsored by:	Nginx, Inc.
2013-12-22 12:10:36 +00:00
Alexander V. Chernikov
fb2b51fab1 Add net.inet.ip.fw.dyn_keep_states sysctl which
re-links dynamic states to default rule instead of
flushing on rule deletion.
This can be useful while performing ruleset reload
(think about `atomic` reload via changing sets).
Currently it is turned off by default.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-12-18 20:17:05 +00:00
Alexander V. Chernikov
a19b3f74af Simplify O_NAT opcode handling.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-11-28 15:28:51 +00:00
Alexander V. Chernikov
1058f17749 Check ipfw table numbers in both user and kernel space before rule addition.
Found by:	Saychik Pavel <umka@localka.net>
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-11-28 10:28:28 +00:00
Craig Rodrigues
d04dc94cac In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);".
This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit().

Therefore, vnet_ipfw_nat_uninit() *must* be called before vnet_ipfw_uninit(),
but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions.
In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c,
IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN
and
IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER

Consequently, if VIMAGE is enabled, and jails are created and destroyed,
the system sometimes crashes, because we are trying to use a deleted lock.

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL,
so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().
2013-11-25 20:20:34 +00:00
Gleb Smirnoff
19acaecac3 The DIOCKILLSRCNODES operation was implemented with O(m*n) complexity,
where "m" is number of source nodes and "n" is number of states. Thus,
on heavy loaded router its processing consumed a lot of CPU time.

Reimplement it with O(m+n) complexity. We first scan through source
nodes and disconnect matching ones, putting them on the freelist and
marking with a cookie value in their expire field. Then we scan through
the states, detecting references to source nodes with a cookie, and
disconnect them as well. Then the freelist is passed to pf_free_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
PR:		kern/176763
Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:22:26 +00:00
Gleb Smirnoff
d77c1b3269 To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
  These function do not proceed with freeing of a node, just disconnect
  it from storage.
- New function pf_free_src_nodes() works on a list of previously
  disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:16:34 +00:00
Gleb Smirnoff
1320f8c0d5 Fix off by ones when scanning source nodes hash.
Sponsored by:	Nginx, Inc.
2013-11-22 18:57:27 +00:00
Gleb Smirnoff
4280d14d2b Style: don't compare unsigned <= 0.
Sponsored by:	Nginx, Inc.
2013-11-22 18:54:06 +00:00
Luigi Rizzo
41d10f903d add a counter on the struct mq (a queue of mbufs),
and add a block for userspace compiling.
2013-11-22 05:02:37 +00:00
Luigi Rizzo
f783a35ced disable some ipfw match options when compiling in userspace 2013-11-22 05:01:38 +00:00
Luigi Rizzo
d0f65d47ec make this code compile in userspace on OSX 2013-11-22 05:00:18 +00:00
Luigi Rizzo
413c8aaa87 more support for userspace compiling of this code:
emulate the uma_zone for dynamic rules.
2013-11-22 04:59:17 +00:00
Luigi Rizzo
77024cbc3e make ipfw_check_packet() and ipfw_check_frame() public,
so they can be used in the userspace version of ipfw/dummynet
(normally using netmap for the I/O path).

This is the first of a few commits to ease compiling the
ipfw kernel code in userspace.
2013-11-22 04:57:50 +00:00
Gleb Smirnoff
654957c2c8 Merge head up to r258343. 2013-11-19 12:21:47 +00:00
Gleb Smirnoff
f053058cee - Split functions that initialize various pf parts into their vimage
parts and global parts.
- Since global parts appeared to be only mutex initializations, just
  abandon them and use MTX_SYSINIT() instead.
- Kill my incorrect VNET_FOREACH() iterator and instead use correct
  approach with VNET_SYSINIT().

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-11-18 22:18:07 +00:00
Gleb Smirnoff
e4e01d9cec Some fixups to pf_get_sport after r257223:
- Do not return blindly if proto isn't ICMP.
- The dport is in network order, so fix comparisons.
- Remove ridiculous htonl(arc4random()).
- Push local variable to a narrower block.
2013-11-14 14:20:35 +00:00