Commit Graph

162 Commits

Author SHA1 Message Date
Mark Johnston
b93a796b06 pf: Handle unmapped mbufs when computing checksums
PR:		254419
Reviewed by:	gallatin, kp
Tested by:	Igor A. Valkov <viaprog@gmail.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29378
2021-03-23 10:04:31 -04:00
Kristof Provost
cecfaf9bed pf: Fully remove interrupt events on vnet cleanup
swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible when creating and remove a vnet jail in `procstat -t
12`.

We can remove it manually with intr_event_destroy().

PR:		254171
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29211
2021-03-12 12:12:43 +01:00
Kristof Provost
28dc2c954f pf: Simplify cleanup
We can now counter_u64_free(NULL), so remove the checks.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29190
2021-03-12 12:12:35 +01:00
Kristof Provost
bb4a7d94b9 net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros
Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.

Use them where appropriate.

Reviewed by:	ae (previous version), rscheff, tuexen, rgrimes
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29056
2021-03-04 20:56:48 +01:00
Kristof Provost
f19323847c pf: Retrieve DSCP value from the IPv6 header
Teach pf to read the DSCP value from the IPv6 header so that we can
match on them.

Reviewed by:	donner
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29048
2021-03-04 20:56:48 +01:00
Yannis Planus
0c458752ce pf: duplicate frames only once when using dup-to pf rule
When using DUP-TO rule, frames are duplicated 3 times on both output
interfaces and duplication interface. Add a flag to not duplicate a
duplicated frame.

Inspired by a patch from Miłosz Kaniewski milosz.kaniewski at gmail.com
https://lists.freebsd.org/pipermail/freebsd-pf/2015-November/007886.html

Reviewed by:		kp@
Differential Revision:	https://reviews.freebsd.org/D27018
2021-01-28 16:46:44 +01:00
Kristof Provost
5a3b9507d7 pf: Convert pfi_kkif to use counter_u64
Improve caching behaviour by using counter_u64 rather than variables
shared between cores.

The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:

x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
|                                                                        + |
| xx                                                                     + |
|xxx                                                                    +++|
||A|                                                                       |
|                                                                       |A||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       9216962       9526356       9343902     9371057.6     116720.36
+   5      19427190      19698400      19502922      19546509     109084.92
Difference at 95.0% confidence
        1.01755e+07 +/- 164756
        108.584% +/- 2.9359%
        (Student's t, pooled s = 112967)

Reviewed by:	philip
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27763
2021-01-05 23:35:37 +01:00
Kristof Provost
320c11165b pf: Split pfi_kif into a user and kernel space structure
No functional change.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27761
2021-01-05 23:35:37 +01:00
Kristof Provost
c3adacdad4 pf: Change pf_krule counters to use counter_u64
This improves the cache behaviour of pf and results in improved
throughput.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27760
2021-01-05 23:35:37 +01:00
Kristof Provost
e86bddea9f pf: Split pf_rule into kernel and user space versions
No functional change intended.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27758
2021-01-05 23:35:36 +01:00
Kristof Provost
fbbf270eef pf: Use counter_u64 in pf_src_node
Reviewd by:	philip
MFC after:      2 weeks
Sponsored by:   Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27756
2021-01-05 23:35:36 +01:00
Kristof Provost
17ad7334ca pf: Split pf_src_node into a kernel and userspace struct
Introduce a kernel version of struct pf_src_node (pf_ksrc_node).

This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.

Reviewed by:	philip
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27707
2021-01-05 23:35:36 +01:00
Kristof Provost
1c00efe98e pf: Use counter(9) for pf_state byte/packet tracking
This improves cache behaviour by not writing to the same variable from
multiple cores simultaneously.

pf_state is only used in the kernel, so can be safely modified.

Reviewed by:	Lutz Donnerhacke, philip
MFC after:	1 week
Sponsed by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27661
2020-12-23 12:03:21 +01:00
Kristof Provost
c3f69af03a pf: Fix unaligned checksum updates
The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).

Import the OpenBSD fix for this issue.

PR:		240416
Obtained from:	OpenBSD
MFC after:	1 week
Reviewed by:	tuexen (previous version)
Differential Revision:	https://reviews.freebsd.org/D27696
2020-12-23 12:03:20 +01:00
Kristof Provost
3420068a73 pf: Allow net.pf.request_maxcount to be set from loader.conf
Mark request_maxcount as RWTUN so we can set it both at runtime and from
loader.conf. This avoids usings getting caught out by the change from tunable
to run time configuration.

Suggested by:	Franco Fichtner
MFC after:	3 days
2020-12-12 20:14:39 +00:00
Brooks Davis
9ee99cec1f hme(4): Remove as previous announced
The hme (Happy Meal Ethernet) driver was the onboard NIC in most
supported sparc64 platforms. A few PCI NICs do exist, but we have seen
no evidence of use on non-sparc systems.

Reviewed by:	imp, emaste, bcr
Sponsored by:	DARPA
2020-12-11 21:40:38 +00:00
Kristof Provost
71c9acef8c pf: Fix incorrect assertion
We never set PFRULE_RULESRCTRACK when calling pf_insert_src_node(). We do set
PFRULE_SRCTRACK, so update the assertion to match.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27254
2020-11-20 10:08:33 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Alexander V. Chernikov
643ce94878 Convert pf rtable checks to the new routing KPI.
Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D24386
2020-04-15 13:00:48 +00:00
Pawel Biernacki
10b49b2302 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

Mark all nodes in pf, pfsync and carp as MPSAFE.

Reviewed by:	kp
Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23634
2020-02-21 16:23:00 +00:00
Kristof Provost
cca2ea64e9 pf: Make request_maxcount runtime adjustable
There's no reason for this to be a tunable. It's perfectly safe to
change this at runtime.

Reviewed by:	Lutz Donnerhacke
Differential Revision:	https://reviews.freebsd.org/D22737
2019-12-14 02:06:07 +00:00
Kristof Provost
492f3a312a pf: Add endline to all DPFPRINTF()
DPFPRINTF() doesn't automatically add an endline, so be consistent and
always add it.
2019-11-24 13:53:36 +00:00
Kristof Provost
a0d571cbef pf: Must be in NET_EPOCH to call icmp_error
icmp_reflect(), called through icmp_error() requires us to be in NET_EPOCH.
Failure to hold it leads to the following panic (with INVARIANTS):

  panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742
  cpuid = 2
  time = 1571233273
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920
  vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980
  panic() at panic+0x43/frame 0xfffffe00e09779e0
  icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0
  icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10
  pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50
  ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0
  fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0
  fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0

Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(),
but ip_output() will soon be converted to a function that requires epoch, so
entering NET_EPOCH directly from pf_intr() makes more sense.

Discussed with:	glebius@
2019-10-18 03:36:26 +00:00
Mark Johnston
bff630d1dc Fix the build after r353458.
MFC with:	r353458
Sponsored by:	The FreeBSD Foundation
2019-10-13 00:08:17 +00:00
Mark Johnston
6cc9ab8610 Add a missing include of opt_sctp.h.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-12 23:01:16 +00:00
Kristof Provost
f287767d4f pf: Remove partial RFC2675 support
Remove our (very partial) support for RFC2675 Jumbograms. They're not
used, not actually supported and not a good idea.

Reviewed by:	thj@
Differential Revision:	https://reviews.freebsd.org/D21086
2019-07-29 13:21:31 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Li-Wen Hsu
d086d41363 Remove an uneeded indentation introduced in r223637 to silence gcc warnging
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-05-25 23:58:09 +00:00
Rodney W. Grimes
6c1c6ae537 Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros.  Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by:		karels, kristof
Approved by:		bde (mentor)
MFC after:		3 weeks
Sponsored by:		John Gilmore
Differential Revision:	https://reviews.freebsd.org/D19317
2019-04-04 19:01:13 +00:00
Conrad Meyer
a8a16c7128 Replace read_random(9) with more appropriate arc4rand(9) KPIs
Reviewed by:	ae, delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19760
2019-04-04 01:02:50 +00:00
Kristof Provost
64af73aade pf: Ensure that IP addresses match in ICMP error packets
States in pf(4) let ICMP and ICMP6 packets pass if they have a
packet in their payload that matches an exiting connection.  It was
not checked whether the outer ICMP packet has the same destination
IP as the source IP of the inner protocol packet.  Enforce that
these addresses match, to prevent ICMP packets that do not make
sense.

Reported by:	Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv
Obtained from:	OpenBSD
Security:	CVE-2019-5598
2019-03-21 08:09:52 +00:00
Gleb Smirnoff
1830dae3d3 Make second argument of ip_divert(), that specifies packet direction a bool.
This allows pf(4) to avoid including ipfw(4) private files.
2019-03-14 22:23:09 +00:00
Kristof Provost
22c58991e3 pf: Small performance tweak
Because fetching a counter is a rather expansive function we should use
counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

From the man page:
    The adaptive timeout values can be defined both globally and for
    each rule.  When used on a per-rule basis, the values relate to the
    number of states created by the rule, otherwise to the total number
    of states.

This handling of adaptive timeouts is done in pf_state_expires().  The
calculation needs three values: start, end and states.

1. Normal rules "pass .." without adaptive setting meaning "start = 0"
   runs in the else-section and therefore takes "start" and "end" from
   the global default settings and sets "states" to pf_status.states
   (= total number of states).

2. Special rules like
   "pass .. keep state (adaptive.start 500 adaptive.end 1000)"
   have start != 0, run in the if-section and take "start" and "end"
   from the rule and set "states" to the number of states created by
   their rule using counter_u64_fetch().

Thats all ok, but there is a third case without special handling in the
above code snippet:

3. All "rdr/nat pass .." statements use together the pf_default_rule.
   Therefore we have "start != 0" in this case and we run the
   if-section but we better should run the else-section in this case and
   do not fetch the counter of the pf_default_rule but take the total
   number of states.

Submitted by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	2 weeks
2019-02-24 17:23:55 +00:00
Patrick Kelsey
8f2ac65690 Reduce the time it takes the kernel to install a new PF config containing a large number of queues
In general, the time savings come from separating the active and
inactive queues lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-bashed.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D19131
2019-02-11 05:17:31 +00:00
Kristof Provost
336683f24f pf: Fix endless loop on NAT exhaustion with sticky-address
When we try to find a source port in pf_get_sport() it's possible that
all available source ports will be in use. In that case we call
pf_map_addr() to try to find a new source IP to try from. If there are
no more available source IPs pf_map_addr() will return 1 and we stop
trying.

However, if sticky-address is set we'll always return the same IP
address, even if we've already tried that one.
We need to check the supplied address, because if that's the one we'd
set it means pf_get_sport() has already tried it, and we should error
out rather than keep trying.

PR:		233867
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18483
2018-12-12 20:15:06 +00:00
Kristof Provost
5b551954ab pf: Prevent integer overflow in PF when calculating the adaptive timeout.
Mainly states of established TCP connections would be affected resulting
in immediate state removal once the number of states is bigger than
adaptive.start.  Disabling adaptive timeouts is a workaround to avoid this bug.
Issue found and initial diff by Mathieu Blanc (mathieu.blanc at cea dot fr)

Reported by: Andreas Longwitz <longwitz AT incore.de>
Obtained from:  OpenBSD
MFC after:	2 weeks
2018-12-11 21:44:39 +00:00
Kristof Provost
5f6cf24e2d pfsync: Make pfsync callbacks per-vnet
The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17499
2018-11-02 16:47:07 +00:00
Kristof Provost
13d640d376 pf: Fix copy/paste error in IPv6 address rewriting
We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.

CID:		1009561
MFC after:	3 weeks
2018-10-24 00:19:44 +00:00
Kristof Provost
1563a27e1f pf synproxy will do the 3WHS on behalf of the target machine, and once
the 3WHS is completed, establish the backend connection. The trigger
for "3WHS completed" is the reception of the first ACK. However, we
should not proceed if that ACK also has RST or FIN set.

PR:		197484
Obtained from:	OpenBSD
MFC after:	2 weeks
2018-10-20 18:37:21 +00:00
John-Mark Gurney
032d3aaa96 Significantly improve pf purge cpu usage by only taking locks
when there is work to do.  This reduces CPU consumption to one
third on systems.  This will help keep the thread CPU usage under
control now that the default hash size has increased.

Reviewed by:	kp
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17097
2018-09-16 00:44:23 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Kristof Provost
32ece669c2 pf: Fix synproxy
Synproxy was accidentally broken by r335569. The 'return (action)' must be
executed for every non-PF_PASS result, but the error packet (TCP RST or ICMP
error) should only be sent if the packet was dropped (i.e. PF_DROP) and the
return flag is set.

PR:		229477
Submitted by:	Andre Albsmeier <mail AT fbsd.e4m.org>
MFC after:	1 week
2018-07-14 10:14:59 +00:00
Kristof Provost
150182e309 pf: Support "return" statements in passing rules when they fail.
Normally pf rules are expected to do one of two things: pass the traffic or
block it. Blocking can be silent - "drop", or loud - "return", "return-rst",
"return-icmp". Yet there is a 3rd category of traffic passing through pf:
Packets matching a "pass" rule but when applying the rule fails. This happens
when redirection table is empty or when src node or state creation fails. Such
rules always fail silently without notifying the sender.

Allow users to configure this behaviour too, so that pf returns an error packet
in these cases.

PR:		226850
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
MFC after:	1 week
Sponsored by:	InnoGames GmbH
2018-06-22 21:59:30 +00:00
Kristof Provost
0b799353d8 pf: Fix deadlock with route-to
If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.

Pass the inp pointer along with pf_route()/pf_route6().

PR:		228782
MFC after:	1 week
2018-06-09 14:17:06 +00:00
Kristof Provost
455969d305 pf: Replace rwlock on PF_RULES_LOCK with rmlock
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by:	farrokhi@
Differential Revision:	https://reviews.freebsd.org/D15502
2018-05-30 07:11:33 +00:00
Sean Bruno
2695c9c109 Retire ixgb(4)
This driver was for an early and uncommon legacy PCI 10GbE for a single
ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family.

Submitted by:	kbowling
Reviewed by:	brooks imp jeffrey.e.pieper@intel.com
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15234
2018-05-02 15:59:15 +00:00
Kristof Provost
c41420d5dc pf: limit ioctl to a reasonable and tuneable number of elements
pf ioctls frequently take a variable number of elements as argument. This can
potentially allow users to request very large allocations.  These will fail,
but even a failing M_NOWAIT might tie up resources and result in concurrent
M_WAITOK allocations entering vm_wait and inducing reclamation of caches.

Limit these ioctls to what should be a reasonable value, but allow users to
tune it should they need to.

Differential Revision:	https://reviews.freebsd.org/D15018
2018-04-11 11:43:12 +00:00
Kristof Provost
effaab8861 netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
Kristof Provost
bf56a3fe47 pf: Cope with overly large net.pf.states_hashsize
If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.

PR:		209475
Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14367
2018-02-25 08:56:44 +00:00