uma_zfree() can be called on a NULL pointer. Simplify the pf code a
little by removing the redundant checks.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Functions manipulating source nodes can fail due to various reasons like
memory allocation errors, hitting configured limits or lack of
redirection targets. Ensure those errors are properly caught and
propagated in the code. Increase the error counters not only when
parsing the main ruleset but the NAT ruleset too.
Cherry-picked from development of D39880
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39940
Reduce number of hashing operations when handling source nodes by always
having a pointer to the hash row mutex in the source node. Provide
macros for handling and asserting the mutex. Calculate the hash only
once in pf_find_src_node() and then use this hash in subsequent
operations.
Cherry-picked from development of D39880
Reviewed by: kp, mjg
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39888
Similar to the PF_TAG_DUMMYNET we must also clear the route tag if
dummynet didn't keep the packet. In that case we'd continue immediately
and there'd be no need for the route tag. Keeping it could lead to
unexpected routing of traffic.
See also: 27407a6adc
See also: https://redmine.pfsense.org/issues/14055
Sponsored by: Rubicon Communications, LLC ("Netgate")
Make Ethernet rules more similar to the usual layer 3 rules by also
allowing ridentifier and labels to be set on them.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Packet Mark is an analogue to ipfw tags with O(1) lookup from mbuf while
regular tags require a single-linked list traversal.
Mark is a 32-bit number that can be looked up in a table
[with 'number' table-type], matched or compared with a number with optional
mask applied before comparison.
Having generic nature, Mark can be used in a variety of needs.
For example, it could be used as a security group: mark will hold a security
group id and represent a group of packet flows that shares same access
control policy.
Reviewed By: pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D39555
MFC after: 1 month
Both pf_rules_lock and pf_ioctl_lock only ever affect one vnet, so
there's no point in having these locks affect other vnets.
(In fact, the only lock in pf that can affect multiple vnets is
pf_end_lock.)
That's especially important for the rules lock, because taking the write
lock suspends all network traffic until it's released. This will reduce
the impact a vnet running pf can have on other vnets, and improve
concurrency on machines running multiple pf-enabled vnets.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39658
Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.
Obtained from: OpenBSD
MFC after: never
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38025
It only served to complicate cleanup, and added no value.
While here drop packets in pfsync_defer_tmo() if we don't have a syncif,
rather than just leaving them on the queue.
Reviewed by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39248
The pd_tmo callout has an associated mutex, which we must hold while
calling callout_stop().
Reported by: markj
Reviewed by: markj
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39223
We can't do anything if ip6_output() fails, other than discard the
packet which ip6_output() already does for us.
Mark the return value as ignored.
Reported by: emaste, Coverity
Sponsored by: Rubicon Communications, LLC (Netgate)
pfsync_undefer_state() takes the bucket lock, but could get called from
places (e.g. from pfsync_update_state() or pfsync_delete_state()) where
we already held the lock.
As it can also be called from places where we don't yet hold the lock
create new locked variant for use when the lock is already held. Keep
using pfsync_undefer_state() where the lock must still be taken.
PR: 268246
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
The callout for pfsync_defer_tmo() is created with
CALLOUT_RETURNUNLOCKED, because while the callout framework takes care
of taking the lock we want to run a few operations outside of the lock,
so we unlock ourselves.
However, if `sc->sc_sync_if == NULL` we return without releasing the
lock, and leak the lock, causing later deadlocks.
Ensure we always release the bucket lock when we exit pfsync_defer_tmo()
PR: 268246
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Link-local traffic needs to have a scope embedded before it's passed on
to ip6_output(). Do so in pf_refragment6(), because when we end up here
in the output path we may have passed through ip6_output() already
(before being reassembled), where the scope would have been removed.
Re-embed the scope so that link-local traffic is sent correctly.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39062
Re-introduce PFIL_FWD, because pf's pf_refragment6() needs to know if
we're ip6_forward()-ing or ip6_output()-ing.
ip6_forward() relies on m->m_pkthdr.rcvif, at least for link-local
traffic (for in6_get_unicast_scopeid()). rcvif is not set for locally
generated traffic (e.g. from icmp6_reflect()), so we need to call the
correct output function.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revisi: https://reviews.freebsd.org/D39061
When we send out a deferred packet we must make sure to call
ip6_output() for IPv6 packets. If not we might end up attempting to
ip_fragment() an IPv6 packet, which could lead to us reading outside of
the mbuf.
PR: 268246
Reviewed by: melifaro, zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D38586
The 0b70e3e78b changed the original design of a single entry point
into pfil(9) chains providing separate functions for the filtering
points that always provide mbufs and know the direction of a flow.
The motivation was to reduce branching. The logical continuation
would be to do the same for the filtering points that always provide
a memory pointer and retire the single entry point.
o Hooks now provide two functions: one for mbufs and optional for
memory pointers.
o pfil_hook_args() has a new member and pfil_add_hook() has a
requirement to zero out uninitialized data. Bump PFIL_VERSION.
o As it was before, a hook function for a memory pointer may realloc
into an mbuf. Such mbuf would be returned via a pointer that must
be provided in argument.
o The only hook that supports memory pointers is ipfw:default-link.
It is rewritten to provide two functions.
o All remaining uses of pfil_run_hooks() are converted to
pfil_mem_in().
o Transparent union of pfil_packet_t and tricks to fix pointer
alignment are retired. Internal pfil_realloc() reduces down to
m_devget() and thus is retired, too.
Reviewed by: mjg, ocochard
Differential revision: https://reviews.freebsd.org/D37977
Under the scenario with a packet with length of 67 bytes, a header length
using the default of 20 bytes and a TCP data offset (th_off) of 48 will
cause m_pullup() to fail to make sure bytes are arragned contiguously.
m_pullup() will free the mbuf chain and return a null. ipfilter stores
the resultant mbuf address (or the resulting NULL) in its fr_info_t
structure. Unfortuntely the eroneous packet is not flagged for drop.
This results in a kernel page fault at line 410 of sys/netinet/ip_fastfwd.c
as it tries to use a now previously freed, by m_pullup(), mbuf.
PR: 266442
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 1 week
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
The pfsync_defer_tmo() callout needs to set the correct vnet before it
can transmit packets. It used the rcvif in the mbuf to get this vnet,
but that doesn't work for locally originated traffic. In that case the
rcvif pointer is NULL, and the dereference leads to a panic.
Instead use the sc_sync_if, which is always set (if pfsync is enabled,
at least).
PR: 268246
MFC after: 2 weeks
The cost of enabling syncookies in adaptive mode is very low (basically
a single atomic add when we create a new half-open state), and the
payoff when under SYN flood is huge.
So, enable adaptive mode by default.
Suggested by: Eirik Øverby
Basic scenario: we have a closed connection (In TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.
Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.
So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.
Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919
The name doesn't get modified, and it gets passed through to a hash
function that accepts only a const pointer. Const it for correctness.
Sponsored by: Juniper Networks, Inc.
This use of "volatile" in the vnet definitions doesn't have any effect.
VNET_DEFINE_STATE(volatile int, ...) should work, but let's avoid using
"volatile" altogether and convert to atomic_load/atomic_store. Also
convert to bool while here.
Reviewed by: kp, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37684
The detach of the interface and group were leaving pfi_ifnet memory
behind. Check if the kif still has references, and clean it up if it
doesn't
On interface detach, the group deletion was notified first and then a
change notification was sent. This would recreate the group in the kif
layer. Reorder the change to before the delete.
PR: 257218
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37569
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.
The most import one is the inpcb itself. With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.
The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.
One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.
Large part of the change was done with sed script:
s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g
Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.
Differential revision: https://reviews.freebsd.org/D37127
scrub rules have defaulted to handling fragments for a long time, but
since we removed "fragment crop" and "fragment drop-ovl" in 64b3b4d611
this has become less obvious and more expensive ("reassemble" being the
more expensive option, even if it's the one the vast majority of users
should be using).
Extend the 'scrub' syntax to allow fragment reassembly to be disabled,
while retaining the other scrub behaviour (e.g. TTL changes, random-id,
..) using 'scrub fragment no reassemble'.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37459
The recent refactoring to prepare for pfsync over IPv6 introduced a
memory leak.
If we don't have a sync peer configured we return early (without sending
out a packet), but failed to free the newly allocated packet.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Serialize rcvif when enqueing packets for codel. We already tried to
restore the serialized rcvif in fq_codel_extract_head(), but that
doesn't work when we fail to serialize it first, so we ended up dropping
all packets passed through codel.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37318
Work is ongoing to add support for pfsync over IPv6. This required some
changes to allow for differentiating between the two families in a more
generic way.
This patch converts the relevant ioctls to using nvlists, making future
extensions (such as supporting IPv6 addresses) easier.
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D36277
The NAT module use of the tcphdr.th_x2 field now collides with the
use of this TCP header flag as AccECN (AE) bit. Use the topmost
bit instead to allow negotiation of AccECN across a NAT device.
Event: IETF 115 Hackathon
Reviewed By: #transport, tuexen
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37300
Allow pf (l2) to be used to redirect ethernet packets to a different
interface.
The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37193
When syncookies are in adaptive mode they may be active or inactive.
Expose this status to users.
Suggested by: Guido van Rooij
Sponsored by: Rubicon Communications, LLC ("Netgate")
Rather than using a per-cpu state counter, and adding in the CPU id we
can atomically increment the number.
This has the advantage of removing the assumption that the CPU ID fits
in 8 bits.
Event: Aberdeen Hackathon 2022
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36915
Use time_t rather than uint32_t to represent the timestamps. That means
we have 64 bits rather than 32 on all platforms except i386, avoiding
the Y2K38 issues on most platforms.
Reviewed by: Zhenlei Huang
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36837
GCC warns about the mismatched sizes on 32-bit platforms where
uintmax_t is larger in size than a pointer.
Reviewed by: imp, cy
Differential Revision: https://reviews.freebsd.org/D36753