There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.
Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there. This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.
Constify some related functions while here and add a regression test
based on a syzkaller reproducer.
Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34115
If there are no more entries, or if we fail to restore the rcvif of a
queued mbuf dn_dequeue() can return NULL.
Cope with this.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34078
This fixed panic with interface being removed while packet
was sitting on a queue. This allows to pass all dummynet
tests including forthcoming dummynet:ipfw_interface_removal
and dummynet:pf_interface_removal and demonstrates use of
m_rcvif_serialize() and m_rcvif_restore().
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33267
The new lock introduced in 5f5e32f1b3 needs to be initialised early so
that it can be safely destroyed if we error out.
Reported-by: syzbot+d76113e9a4ae0c0fcac2@syzkaller.appspotmail.com
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
The roundrobin pool stores its state in the rule, which could
potentially lead to invalid addresses being returned.
For example, thread A just executed PF_AINC(&rpool->counter) and
immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter)
(i.e. after the pf_match_addr() check of rpool->counter).
Lock the rpool with its own mutex to prevent these races. The
performance impact of this is expected to be low, as each rule has its
own lock, and the lock is also only relevant when state is being created
(so only for the initial packets of a connection, not for all traffic).
See also: https://redmine.pfsense.org/issues/12660
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33874
Postpone zeroing out pd until after the PFI_IFLAG_SKIP/M_SKIP_FIREWALL
checks. We don't need it until then, and it saves us a few CPU cycles in
some cases.
This isn't expected to make a measurable performance change though.
Reviewed by: mjg, glebius
Pointed out by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33815
It's never set, so we can remove both the check for it and the
definition.
Reviewed by: mjg, glebius
Pointed out by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33814
SDT probe frb_natv4in is only available when an error is encountered.
Make it also available when no error is encountered, i.e. NATed and
not translated.
MFC after: 1 week
Replace the INLINE macro with inline. Some ancient compilers supported
__inline__ instead of inline. The INLINE hack compensated for it.
Ancient compilers are history.
Reported by: glebius
MFC after: 1 month
Convert ipfilter kernel function declarations from K&R to ANSI. This
syncs our function declarations with NetBSD hg commit 75edcd7552a0
(apply our changes). Though not copied from NetBSD, this change was
partially inspired by NetBSD's work and inspired by style(9).
Reviewed by: glebius (for #network)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33595
Through fixes and improvements our ipfilter sources have diverged
enough to warrant move from contrib into sys/netpil. Now that I'm
planning on implementing MSS clamping as in iptables it makes more
sense to move ipfilter to netpfil.
This is the first of three commits the ipfilter move.
Suggested by glebius on two occaions.
Suggested by and discussed with: glebius
Reviewed by: glebius, kp (for #network)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33510
When we exposed the PFSYNCF_OK flag to userspace in 5f5bf88949 we
unintentionally caused defer mode to always be enabled.
The ioctl check only looked for nonzero, not for the PFSYNCF_DEFER flag.
Fix this check and ensure ifconfig sets the flag.
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33244
* Ensure we unlock the pfsync lock in pfsync_defer()
* We must hold the bucket lock when calling pfsync_push()
* The pfsync_defer_tmo() callout locks the bucket lock, not the pfsync
lock
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33243
Don't use a fixed number of ticks, but take hz into account so we have a
consistent timeout, regardless of what hz is set up.
Use a 20ms timeout, becaues that's what OpenBSD uses.
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33242
This flag is stored in if_drv_flags, not if_flags.
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33241
In pfsync_defer() we must wait to lock sc until we've ensured it's not
NULL.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33240
There were two issues with the new pflog packet length.
The first is that the length is expected to be a multiple of
sizeof(long), but we'd assumed it had to be a multiple of
sizeof(uint32_t).
The second is that there's some broken software out there (such as
Wireshark) that makes incorrect assumptions about the amount of padding.
That is, Wireshark assumes there's always three bytes of padding, rather
than however much is needed to get to a multiple of sizeof(long).
Fix this by adding extra padding, and a fake field to maintain
Wireshark's assumption.
Reported by: Ozkan KIRIK <ozkan.kirik@gmail.com>
Tested by: Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33236
The socket option handler tries to ensure that the option length is no
larger than some reasonable maximum, and no smaller than sizeof(struct
dn_id). But the loaded option length is stored in an int, which is
converted to an unsigned integer for the comparison with a size_t, so
negative values are not caught and instead get passed to malloc().
Change the code to use a size_t for the buffer size.
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33133
do_config() processes a buffer of variable-length dummynet commands.
The loop which processes this buffer loads the fixed-length header
before checking whether there are any bytes left to read, so it performs
a 4-byte read past the end of the buffer before terminating.
Restructure the loop to avoid this.
Reported by: Jenkins (KASAN job)
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33132
This test failed if ipfw was loaded (as well as pf). pf used the same
tag as dummynet to indicate a packet had already gone through dummynet.
However, ipfw removes this tag, so pf didn't realise the packet had
already gone through dummynet.
Introduce a separate flag, in the existing pf mtag rather than re-using
the ipfw tag. There were no free flag bits, but PF_TAG_FRAGCACHE is no
longer used so its bit can be re-purposed.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33087
In e5c4987e3f we fixed issues with nat and dummynet, but only changed
the IPv4 code. Make the same change for IPv6 as well.
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33086
DIOCKEEPCOUNTERS used to overlap with DIOCGIFSPEEDV0, which has been
fixed in 14, but remains in stable/12 and stable/13.
Support the old, overlapping, call under COMPAT_FREEBSD13.
Reviewed by: jhb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33001
Allow users to set a number on rules which will be exposed as part of
the pflog header.
The intent behind this is to allow users to correlate rules across
updates (remember that pf rules continue to exist and match existing
states, even if they're removed from the active ruleset) and pflog.
Obtained from: pfSense
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32750
Dummynet differs from ALTQ in that ALTQ schedules packets after they
leave pf. Dummynet schedules them after they leave pf, but then
re-injects them.
We currently deal with this by ensuring we don't re-schedule a packet we
get from dummynet, but this produces unexpected results when combined
with NAT, as dummynet processing is done after the NAT transformation.
In other words, the second time the packet is handed to pf it may have a
different source and destination address.
Simplify this by moving dummynet processing to after all other pf
processing, and not re-processing (but always passing) packets from
dummynet.
This fixes NAT of dummynet delayed packets, and also reduces processing
overhead (because we only do state/rule lookup for each dummynet packet
once, rather than twice).
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32665
The modification to the hash are already naturally locked by
in_control_sx. Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.
Most cases when the hash lookup is done the epoch is already entered.
Cover a few cases, that need entering the epoch, which mostly is
initial configuration of tunnel interfaces and multicast addresses.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D32584
When we route-to a packet that later turns out to not fit in the
outbound interface MTU we generate an ICMP error.
However, if we've already changed those (i.e. we've passed through a NAT
rule) we have to undo the transformation first.
Obtained from: pfSense
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32571