"record-state" is similar to "keep-state", but it doesn't produce implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the
same feature as "record-state", it is single opcode without implicit
O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic
states. When rule with this opcode is matched, the rule's action will
not be executed, instead dynamic state will be created. And when this
state will be matched by "check-state", then rule action will be executed.
This allows create a more complicated rulesets.
Submitted by: lev
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D1776
On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.
In preparation to fix this create two macros depending on if the data is
global or static.
Reviewed by: bz, emaste, markj
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16140
Several third-parties use at least some of these ioctls. While it would be
better for regression testing if they were used in base (or at least in the
test suite), it's currently not worth the trouble to push through removal.
Submitted by: antoine, markj
Several ioctls are unused in pf, in the sense that no base utility
references them. Additionally, a cursory review of pf-based ports
indicates they're not used elsewhere either. Some of them have been
unused since the original import. As far as I can tell, they're also
unused in OpenBSD. Finally, removing this code removes the need for
future pf work to take them into account.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D16076
States learned via pfsync from a peer with the same ruleset checksum were not
getting assigned to rules like they should because pfsync_in_upd() wasn't
passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import.
PR: 229092
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Obtained from: OpenBSD
MFC after: 1 week
Sponsored by: InnoGames GmbH
Normally pf rules are expected to do one of two things: pass the traffic or
block it. Blocking can be silent - "drop", or loud - "return", "return-rst",
"return-icmp". Yet there is a 3rd category of traffic passing through pf:
Packets matching a "pass" rule but when applying the rule fails. This happens
when redirection table is empty or when src node or state creation fails. Such
rules always fail silently without notifying the sender.
Allow users to configure this behaviour too, so that pf returns an error packet
in these cases.
PR: 226850
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
MFC after: 1 week
Sponsored by: InnoGames GmbH
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.
Reviewed by: melifaro, olivier
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D15789
If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.
Pass the inp pointer along with pf_route()/pf_route6().
PR: 228782
MFC after: 1 week
Per-cpu zone allocations are very rarely done compared to regular zones.
The intent is to avoid pessimizing the latter case with per-cpu specific
code.
In particular contrary to the claim in r334824, M_ZERO is sometimes being
used for such zones. But the zeroing method is completely different and
braching on it in the fast path for regular zones is a waste of time.
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.
While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.
Submitted by: farrokhi@
Differential Revision: https://reviews.freebsd.org/D15502
This feature is disabled by default and was removed when dynamic states
implementation changed to be lockless. Now it is reimplemented with small
differences - when dyn_keep_states sysctl variable is enabled,
dyn_match_ipv[46]_state() function doesn't match child states of deleted
rule. And thus they are keept alive until expired. ipfw_dyn_lookup_state()
function does check that state was not orphaned, and if so, it returns
pointer to default_rule and its position in the rules map. The main visible
difference is that orphaned states still have the same rule number that
they have before parent rule deleted, because now a state has many fields
related to rule and changing them all atomically to point to default_rule
seems hard enough.
Reported by: <lantw44 at gmail.com>
MFC after: 2 days
dyn_lookup_ipv[46]_state_locked(). These checks are remnants of not
ready to be committed code, and they are there by accident.
Due to the race these checks can lead to creating of duplicate states
when concurrent threads in the same time will try to add state for two
packets of the same flow, but in reverse directions and matched by
different parent rules.
Reported by: lev
MFC after: 3 days
o Modify ipfw(8) to be able set any prefix6 not just Well-Known,
and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
to NAT64 code;
o add nat64_check_prefix6() function to check validness of specified
by user IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
to use any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
Well-Known IPv6 prefix. It is used to reduce overhead to check this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
structure. And respectivelly modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
as example.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
This driver was for an early and uncommon legacy PCI 10GbE for a single
ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family.
Submitted by: kbowling
Reviewed by: brooks imp jeffrey.e.pieper@intel.com
Relnotes: yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15234
given mbuf is considered as not matched.
If mbuf was consumed or freed during handling, we must return
IP_FW_DENY, since ipfw's pfil handler ipfw_check_packet() expects
IP_FW_DENY when mbuf pointer is NULL. This fixes KASSERT panics
when NAT64 is used with INVARIANTS. Also remove unused nomatch_final
field from struct nat64lsn_cfg.
Reported by: Justin Holcomb <justin at justinholcomb dot me>
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
pf ioctls frequently take a variable number of elements as argument. This can
potentially allow users to request very large allocations. These will fail,
but even a failing M_NOWAIT might tie up resources and result in concurrent
M_WAITOK allocations entering vm_wait and inducing reclamation of caches.
Limit these ioctls to what should be a reasonable value, but allow users to
tune it should they need to.
Differential Revision: https://reviews.freebsd.org/D15018
Ensure that multiplications for memory allocations cannot overflow, and
that we'll not try to allocate M_WAITOK for potentially overly large
allocations.
MFC after: 1 week
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
There's no obvious limit to the request size for these, so we limit the
requests to something which won't overflow. Change the memory allocation
to M_NOWAIT so excessive requests will fail rather than stall forever.
MFC after: 1 week
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.
MFC after: 1 week
The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of
tables at a time, and as such try to allocate <number of tables> *
sizeof(struct pfr_table). This multiplication can overflow. Thanks to
mallocarray() this is not exploitable, but an overflow does panic the
system.
Arbitrarily limit this to 65535 tables. pfctl only ever processes one
table at a time, so it presents no issues there.
MFC after: 1 week
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.
Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.
Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715
If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.
Reported by: Coverity
CID: 1382111
MFC after: 3 weeks
ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets
that it gets. Until proper IPv6 fragments handling function will be
implemented, pass IPv6 packets to next rule.
PR: 170604
MFC after: 1 week
If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.
Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.
PR: 209475
Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14367
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.
Obtained from: Yandex LLC
MFC after: 42 days
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685
When INVARIANTS is not set the 'last' variable is not used, which can generate
compiler warnings.
If this invariant is ever violated it'd result in a KASSERT failure in
refcount_release(), so this one is not strictly required.
specified in the arg1 into ICMPv6 destination unreachable code according
to RFC7915.
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
pf_unlink_state() releases a reference to the state without checking if
this is the last reference. It can't be, because pf_state_insert()
initialises it to two. KASSERT() that this is always the case.
CID: 1347140
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.
MFC after: 3 weeks
Now it is possible to use UDPLite's port numbers in rules,
create dynamic states for UDPLite packets and see "UDPLite" for matched
packets in log.
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
userspace to control NUMA policy administratively and programmatically.
Implement domainset based iterators in the page layer.
Remove the now legacy numa_* syscalls.
Cleanup some header polution created by having seq.h in proc.h.
Reviewed by: markj, kib
Discussed with: alc
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13403
pfioctl() handles several ioctl that takes variable length input, these
include:
- DIOCRADDTABLES
- DIOCRDELTABLES
- DIOCRGETTABLES
- DIOCRGETTSTATS
- DIOCRCLRTSTATS
- DIOCRSETTFLAGS
All of them take a pfioc_table struct as input from userland. One of
its elements (pfrio_size) is used in a buffer length calculation.
The calculation contains an integer overflow which if triggered can lead
to out of bound reads and writes later on.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
This is similar to the TCP case. where a TCP RST segment can be sent.
There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.
Thanks to Timo Voelker for helping me to test this patch.
Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part)
Differential Revision: https://reviews.freebsd.org/D13239
Hide the locking logic used in the dynamic states implementation from
generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule()
function to have similar names: ipfw_dyn_install_state() and
ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the
ipfw_dyn_lookup_state() function. Now this function return NULL when
there is no state and pointer to the parent rule when state is found.
Thus now there is no need to return pointer to dynamic rule, and no need
to hold bucket lock for this state. Remove ipfw_dyn_unlock() function.
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D11657
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects
a packet is IPv4. And in case when it is IPv6, it just translates them
as IPv4. This leads to corruption and in some cases to panics.
In particular a panic can happen when value of ip6_plen modified to
something that leads to IP fragmentation, but actual packet length does
not match the IP length.
Packets that are not IPv4 will be dropped by NAT rule.
Reported by: Viktor Dukhovni <freebsd at dukhovni dot org>
MFC after: 1 week
IPsec support can be loaded as kernel module, thus do not depend from
kernel option IPSEC and always build O_IPSEC opcode implementation as
enabled.
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
fq_pie schedulers packet classification functions in layer2 (bridge mode).
Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.
Submitted by: lstewart
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12506
To properly handle 'fwd tablearg,port' opcode, copy sin_port value from
sockaddr_in structure stored in the opcode into corresponding hopstore
field.
PR: 222953
MFC after: 1 week
Acquiring of IPFW_WLOCK is requried for cases when we are going to
change some data that can be accessed during processing of packets flow.
When we create new named object, there are not yet any rules, that
references it, thus holding IPFW_UH_WLOCK is enough to safely update
needed structures. When we destroy an object, we do this only when its
reference counter becomes zero. And it is safe to not acquire IPFW_WLOCK,
because noone references it. The another case is when we failed to finish
some action and thus we are doing rollback and destroying an object, in
this case it is still not referenced by rules and no need to acquire
IPFW_WLOCK.
This also fixes panic with INVARIANTS due to recursive IPFW_WLOCK acquiring.
MFC after: 1 week
Sponsored by: Yandex LLC
This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.
Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2
OpenBSD commit message:
Use a 32 bit variable to detect integer overflow when searching for
an unused nat port. Prevents a possible endless loop if high port
is 65535 or low port is 0.
report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c
PR: 221201
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: OpenBSD via ElectroBSD
MFC after: 1 week
Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless
net.inet6.ip6.fw.deny_unknown_exthdrs was unset).
PR: 220640
Submitted by: Kun Xie <kxie@xiplink.com>
MFC after: 1 week
pf_purge_thread() breaks up the work of iterating all states (in
pf_purge_expired_states()) and tracks progress in the idx variable.
If multiple vnets exist this results in pf_purge_thread() only calling
pf_purge_expired_states() for part of the states (the first part of the
first vnet, second part of the second vnet and so on).
Combined with the mark-and-sweep approach to cleaning up old rules (in
V_pf_unlinked_rules) that resulted in pf freeing rules that were still
referenced by states. This in turn caused panics when pf_state_expires()
encounters that state and attempts to access the rule.
We need to track the progress per vnet, not globally, so idx is moved
into a per-vnet V_pf_purge_idx.
PR: 219251
Sponsored by: Hackathon Essen 2017
(TS) method is used. When packet timestamp is used, the "current_qdelay"
keeps storing the last queue delay value calculated in the dequeue
function. Therefore, when a burst of packets arrives followed by
a pause, the "current_qdelay" will store a high value caused by the
burst and stick to that value during the pause because the queue
delay measurement is done inside the dequeue function. This causes
the drop probability calculation function to calculate high drop
probability value instead of zero and prevents the burst allowance
mechanism from working properly. Fix this problem by resetting
"current_qdelay" inside the drop probability calculation function
when the queue length is zero and TS option is used.
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 1 week
defined. On machines without arithmetic shift instructions, zero bits
may be shifted in from the left, giving a large positive result instead
of the desired divide-by power-of-2. Fix this by operating on the
absolute value and compensating for the possible negation later.
Reverse the order of the underflow/overflow tests and the exponential
decay calculation to avoid the possibility of an erroneous overflow
detection if p is a sufficiently small non-negative value. Also
check for negative values of prob before doing the exponential decay
to avoid another instance of of right shifting a negative value.
Tested by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 1 week
When running the vnet init code (pf_load_vnet()) we used to iterate over
all vnets, marking them as unhooked.
This is incorrect and leads to panics if pf is unloaded, as the unload
code does not unregister the pfil hooks (because the vnet is marked as
unhooked).
There's no need or reason to touch other vnets during initialisation.
Their pf_load_vnet() function will be triggered, which handles all
required initialisation.
Reviewed by: zec, gnn
Differential Revision: https://reviews.freebsd.org/D10592
vnet_pf_uninit() is called through vnet_deregister_sysuninit() and
linker_file_unload() when the pf module is unloaded. This is executed
after pf_unload() so we end up trying to take locks which have been
destroyed already.
Move pf_unload() to a separate SYSUNINIT() to ensure it's called after
all the vnet_pf_uninit() calls.
Differential Revision: https://reviews.freebsd.org/D10025
When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.
For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).
We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.
Reported by: Antonios Atlasis <aatlasis at secfu net>
MFC after: 3 days
The 'pktid' variable is modified while being used twice between
sequence points, probably due to htonl() is macro.
Reported by: PVS-Studio
MFC after: 1 week
to pass rule number and rule set to userland. In r272840 the kernel
internal rule representation was changed and the rulenum field of
struct ip_fw_rule got the type uint32_t, but userlevel representation
still have the type uint16_t. To not overflow the size of pointer
on the systems with 32-bit pointer size use separate variable to
copy rulenum and set.
Reported by: PVS-Studio
MFC after: 1 week
Some dummynet modules used strcpy() to copy from a larger buffer
(dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that
the lengths of the strings in the dn_aqm buffers were always hardcoded to be
smaller than the dn_extra_parms buffer ("CODEL", "PIE").
Use strlcpy() instead, to appease static checkers. No functional change.
Reported by: Coverity
CIDs: 1356163, 1356165
Sponsored by: Dell EMC Isilon
Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
No objection from: #network
Differential Revision: https://reviews.freebsd.org/D10154
The module is designed for modification of a packets of any protocols.
For now it implements only TCP MSS modification. It adds the external
action handler for "tcp-setmss" action.
A rule with tcp-setmss action does additional check for protocol and
TCP flags. If SYN flag is present, it parses TCP options and modifies
MSS option if its value is greater than configured value in the rule.
Then it adjustes TCP checksum if needed. After handling the search
continues with the next rule.
Obtained from: Yandex LLC
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Yandex LLC
No objection from: #network
Differential Revision: https://reviews.freebsd.org/D10150
This opcode can be used to attach some data to external action opcode.
And unlike to O_EXTERNAL_INSTANCE opcode, this opcode does not require
creating of named instance to pass configuration arguments to external
action handler. The data is coming just next to O_EXTERNAL_ACTION opcode.
The userlevel part currenly supports formatting for opcode with ipfw_insn
size, by default it expects u16 numeric value in the arg1.
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
external action is completed, but the rule search is continued.
External action handler can change the content of @args argument,
that is used for dynamic state lookup. Enforce the new lookup to be able
install new state, when the search is continued.
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.
Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep,
Pointed out by: eri, glebius, jhb
Differential Revision: https://reviews.freebsd.org/D10026
In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out
of a different interface. pf_test6() needs to know that the packet was
forwarded (in case it needs to refragment so it knows whether to call
ip6_output() or ip6_forward()).
This lead pf_test6() to try to evaluate rules against the PF_FWD
direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT.
Once fwdir is set correctly the correct output/forward function will be
called.
PR: 217883
Submitted by: Kajetan Staszkiewicz
MFC after: 1 week
Sponsored by: InnoGames GmbH
- PIE_MAX_PROB is compared to variable of int64_t and the type promotion
rules can cause the value of that variable to be treated as unsigned.
If the value is actually negative, then the result of the comparsion
is incorrect, causing the algorithm to perform poorly in some
situations. Changing the constant to be signed cause the comparision
to work correctly.
- PIE_SCALE is also compared to signed values. Fortunately they are
also compared to zero and negative values are discarded so this is
more of a cosmetic fix.
- PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small
enough that the automatic promotion to unsigned is harmless.
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 1 week
Rules are unlinked in shutdown_pf(), so we must call
pf_unload_vnet_purge(), which frees unlinked rules, after that, not
before.
Reviewed by: eri, bz
Differential Revision: https://reviews.freebsd.org/D10040
When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep()
with it, because it would release a lock we do not hold. There's no need for the
lock either, so we can just tsleep().
While here also make the same change in pf_purge_thread(), because it explicitly
takes the lock before rw_sleep() and then immediately releases it afterwards.
If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's
no more memory for it) it frees skp. This is wrong, because skp is a
pf_state_key **, so we need to free *skp, as is done later in the function.
Getting it wrong means we try to free a stack variable of the calling
pf_test_rule() function, and we panic.
o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of
ipfw_insn_u32;
o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and
remove old ipfw_lookup_table();
o use args->f_id.flow_id6 that is in host byte order to get DSCP value;
o add SCTP ports support to 'lookup src/dst-port' opcode;
o add IPv6 support to 'lookup src/dst-ip' opcode.
PR: 217292
Reviewed by: melifaro
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9873
When we doing reference counting of named objects in the new rule,
for existing objects check that opcode references to correct object,
otherwise return EINVAL.
PR: 217391
MFC after: 1 week
Sponsored by: Yandex LLC
in valuestate array.
When opcode has size equal to ipfw_insn_u32, this means that it should
additionally match value specified in d[0] with table entry value.
ipfw_table_lookup() returns table value index, use TARG_VAL() macro to
convert it to its value. The actual 32-bit value stored in the tag field
of table_value structure, where all unspecified u32 values are kept.
PR: 217262
Reviewed by: melifaro
MFC after: 1 week
Sponsored by: Yandex LLC