Also add an M_ASSERTMAPPED() macro to verify that all mbufs in the chain
are mapped. Use it in ipfw_nat, which operates on a chain returned by
m_megapullup().
PR: 255164
Reviewed by: ae, gallatin
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29838
Use M_NOWAIT flag when hash growing is called from callout.
PR: 255041
Reviewed by: kevans
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D29772
Just like with the packet counters move the timekeeping information into
dn_cfg. This reduces the global name space use for dummynet and will
make subsequent work to add vnet support and re-use in pf easier.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Different Revision: https://reviews.freebsd.org/D29246
Move the packets counters into the dn_cfg struct. This reduces the
global name space use for dummynet and will make future work for things
like vnet support and re-use in pf easier.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29245
to be a true RFC 6598 NAT444 setup, where each network segment (e.g. user,
subnet) can have their own dedicated port aliasing ranges.
Reviewed by: donner, kp
Approved by: 0mp (mentor), donner, kp
Differential Revision: https://reviews.freebsd.org/D23450
It helps to reduce complexity with debugging of large ipfw rulesets.
Also define several constants and translators, that can by used by
dtrace scripts with this probe.
Reviewed by: gnn
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D26879
NAT64LSN requires the presence of upper level protocol header
in a IPv4 datagram to find corresponding state to make translation.
Now it will be handled automatically by nat64lsn instance.
Reviewed by: melifaro
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D26758
"df", "rf" and "offset". This allows to match on specific
bits of ip_off field.
For compatibility reasons lack of keyword means "offset".
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D26021
Upper level protocols defer checksums calculation in hope we have
checksums offloading in a network card. CSUM_DELAY_DATA flag is used
to determine that checksum calculation was deferred. And IP output
routine checks for this flag before pass mbuf to lower layer. Forwarded
packets have not this flag.
NAT64 uses checksums adjustment when it translates IP headers.
In most cases NAT64 is used for forwarded packets, but in case when it
handles locally originated packets we need to finish checksum calculation
that was deferred to correctly adjust it.
Add check for presence of CSUM_DELAY_DATA flag and finish checksum
calculation before adjustment.
Reported and tested by: Evgeniy Khramtsov <evgeniy at khramtsov org>
MFC after: 1 week
When dummynet initializes it prints a debug message with the current VNET
pointer unnecessarily revealing kernel memory layout. This appears to be left
over from when the first pieces of vimage support were added.
PR: 238658
Submitted by: huangfq.daxian@gmail.com
Reviewed by: markj, bz, gnn, kp, melifaro
Approved by: jtl (co-mentor), bz (co-mentor)
Event: July 2020 Bugathon
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D25619
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.
With no callers remaining, remove fib[46]_lookup_nh_ functions.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445
Nexthop objects implementation, defined in r359823,
introduced sys/net/route directory intended to hold all
routing-related code. Move recently-introduced route_temporal.c and
private route_var.h header there.
Differential Revision: https://reviews.freebsd.org/D24597
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Rework tcpopts_parse() to be more strict. Use const pointer. Add length
checks for specific TCP options. The main purpose of the change is
avoiding of possible out of mbuf's data access.
Reported by: Maxime Villard
Reviewed by: melifaro, emaste
MFC after: 1 week
datagrams.
Previously destination address from original datagram was used. That
looked confusing, especially in the traceroute6 output.
Also honor IPSTEALTH kernel option and do TTL/HLIM decrementing only
when stealth mode is disabled.
Reported by: Marco van Tol <marco at tols org>
Reviewed by: melifaro
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D22631
PULLUP_LEN_LOCKED().
PULLUP_LEN_LOCKED() could update mbuf and thus we need to update related
pointers that can be used in next opcodes.
Reported by: Maxime Villard <max at m00nbsd net>
MFC after: 1 week
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
ACTION_PTR() returns pointer to the start of rule action section,
but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ,
and only then real action opcode is stored.
ipfw_get_action() function inspects the rule action section, skips
all modifiers and returns action opcode.
Use this function in ipfw_reset_eaction() and flush_nat_ptrs().
MFC after: 1 week
Sponsored by: Yandex LLC
After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro
is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just
returns via `goto pullup_failed`. There are two places where PULLUP_LEN()
is called with IPFW_PF_RLOCK() held.
Add PULLUP_LEN_LOCKED() macro to use in these places to be able release
the lock, when m_pullup() fails.
Sponsored by: Yandex LLC
After r343631 pfil hooks are invoked in net_epoch_preempt section,
this allows to avoid extra locking. Add NET_EPOCH_ASSER() assertion
to each ipfw_bpf_*tap*() call to require to be called from inside
epoch section.
Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes
safe to free() ifnet. And use on-stack ifnet pointer in each
ipfw_bpf_*tap*() call to avoid NULL pointer dereference in case when
V_*log_if global variable will become NULL during ipfw_bpf_*tap*() call.
Sponsored by: Yandex LLC
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
With this opcode it is possible to match TCP packets with specified
MSS option, whose value corresponds to configured in opcode value.
It is allowed to specify single value, range of values, or array of
specific values or ranges. E.g.
# ipfw add deny log tcp from any to any tcpmss 0-500
Reviewed by: melifaro,bcr
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
- Perform ifp mismatch checks (to determine if a send tag is allocated
for a different ifp than the one the packet is being output on), in
ip_output() and ip6_output(). This avoids sending packets with send
tags to ifnet drivers that don't support send tags.
Since we are now checking for ifp mismatches before invoking
if_output, we can now try to allocate a new tag before invoking
if_output sending the original packet on the new tag if allocation
succeeds.
To avoid code duplication for the fragment and unfragmented cases,
add ip_output_send() and ip6_output_send() as wrappers around
if_output and nd6_output_ifp, respectively. All of the logic for
setting send tags and dealing with send tag-related errors is done
in these wrapper functions.
For pseudo interfaces that wrap other network interfaces (vlan and
lagg), wrapper send tags are now allocated so that ip*_output see
the wrapper ifp as the ifp in the send tag. The if_transmit
routines rewrite the send tags after performing an ifp mismatch
check. If an ifp mismatch is detected, the transmit routines fail
with EAGAIN.
- To provide clearer life cycle management of send tags, especially
in the presence of vlan and lagg wrapper tags, add a reference count
to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
Provide a helper function (m_snd_tag_init()) for use by drivers
supporting send tags. m_snd_tag_init() takes care of the if_ref
on the ifp meaning that code alloating send tags via if_snd_tag_alloc
no longer has to manage that manually. Similarly, m_snd_tag_rele
drops the refcount on the ifp after invoking if_snd_tag_free when
the last reference to a send tag is dropped.
This also closes use after free races if there are pending packets in
driver tx rings after the socket is closed (e.g. from tcpdrop).
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
Drivers now also check this flag instead of checking snd_tag against
NULL. This avoids false positive matches when a forwarded packet
has a non-NULL rcvif that was treated as a send tag.
- cxgbe was relying on snd_tag_free being called when the inp was
detached so that it could kick the firmware to flush any pending
work on the flow. This is because the driver doesn't require ACK
messages from the firmware for every request, but instead does a
kind of manual interrupt coalescing by only setting a flag to
request a completion on a subset of requests. If all of the
in-flight requests don't have the flag when the tag is detached from
the inp, the flow might never return the credits. The current
snd_tag_free command issues a flush command to force the credits to
return. However, the credit return is what also frees the mbufs,
and since those mbufs now hold references on the tag, this meant
that snd_tag_free would never be called.
To fix, explicitly drop the mbuf's reference on the snd tag when the
mbuf is queued in the firmware work queue. This means that once the
inp's reference on the tag goes away and all in-flight mbufs have
been queued to the firmware, tag's refcount will drop to zero and
snd_tag_free will kick in and send the flush request. Note that we
need to avoid doing this in the middle of ethofld_tx(), so the
driver grabs a temporary reference on the tag around that loop to
defer the free to the end of the function in case it sends the last
mbuf to the queue after the inp has dropped its reference on the
tag.
- mlx5 preallocates send tags and was using the ifp pointer even when
the send tag wasn't in use. Explicitly use the ifp from other data
structures instead.
- Sprinkle some assertions in various places to assert that received
packets don't have a send tag, and that other places that overwrite
rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
Reviewed by: gallatin, hselasky, rgrimes, ae
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20117
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros. Correct that by replacing handcrafted
code with proper macro usage.
Reviewed by: karels, kristof
Approved by: bde (mentor)
MFC after: 3 weeks
Sponsored by: John Gilmore
Differential Revision: https://reviews.freebsd.org/D19317