CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.
# ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
# ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
# ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in
Obtained from: Yandex LLC
Submitted by: Boris N. Lytochkin
MFC after: 1 month
Relnotes: yes
Sponsored by: Yandex LLC
r344504 added an extra ARP_LOG() call in case of an if_output() failure.
It turns out IPv4 can be noisy. In order to not spam the console by default:
(a) add a counter for these events so people can keep better track of how
often it happens, and
(b) add a sysctl to select the default ARP_LOG log level and set it to
INFO avoiding the one (the new) DEBUG level by default.
Claim a spare (1st one after 10 years since the stats were added) in order
to not break netstat from FreeBSD 12->13 updates in the future.
Reviewed by: karels
Differential Revision: https://reviews.freebsd.org/D19490
level socket option SCTP_GET_LOCAL_ADDRESSES in a getsockopt() call.
Thanks to Thomas Barabosch for reporting the issue which was found by
running syzkaller.
MFC after: 3 days
- Use strlcpy() with sizeof() instead of strncpy().
- Simplify initialization of TCP functions structures.
init_tcp_functions() was already called before the first call to
register a stack. Just inline the work in the SYSINIT and remove
the racy helper variable. Instead, KASSERT that the rw lock is
initialized when registering a stack.
- Protect the default stack via a direct pointer comparison.
The default stack uses the name "freebsd" instead of "default" so
this protection wasn't working for the default stack anyway.
Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19152
arprequest() is a void function and in case of error we simply
return without any feedback. In case of any local operation
or *if_output() failing no feedback is send up the stack for the
packet which triggered the arp request to be sent.
arpresolve_full() has three pre-canned possible errors returned
(if we have not yet sent enough arp requests or if we tried
often enough without success) otherwise "no error" is returned.
Make arprequest() an "internal" function arprequest_internal() which
does return a possible error to the caller. Preserve arprequest()
as a void wrapper function for external consumers.
In arpresolve_full() add an extra error checking. Use the
arprequest_internal() function and only return an error if non
of the three ones (mentioend above) are already set.
This will return possible errors all the way up the stack and
allows functions and programs to react on the send errors rather
than leaving them in the dark. Also they might get more detailed
feedback of why packets cannot be sent and they will receive it
quicker.
Reviewed by: karels, hselasky
Differential Revision: https://reviews.freebsd.org/D18904
for struct ip_mreq remains in place.
The struct ip_mreqn is Linux extension to classic BSD multicast API. It
has extra field allowing to specify the interface index explicitly. In
Linux it used as argument for IP_MULTICAST_IF and IP_ADD_MEMBERSHIP.
FreeBSD kernel also declares this structure and supports it as argument
to IP_MULTICAST_IF since r170613. So, we have structure declared but
not fully supported, this confused third party application configure
scripts.
Code handling IP_ADD_MEMBERSHIP was mixed together with code for
IP_ADD_SOURCE_MEMBERSHIP. Bringing legacy and new structure support
into the mess would made the "argument switcharoo" intolerable, so
code was separated into its own switch case clause.
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D19276
is acceptable in the congestion avoidance phase, but not during slow start.
The MTU is is also not taken into account.
Use a method instead, which is based on exponential growth working also in
slow start and being independent from the MTU.
This is joint work with rrs@.
Reviewed by: rrs@, Richard Scheffenegger
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18375
When TCP_REASS_LOGGING is defined, a NULL pointer dereference would happen,
if user data was received during the TCP handshake and BB logging is used.
A KASSERT is also added to detect tcp_reass() calls with illegal parameter
combinations.
Reported by: bz@
Reviewed by: rrs@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19254
1 second as allowed by RFC 6298.
Reviewed by: kbowling@, Richard Scheffenegger
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18941
But ipsec_delete_pcbpolicy() uses some VNET-virtualized variables,
and thus it needs VNET context, that is missing during gtaskqueue
executing. Use inp_vnet context to set curvnet in in_pcbfree_deferred().
PR: 235684
MFC after: 1 week
Gratuitous ARP packets are sent from a timer, which means we don't have a vnet
context set. As a result we panic trying to send the packet.
Set the vnet context based on the interface associated with the interface
address.
To reproduce:
sysctl net.link.ether.inet.garp_rexmit_count=2
ifconfig vtnet1 10.0.0.1/24 up
PR: 235699
Reviewed by: vangyzen@
MFC after: 1 week
option.
This issue was found by running syzkaller on OpenBSD.
Greg Steuck made me aware that the problem might also exist on FreeBSD.
Reported by: Greg Steuck
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D18834
sysctl variable net.inet.tcp.cc.cdg.smoothing_factor to 0, the smoothing
is disabled. Without this patch, a division by zero orrurs.
PR: 193762
Reviewed by: lstewart@, rrs@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19071
Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.
Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)
consistently.
This inconsistency was observed when working on the bug reported in
PR 235256, although it does not fix the reported issue. The fix for
the PR will be a separate commit.
PR: 235256
Reviewed by: rrs@, Richard Scheffenegger
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19033
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.
In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.
New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.
Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.
Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.
Differential Revision: https://reviews.freebsd.org/D18951
SIFTR does not allow any kind of filtering, but captures every packet
processed by the TCP stack.
Often, only a specific session or service is of interest, and doing the
filtering in post-processing of the log adds to the overhead of SIFTR.
This adds a new sysctl net.inet.siftr.port_filter. When set to zero, all
packets get captured as previously. If set to any other value, only
packets where either the source or the destination ports match, are
captured in the log file.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18897
RFC 3168 defines an ECN-setup SYN-ACK packet as on with the ECE flags
set and the CWR flags not set. The code was only checking if ECE flag
is set. This patch adds the check to verify that the CWR flags is not
set.
Submitted by: Richard Scheffenegger
Reviewed by: tuexen@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18996
This allows the part of the rewrite of TCP reassembly in this
files to be MFCed to stable/11 with manual change.
MFC after: 3 days
Sponsored by: Netflix, Inc.
When implementing support for IW10, an update in the computation
of the restart window used after an idle phase was missed. To
minimize code duplication, implement the logic in tcp_compute_initwnd()
and call it. This fixes a bug in NewReno, which was not aware of
IW10.
Submitted by: Richard Scheffenegger
Reviewed by: tuexen@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18940
After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE. The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer. By the point the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.
Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock. If it's not set, the lookup fails.
PR: 234296
Reviewed by: bz
Tested by: sbruno, Victor <chernov_victor@list.ru>,
Mike Andrews <mandrews@bit0.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18906
Correct a logic error.
Only disable when already enabled or enable when disabled.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Obtained from: Cheng Cui
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18885
When the TCP window scale option is not used, and the window
opens up enough in one soreceive, a window update will not be sent.
For example, if recwin == 65535, so->so_rcv.sb_hiwat >= 262144, and
so->so_rcv.sb_hiwat <= 524272, the window update will never be sent.
This is because recwin and adv are clamped to TCP_MAXWIN << tp->rcv_scale,
and so will never be >= so->so_rcv.sb_hiwat / 4
or <= so->so_rcv.sb_hiwat / 8.
This patch ensures a window update is sent if the window opens by
TCP_MAXWIN << tp->rcv_scale, which should only happen when the window
size goes from zero to the max expressible.
This issue looks like it was introduced in r306769 when recwin was clamped
to TCP_MAXWIN << tp->rcv_scale.
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D18821
r336616 copies inp->inp_options using the m_dup() function.
However, this function expects an mbuf packet header at the beginning,
which is not true in this case.
Therefore, use m_copym() instead of m_dup().
This issue was found by syzkaller.
Reviewed by: mmacy@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18753