free a fragment, provide two inline functions that do that for us:
ipq_drop() and ipq_timeout().
o Rename ip_free_f() to ipq_free() to match the name scheme of IP reassembly.
o Remove assertion from ipq_free(), since it requires extra argument to be
passed, but locking scheme is simple enough and function is static.
Sponsored by: Nginx, Inc.
This significantly improves performance on multi-core servers where there
is any kind of IPv4 reassembly going on.
glebius@ would like to see the locking moved to be attached to the reassembly
bucket, which would make it per-bucket + per-VNET, instead of being global.
I decided to keep it global for now as it's the minimal useful change;
if people agree / wish to migrate it to be per-bucket / per-VNET then please
do feel free to do so. I won't complain.
Thanks to Norse Corp for giving me access to much larger servers
to test this at across the 4 core boxes I have at home.
Differential Revision: https://reviews.freebsd.org/D2095
Reviewed by: glebius (initial comments incorporated into this patch)
MFC after: 2 weeks
Sponsored by: Norse Corp, Inc (hardware)
header and not only partial flags and fields. Firewalls can attach
classification tags to the outgoing mbufs which should be copied to
all the new fragments. Else only the first fragment will be let
through by the firewall. This can easily be tested by sending a large
ping packet through a firewall. It was also discovered that VLAN
related flags and fields should be copied for packets traversing
through VLANs. This is all handled by "m_dup_pkthdr()".
Regarding the MAC policy check in ip_fragment(), the tag provided by
the originating mbuf is copied instead of using the default one
provided by m_gethdr().
Tested by: Karim Fodil-Lemelin <fodillemlinkarim at gmail.com>
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
PR: 7802
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
datagrams to any value, to improve performance. The behaviour is
controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.
Differential Revision: https://reviews.freebsd.org/D2177
Reviewed by: adrian, cy, rpaulo
Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Relnotes: yes
to obtain IPv4 next hop address in tablearg case.
Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop
address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we
still use this opcode, but when packet identified as IPv6 packet, we
obtain next hop address from dedicated field nh6 in struct table_value.
Replace hopstore field in struct ip_fw_args with anonymous union and add
hopstore6 field. Use this field to copy tablearg value for IPv6.
Replace spare1 field in struct table_value with zoneid. Use it to keep
scope zone id for link-local IPv6 addresses. Since spare1 was used
internally, replace spare0 array with two variables spare0 and spare1.
Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting
IPv6 addresses in table_value. Use zoneid field in struct table_value
to store sin6_scope_id value.
Since the kernel still uses embedded scope zone id to represent
link-local addresses, convert next_hop6 address into this form before
return from pfil processing. This also fixes in6_localip() check
for link-local addresses.
Differential Revision: https://reviews.freebsd.org/D2015
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
draft-ietf-6man-enhanced-dad-13.
This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet. This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).
The length of the nonce is set to 6 bytes. This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.
Reported by: hiren
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D1835
of packets. When the data payload length excluding any headers, of an
outgoing IPv4 packet exceeds PAGE_SIZE bytes, a special case in
ip_fragment() can kick in to optimise the outgoing payload(s). The
code which was added in r98849 as part of zero copy socket support
assumes that the beginning of any MTU sized payload is aligned to
where a MBUF's "m_data" pointer points. This is not always the case
and can sometimes cause large IPv4 packets, as part of ping replies,
to be split more than needed.
Instead of iterating the MBUFs to figure out how much data is in the
current chain, use the value already in the "m_pkthdr.len" field of
the first MBUF in the chain.
Reviewed by: ken @
Differential Revision: https://reviews.freebsd.org/D1893
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Previous __alignment(4) allowed compiler to assume that operations are
performed on aligned region. On ARM processor, this led to alignment fault
as shown below:
trapframe: 0xda9e5b10
FSR=00000001, FAR=a67b680e, spsr=60000113
r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000
r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068
r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10
r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
<...>
udp_input+0x264: ldmia r5, {r0-r3, r6}
udp_input+0x268: stmia r12, {r0-r3, r6}
This was due to instructions which do not support unaligned access,
whereas for __alignment(2) compiler replaced ldmia/stmia with some
logically equivalent memcpy operations.
In fact, the assumption that 'struct ip' is always 4-byte aligned
is definitely false, as we have no impact on data alignment of packet
stream received.
Another possible solution would be to explicitely perform memcpy()
on objects of 'struct ip' type, which, however, would suffer from
performance drop, and be merely a problem hiding.
Please, note that this has nothing to do with
ARM32_DISABLE_ALIGNMENT_FAULTS option, but is related strictly to
compiler behaviour.
Submitted by: Wojciech Macek <wma@semihalf.com>
Reviewed by: glebius, ian
Obtained from: Semihalf
represents a context.
- Preserve name 'struct igmp_ifinfo' for a new structure, that will be stable
API between userland and kernel.
- Make sysctl_igmp_ifinfo() return the new 'struct igmp_ifinfo', instead of
old one, which had a bunch of internal kernel structures in it.
- Move all above declarations from in_var.h to igmp_var.h, since they are
private to IGMP code.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.