at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.
Move the free pointer in to the llentry itself and update the initalization sites.
MFC after: 2 weeks
hz >> 1000 and thus getting outside the timestamp clock frequenceny of
1ms < x < 1s per tick as mandated by RFC1323, leading to connection
resets on idle connections.
Always use a granularity of 1ms using getmicrouptime() making all but
relevant callouts independent of hz.
Use getmicrouptime(), not getmicrotime() as the latter may make a jump
possibly breaking TCP nfsroot mounts having our timestamps move forward
for more than 24.8 days in a second without having been idle for that
long.
PR: kern/61404
Reviewed by: jhb, mav, rrs
Discussed with: silby, lstewart
Sponsored by: Sandvine Incorporated (originally in 2011)
MFC after: 6 weeks
to a deadlock of an association when an IPv6 socket was used to
communcate with IPv4 and an ICMPv4 fragmentation needed message
was received.
While there, simplify the code a bit.
MFC after: 3 days.
TCP_KEEPCNT, that allow to control initial timeout, idle time, idle
re-send interval and idle send count on a per-socket basis.
Reviewed by: andre, bz, lstewart
in r179783 as (ab)using the concept of VRFs for this had not worked.
At this point SCTP in FreeBSD does not support multi-FIB, neither for
IPv4 nor for IPv6.
Discussed with: rrs
Sponsored by: Cisco Systems, Inc.
the original IPv4 implementation from r178888:
- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
are locked to the default FIB. Individual IPv6 addresses will only
appear in the default FIB, however redirect information and prefixes
of connected subnets are automatically propagated to all FIBs by
default (mimicking IPv4 behavior as closely as possible).
Sponsored by: Cisco Systems, Inc.
to cleanup routes from a single ifa.
o Implement carp_addroute()/carp_delroute() via above functions.
o Call carp_ifa_delroute() in the carp_detach() to avoid
junk routes left in routing table, in case if user
removes an address in a MASTER state. [1]
Reported by: az [1]
comments to longer, also refining strange ones.
Properly use #ifdef rather than #if defined() where possible. Four
#if defined(PCBGROUP) occurances (netinet and netinet6) were ignored to
avoid conflicts with eventually upcoming changes for RSS.
Reported by: bde (most)
Reviewed by: bde
MFC after: 3 days
o Make the pfsync.ko actually usable. Before this change loading it
didn't register protosw, so was a nop. However, a module /boot/kernel
did confused users.
o Rewrite the way we are joining multicast group:
- Move multicast initialization/destruction to separate functions.
- Don't allocate memory if we aren't going to join a multicast group.
- Use modern API for joining/leaving multicast group.
- Now the utterly wrong pfsync_ifdetach() isn't needed.
o Move module initialization from SYSINIT(9) to moduledata_t method.
o Refuse to unload module, unless asked forcibly.
o Improve a bit some FreeBSD porting code:
- Use separate malloc type.
- Simplify swi sheduling.
This change is probably wrong from VIMAGE viewpoint, however pfsync
wasn't VIMAGE-correct before this change, too.
Glanced at by: bz
in the ARP datagram generated by arprequest(). If caller doesn't
supply the address, then it is either picked from CARP or hardware
address of the interface is taken.
While here, make several minor fixes:
- Hold IF_ADDR_RLOCK(ifp) while traversing address list.
- Remove not true comment.
- Access internet address and mask via in_ifaddr fields,
rather than ifaddr.
If set to 1, no ABORT is sent back in response to an incoming
INIT. If set to 2, no ABORT is sent back in response to
an out of the blue packet. If set to 0 (the default), ABORTs
are sent.
Discussed with rrs@.
MFC after: 1 month.
than or equal to rcv_adv and fix tcp_twstart() to handle this case by
assuming the last window was zero rather than a negative value.
The code in tcp_input() already safely handled this case. It can happen
due to delayed ACKs along with a remote sender that sends data beyond
the window we previously advertised. If we have room in our socket buffer
for the extra data beyond the advertised window, we will accept it.
However, if the ACK for that segment is delayed, then we will not
effectively fixup rcv_adv to account for that extra data until the
next segment arrives and forces out an ACK. When that next segment
arrives, rcv_nxt will be beyond rcv_adv.
Tested by: pjd
MFC after: 1 week
missing interface address list locking and grab a reference on the
matching interface address after dropping the lock while it is used to
avoid a potential use after free.
Reviewed by: bz
MFC after: 1 week
reference on a group in the leaving state while iterating over the loop.
Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach()
of placing the groups to free on pending release list and then releasing
the references after dropping the IF_ADDR_LOCK. This closes an ugly race
where the code was dropping the lock in the middle of iterating over the
list. It also fixes some additional potential use-after-free bugs since
the cancellation routine also applied other changes to the group after
dropping the reference. Now those changes are performed before the
reference is dropped and the group is potentially freed.
Prodded to fix by: glebius
Reviewed by: bz
MFC after: 1 week
asychronous task. This avoids tearing down multicast state including
sending IGMP leave messages and reprogramming MAC filters while holding
the per-protocol global pcbinfo lock that is used in the receive path of
packet processing.
Reviewed by: rwatson
MFC after: 1 month
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.
However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:
- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.
To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
Fix a bug where the parameter length of a supported address types
parameter is set to a wrong value if the kernel is built with
with either INET or INET6, but not both.
MFC after: 3 days.
backup stack queue entry when the zone is exhausted, otherwise we leak a zone
allocation each time we plug a hole in the reassembly queue.
Reported by: many on freebsd-stable@ (thread: "TCP Reassembly Issues")
Tested by: many on freebsd-stable@ (thread: "TCP Reassembly Issues")
Reviewed by: bz (very brief sanity check)
MFC after: 3 days
structs ifreq/in_aliasreq and there've been several panics due
to that problem. All these panics were fixed just a couple of
lines above the panicing code.
Take a more general approach: sanity check sockaddrs supplied
with SIOCAIFADDR and SIOCSIF*ADDR at the beggining of the
function and drop all checks below.
One check is now disabled due to strange code in ifconfig(8)
that I've removed recently. I'm going to enable it with next
__FreeBSD_version bump.
Historically in_ifinit() was able to recover from an error
and restore old address. Nowadays this feature isn't working
for all error cases, but for some of them. I suppose no software
relies on this behavior, so I'd like to remove it, since this
simplifies code a lot.
Also, move if_scrub() earlier in the in_ifinit(). It is more
correct to wipe routes before removing address from local
address list, and interface address list.
Silence from: bz, brooks, andre, rwatson, 3 weeks
machine to LOG_NOTICE. Exception left to "using my IP address".
- Fix multicast ARP warning: add newline and also log the bad MAC address.
Tested by: Alexander Wittig <wittigal msu.edu>
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().
MFC after: 1 Month
- fix other errors introduced when committing r226436
- add 'function' to a sentence where it makes sense
Submitted by: delphij
Submitted by: dougb
Submitted by: jhb
Approved by: dougb
Approved by: jhb
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
inp_socket->so_options dereference when we may not acquire the lock on
the inpcb.
This fixes the crash due to NULL pointer dereference in
in_pcbbind_setup() when inp_socket->so_options in a pcb returned by
in_pcblookup_local() was checked.
Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com>
Suggested by: rwatson
Glanced by: rwatson
Tested by: dave jones <s.dave.jones@gmail.com>
compiled into the kernel.
Do not try to build the module in case of no INET support but
keep #error calls for now in case we would compile it into the
kernel.
This should fix an issue where the module would fail to enable
IPv6 support from the rc framework, but also other INET and INET6
parts being silently compiled out without giving a warning in the
module case.
While here garbage collect unneeded opt_*.h includes.
opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET
entry in options for conditional inclusion in kernel so keep the
file with the same name.
Reported by: pluknet
Reviewed by: plunket, jhb
MFC After: 3 days
They seem to be changed unintentionally in r226437, and there were no
any mentions of renaming in commit log message.
Reported by: Anton Yuzhaninov <citrin citrin ru>
interfaces. A host route has a NULL mask so check for that condition.
I have also been told by developers who customize the packet output
path with direct manipulation of the route entry (or the outgoing
interface to be specific). This patch checks for the route mask
explicitly to make sure custom code will not panic.
PR: kern/161805
MFC after: 3 days
According to POSIX, these two header files should be able to be included
by themselves, not depending on other headers. The <net/if.h> header
uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses
integer datatypes (u_int32_t, u_short, etc).
MFC after: 2 months
long been superseded by the RFC3390 initial CWND sizing.
Also remove the remnants of TCP_METRICS_CWND which used the
TCP hostcache to set the initial CWND in a non-RFC compliant
way.
MFC after: 1 week
This fixes a compiler warning at WARNS=6 when including the header files
as follows:
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/ip_var.h>
#include <netinet/udp.h>
#include <netinet/udp_var.h>
To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.
- Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr.
- Remove net.inet.ip.subnetsarelocal, I bet no one need it in 2011.
- fix bug when we were not forwarding to a host which matches classful
net address. For example router having 192.168.x.y/16 network attached,
would not forward traffic to 192.168.*.0, which are legal IPs in
CIDR world.
- For compatibility, leave autoguessing of mask based on class.
Reviewed by: andre, bz, rwatson
significant ones.
This has changed in the latest version of the socket API ID and
provides backwards compatibility and gets it in syn with the
usage of the IP_TOS socket option.
MFC after: 3 days.
route where the destination IP and the gateway IP is the same. This
special case handling is only meant for backward compatibility reason.
The last commit introduced a bug in the route check logic, where a
valid special case is treated as an error. This patch fixes that bug
along with some code cleanup.
Suggested by: gleb
Reviewed by: kmacy, discussed with gleb
MFC after: 1 day
struct route was changed in
http://svn.freebsd.org/changeset/base/225698
and since then SCTP support was broken.
This needs to be MFCed to stable/9 to unbreak SCTP support in 9.0
MFC after: 3 days.
address if that interface does not support ARP. Otherwise the
system will generate error messages unnecessarily due to the missing
entry.
PR: kern/159602
Submitted by: pluknet
MFC after: 3 days
address is being deleted. Only the last reference holder deletes the
loopback route. All other delete operations just clear the IFA_RTSELF
flag.
PR: kern/159601
Submitted by: pluknet
Reviewed by: discussed on net@
MFC after: 3 days
when reaching the zone limit of reassembly queue entries.
When the zone limit was reached not even the missing segment
that would complete the sequence space could be processed
preventing the TCP session forever from making any further
progress.
Solve this deadlock by using a temporary on-stack queue entry
for the missing segment followed by an immediate dequeue again
by delivering the contiguous sequence space to the socket.
Add logging under net.inet.tcp.log_debug for reassembly queue
issues.
Reviewed by: lsteward (previous version)
Tested by: Steven Hartland <killing-at-multiplay.co.uk>
MFC after: 3 days
raw IP sockets. It was deducted in ip_input() in preparation for
protocols interested only in the payload.
On raw sockets the IP header should be delivered as it at came in
from the network except for the byte order swaps in some fields.
This brings us in line with all other OS'es that provide raw
IP sockets.
Reported by: Matthew Cini Sarreo <mcins1-at-gmail.com>
MFC after: 3 days
inpcb object.
Skip the TCP_SIGNATURE check in that case as it is consistent with the
output path (no TCP_SIGNATURE for outcoming packets in TIMEWAIT state)
and also because for TIMEWAIT state the verify may be less effective.
Sponsored by: Sandvine Incorporated
Reported by: rwatson
No objections by: rwatson
MFC after: 3 days
same prefix. Since a single route entry is installed for the prefix
(without RADIX_MPATH), incoming packets on the interfaces that are not
associated with the prefix route may trigger an error message about
unable to allocation LLE entry, and fails L2. This patch makes sure a
valid route is present in the system, and allow the aforementioned
condition to exist and treats as valid.
Reviewed by: bz
MFC after: 5 days
build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the
assumption that the private L2 hook (which hopefully eventually will be a
pfil hook as well) can still be useful.
Allow building the module without inet as well.
Glanced at by: jhb
MFC after: 3 days
options defined in the kernel config. This more closely matches the
behavior of other modules which inherit configuration settings from the
kernel configuration during a kernel + modules build.
Reviewed by: luigi
Approved by: re (kib)
MFC after: 1 week
route with the same prefix is searched for as a replacement. The
current code did not bypass routes that have non-operational
interfaces. This patch fixes that bug and will find a replacement
route with an active interface.
PR: kern/159603
Submitted by: pluknet, ambrisko at ambrisko dot com
Reviewed by: discussed on net@
Approved by: re (bz)
MFC after: 3 days
and the maximum TCP send and receive buffer limits from 256kB
to 2MB.
For sb_max_adj we need to add the cast as already used in the sysctl
handler to not overflow the type doing the maths.
Note that this is just the defaults. They will allow more memory
to be consumed per socket/connection if needed but not change the
default "idle" memory consumption. All values are still tunable
by sysctls.
Suggested by: gnn
Discussed on: arch (Mar and Aug 2011)
MFC after: 3 weeks
Approved by: re (kib)
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.
Obtained from: David Dolson at Sandvine Incorporated
(original version for ipfw fwd IPv6 support)
Sponsored by: Sandvine Incorporated
PR: bin/117214
MFC after: 4 weeks
Approved by: re (kib)
more fragments flag off so that offset == 0 checks work properly.
PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
X-MFC with: r225032
Approved by: re (kib)
then terminate the loop as we will not find any further headers and
for short fragments this could otherwise lead to a pullup error
discarding the fragment.
PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)
packet is a/the first fragment or not. For IPv6 we have added the
"more fragments" flag as well to be able to determine on whether
there will be more as we do not have the fragment header avaialble
for logging, while for IPv4 this information can be derived directly
from the IPv4 header. This allowed fragmented packets to bypass
normal rules as proper masking was not done when checking offset.
Split variables to not need masking for IPv6 to avoid further errors.
PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)
translation technology involved (and that section is suggested to
be removed by Errata 2843), single packet fragments do not harm.
There is another errata under discussion to clarify and allow this.
Meanwhile add a sysctl to allow disabling this behaviour again.
We will treat single packet fragment (a fragment header added
when not needed) as if there was no fragment header.
PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version)
Tested by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)
While there:
* Fix a locking issue in setsockopt() of SCTP_CMT_ON_OFF.
* Fix a bug in setsockopt() of SCTP_DEFAULT_PRINFO, where the pr_value
was ignored.
Approved by: re@
MFC after: 2 months.
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.
Approved by: re@
MFC after: 2 months.
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)
Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.
Discussed with: rwatson (slightly earlier version)
same as the host address. This already works fine for INET6 and ND6.
While here, remove two function pointers from struct lltable which are
only initialized but never used.
MFC after: 3 days
I think the benefit of making the code cleaner and easier to understand
outweighs the humour of leaving this intact (or possibly changing it to
#ifdef not_yet_and_probably_never).
MFC after: 2 weeks
when len is inserted back into the synthetic IP packet and cause a
multiple of 2^16 bytes of TCP "packet loss".
This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in
testing on Amazon EC2.
Reviewed by: jfv
MFC after: 2 weeks
- While here, remove a paragraph about userspace operation that
has been outdated for some time. [2]
PR: 158623
Submitted by: Ben Kudak (kaduk % mit!edu) [1]
Reviewed by: glebius [2]
MFC after: 1 week
This makes pf find the wrong state and cause errors reported with state mismatches.
Clear the cached state link on the pf(4) tag to avoid the state mismatches.
Approved by: bz