pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
- in_pcbdetach(), which removes the link between an inpcb and its
socket.
- in_pcbfree(), which frees a detached pcb.
Unlike the previous in_pcbdetach(), neither of these functions will
attempt to conditionally free the socket, as they are responsible only
for managing in_pcb memory. Mirror these changes into in6_pcbdetach()
by breaking it into in6_pcbdetach() and in6_pcbfree().
While here, eliminate undesired checks for NULL inpcb pointers in
sockets, as we will now have as an invariant that sockets will always
have valid so_pcb pointers.
MFC after: 3 months
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.
soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.
Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.
In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.
netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.
MFC after: 3 months
than an int, as an error here is not meaningful. Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.
This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit. This will be corrected shortly in followup
commits to these components.
MFC after: 3 months
reason, seems to be where new flags are getting defined:
INP_DROPPED - The protocol has terminated this connection and the socket
is not reusable: when the socket code enters the protocol,
an error is immediately returned. This will substitute for
NULLing the so_pcb socket field, helping to implement the
invariant that all valid sockets have valid pcb's in TCP.
INP_SOCKREF - The protocol has become the owner of the socket reference,
and will need to free it when freeing the pcb, which will
be used when a TCP socket is closed but still has queued
data.
MFC after: 1 month
multicast addresses from carp interface. [1]
o Rewrite carpdetach(), so that it does the following things: [1]
- Stops callouts.
- Decrements carp_suppress_preempt, if needed.
- Downs interface and sets CARP state to INIT.
- Calls carp_multicast_cleanup().
- Detaches softc from carp_if and if we are the last frees
the carp_if.
o Use new carpdetach() in carp_clone_destroy().
o In carp_ifdetach() acquire the carp_if lock and cleanup all
interfaces hanging on carp_if. [1]
o Make carp_ifdetach() static and use EVENT(9) to call it
from if_detach(). [2]
o In carp_setrun() exit if the softc doesn't have a valid pointer
to parent. [1]
Obtained from: OpenBSD [1]
Submitted by: Dan Lukes <dan obluda.cz> [2]
PR: kern/82908 [2]
net.inet.ip.portrange.reservedlow apply to IPv6 aswell as IPv4.
We could have made new sysctls for IPv6, but that potentially makes
things complicated for mapped addresses. This seems like the least
confusing option and least likely to cause obscure problems in the
future.
This change makes the mac_portacl module useful with IPv6 apps.
Reviewed by: ume
MFC after: 1 month
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
right from the beginning and partly clean up the differences in handling
between SYN_SENT and SYN_RCVD (syncache).
Further changes to this code to come. This is a first incremental step
to a general overhaul and streamlining of the TCP code.
PR: kern/15095
PR: kern/92690 (partly)
Reviewed by: qingli (and tested with ANVL)
Sponsored by: TCP/IP Optimization Fundraise 2005
simultaneous open. Both the bug and the patch were verified using the
ANVL test suite.
PR: kern/74935
Submitted by: qingli (before I became committer)
Reviewed by: andre
MFC after: 5 days
threshold. Inflight doesn't make sense on a LAN as it has
trouble figuring out the maximal bandwidth because of the coarse
tick granularity.
The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold
in milliseconds below which inflight will disengage. It defaults
to 10ms.
Tested by: Joao Barros <joao.barros-at-gmail.com>,
Rich Murphey <rich-at-whiteoaklabs.com>
Sponsored by: TCP/IP Optimization Fundraise 2005
it so that ip_id etc. don't get overwritten. This fixes forwarding
of fragmented IP packets through a dummynet pipe -- fragments came
out with modified and different(!) ip_id's, making it impossible to
reassemble a datagram at the receiver side.
Submitted by: Alexander Karptsov (reworked by me)
MFC after: 3 days
in syncache_lookup() is not cleared and may lead to an arbitrary and
bogus rtentry pointer which later gets free'd.
Reviewed by: andre
MFC after: 3 days
we have another PCB which is bound to 0.0.0.0. If a PCB has the
INP_IPV6 flag, then we set its cost higher than IPv4 only PCBs.
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME
MFC after: 1 week
store some pipe pointers on stack. If user reconfigures dummynet
in the interlock gap, we can work with freed pipes after relock.
To fix this, we decided not to send packets in transmit_event(),
but fill a queue. At the end of dummynet() and dummynet_io(),
after the lock is dropped, if there is something in the queue
we run dummynet_send() to process the queue.
In collaboration with: ru
filtering mechanisms to use the new rwlock(9) locking API:
- Drop the variables stored in the phil_head structure which were specific to
conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
macros into pfil.h
- Move pfil list locking macros intp phil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
hooks to be ran by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
hooks to be ran, and returns if not. This check is already performed by the
IP stacks when they call:
if (!PFIL_HOOKED(ph))
goto skip_hooks;
- Drop in assertion which makes sure that the number of hooks never drops
below 0 for good measure. This in theory should never happen, and if it
does than there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
API instead of our home rolled version
- Convert the inlined functions to macros
Reviewed by: mlaier, andre, glebius
Thanks to: jhb for the new locking API
and signifincantly improve the readability of ip_input() and
ip_output() again.
The resulting IPSEC hooks in ip_input() and ip_output() may be
used later on for making IPSEC loadable.
This move is mostly mechanical and should preserve current IPSEC
behaviour as-is. Nothing shall prevent improvements in the way
IPSEC interacts with the IPv4 stack.
Discussed with: bz, gnn, rwatson; (earlier version)
will be sent if there is an address on the bridge. Exclude the bridge from the
special arp handling.
This has been tested with all combinations of addresses on the bridge and members.
Pointed out by: Michal Mertl
however IPv4-in-IPv4 tunnels are now stable on SMP. Details:
- Add per-softc mutex.
- Hold the mutex on output.
The main problem was the rtentry, placed in softc. It could be
freed by ip_output(). Meanwhile, another thread being in
in_gif_output() can read and write this rtentry.
Reported by: many
Tested by: Alexander Shiryaev <aixp mail.ru>
ip_forward() would report back a zero MTU in ICMP needfrag messages
because on a IPSEC SP lookup failure no MTU got computed.
Fix this by changing the logic to compute a new MTU in any case if
IPSEC didn't do it.
Change MTU computation logic to use egress interface MTU if available
or the next smaller MTU compared to the current packet size instead
of falling back to a very small fixed MTU.
Fix associated comment.
PR: kern/91412
MFC after: 3 days
ia_hash only if it actually is an AF_INET address. All other places
test for sa_family == AF_INET but this one.
PR: kern/92091
Submitted by: Seth Kingsley <sethk-at-meowfishies.com>
MFC after: 3 days
If net.link.ether.inet.useloopback=1 and we send broadcast packet using our
own source ip address it may be rejected by uRPF rules.
Same bug was fixed for IPv6 in rev. 1.115 by suz.
PR: kern/76971
Approved by: glebius (mentor)
MFC after: 3 days
Vararg functions have a different calling convention than regular
functions on amd64. Casting a varag function to a regular one to
match the function pointer declaration will hide the varargs from
the caller and we will end up with an incorrectly setup stack.
Entirely remove the varargs from these functions and change the
functions to match the declaration of the function pointers.
Remove the now unnecessary casts.
Lots of explanations and help from: peter
Reviewed by: peter
PR: amd64/89261
MFC after: 6 days
errors from rn_inithead back to the ipfw initialization function.
- Check return value of rn_inithead for failure, if table allocation has
failed for any reason, free up any tables we have created and return ENOMEM
- In ipfw_init check the return value of init_tables and free up any mutexes or
UMA zones which may have been created.
- Assert that the supplied table is not NULL before attempting to dereference.
This fixes panics which were a result of invalid memory accesses due to failed
table allocation. This is an issue mainly because the R_Zalloc function is a
malloc(M_NOWAIT) wrapper, thus making it possible for allocations to fail.
Found by: Coverity Prevent (tm)
Coverity ID: CID79
MFC after: 1 week
This fixes a bug in the previous commit.
Found by: Coverity Prevent(tm)
Coverity ID: CID253
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days