The lookup hurts a bit for connections but had been there anyway
if IPSEC was compiled in. So moving the lookup up a bit gives us
TSO support at not extra cost.
PR: kern/115586
Tested by: gallatin
Discussed with: kmacy
MFC after: 2 months
timestamps in the initial SYN packet actually use them in the rest of the
connection. Unfortunately, during the 7.0 testing cycle users have already
found network devices that violate this constraint.
RFC 1323 states 'and may send a TSopt in other segments' rather than
'and MUST send', so we must allow it.
Discovered by: Rob Zietlow
Tracked down by: Kip Macy
PR: bin/118005
If it is set to zero value (default) dummynet module will try to emulate
real link as close as possible (bandwidth & latency): packet will not leave
pipe faster than it should be on real link with given bandwidth.
(This is original behaviour of dummynet which was altered in previous commit)
If it is set to non-zero value only bandwidth is enforced: packet's latency
can be lower comparing to real link with given bandwidth.
- Document recently introduced dummynet(4) sysctl variables.
Requested by: luigi, julian
MFC after: 3 month
2) Alter packet flow inside dummynet: allow certain packets to bypass
dummynet scheduler. Benefits are:
- lower latency: if packet flow does not exceed pipe bandwidth, packets
will not be (up to tick) delayed (due to dummynet's scheduler granularity).
- lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip
stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaution) eliminated. This fix
long existed panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost
3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow
is not changed yet.
MFC after: 3 month
- Select a tag gains ability to optionally save new tags
off in the timewait system.
- When looking up associations do not give back a stcb that
is in the about-to-be-freed state, and instead continue
looking for other candiates.
- New function to query to see if value is in time-wait.
- Timewait had a time comparison error that caused very
few vtags to actually stay in time-wait.
- When setting tags in time-wait, we now use the time
requested NOT a fixed constant value.
- sstat now gets the proper associd when we do the query.
- When we process an association, we expect the tag chosen
(if we have one from a cookie) to be in time-wait. Before
we would NOT allow the assoc up by checking if its good.
In theory this should have caused almost all assoc not
to come up except for the time-comparison bug above (this
bug was hidden by the time comparison bug :-D).
- Don't save tags for nonce values in the time-wait cache
since these are used only during cookie collisions and do
not matter if they are unique or not.
MFC after: 1 week
Framework by moving from mac_mbuf_create_netlayer() to more specific
entry points for specific network services:
- mac_netinet_firewall_reply() to be used when replying to in-bound TCP
segments in pf and ipfw (etc).
- Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and
add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite
a label in place, but in others we apply the label to a new mbuf.
Obtained from: TrustedBSD Project
in the TrustedBSD MAC Framework:
- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
for AARP packet labeling, rather than using a generic link layer
entry point.
- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
for ND6 packet labeling, rather than using a generic link layer entry
point.
- Add expliict entry point mac_netinet_arp_send() for ARP packet
labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
rather than using a generic link layer entry point.
- Remove previous genering link layer entry point,
mac_mbuf_create_linklayer() as it is no longer used.
- Add implementations of new entry points to various policies, largely
by replicating the existing link layer entry point for them; remove
old link layer entry point implementation.
- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
to the MAC Framework rather than static to mac_net.c as it is now
needed outside of mac_net.c.
Obtained from: TrustedBSD Project
we move towards netinet as a pseudo-object for the MAC Framework.
Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:
mac_<object>_<method/action>
mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.
All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.
us to scale up to sb_max, aka kern.ipc.maxsockbuf.
We do this because there are broken firewalls that will corrupt the window
scale option, leading to the other endpoint believing that our advertised
window is unscaled. At scale factors larger than 5 the unscaled window will
drop below 1500 bytes, leading to serious problems when traversing these
broken firewalls.
With the default maxsockbuf of 256K, a scale factor of 3 will be chosen by
this algorithm. Those who choose a larger maxsockbuf should watch out
for the compatiblity problems mentioned above.
Reviewed by: andre
- fix a bug during cookie collision that prevented an
association from coming up in a specific restart case.
- Fix it so the shutdown-pending flag gets removed (this is
more for correctness then needed) when we enter shutdown-sent
or shutdown-ack-sent states.
- Fix a bug that caused the receiver to sometimes NOT send
a SACK when a duplicate TSN arrived. Without this fix
it was possible for the association to fall down if the
- Deleted primary destination is also stored when SCTP_MOBILITY_BASE.
(Previously, it is stored when only SCTP_MOBILITY_FASTHANDOFF)
- Fix a locking issue where we might call send_initiate_ack() and
incorrectly state the lock held/not held. Also fix it so that
when we release the lock the inp cannot be deleted on us.
- Add the debug option that can cause the stack to panic instead
of aborting an assoc. This does not and should never show up
in options but is useful for debugging unexpected aborts.
- Add cumack_log sent to track sending cumack information for
the debug case where we are running a special log per assoc.
- Added extra () aroudn sctp_sbspace macro to avoid compile warnings.
MFC after: 1 week
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received data after socket was closed, sending RST and removing tcpcb
So that it also includes how many bytes of data were received. It now looks
like this:
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received X bytes of data after socket was closed, sending RST and removing tcpcb
Approved by: re (gnn)
problems with the syncache, it produces a lot of console noise and has led
to quite a few false positive bug reports. It can be selectively
re-enabled when debugging specific problems by frobbing the same sysctl.
Discussed with: silby
Approved by: re (gnn)
retransmittion by handover event (fast mobility code)
- Fixed problem of mobility code which is caused by remaining
parameters in the deleted primary destination.
- Add a missing lock. When a peer sends an INIT, and while we
are processing it to send an INIT-ACK the socket is closed,
we did not hold a lock to keep the socket from going away.
Add protection for this case.
- Fix so that arwnd is alway uses the minimal rwnd if the user
has set the socket buffer smaller. Found this when the test
org decided to see what happens when you set in a rwnd of 10
bytes (which is not allowed per RFC .. 4k is minimum).
- Fixes so a cookie-echo ootb will NOT cause an abort to
be sent. This was happening in a MPI collision case.
- Examined all panics and unless there was no recovery, moved
any that were not already to INVARANTS.
Approved by: re@freebsd.org (gnn)
- Reintegrate the ANSI C function declaration change
from tcp_timer.c rev 1.92
- Reorganize the tcpcb structure so that it has a single
pointer to the "tcp_timer" structure which contains all
of the tcp timer callouts. This change means that when
the single tcp timer change is reintegrated, tcpcb will
not change in size, and therefore the ABI between
netstat and the kernel will not change.
Neither of these changes should have any functional
impact.
Reviewed by: bmah, rrs
Approved by: re (bmah)
route and once they are done with it, call rtfree(). rtfree() should
only be used when we are certain we hold the last reference to the
route. This bug results in console messages like the following:
rtfree: 0xc40f7000 has 1 refs
This patch switches the rtfree() to use RTFREE_LOCKED() instead,
which should handle the reference counting on the route better.
Approved by: re@ (gnn)
Reviewed by: bms
Reported by: many via net@ and current@
Tested by: many
last interface should own the address, but the current code
fumbles the handoff. This fixes that.
- move address related debugs to PCB4 and add additional ones to
help in debugging address problems.
Approved by: re@freebsd.org (K Smith)
also involves macro changes to have a RLOCK and a WLOCK
and placing the correct version within the code.
- The INP-INFO lock is changed to a rwlock.
- When sctp_shutdown() is called on Mac OS X, the socket lock is held.
So call sctp_chunk_output with SCTP_SO_LOCKED and
not SCTP_SO_NOT_LOCKED.
- Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X.
- u_int64_t -> uint64_t
- add missing addr unlock for error return path
Approved by: re@freebsd.org (K Smith)
- Fix panic from mutex unlock on freed lock when ASCONF-ACK
aborts an assoc
- Fix panic from addr lock recursion when ASCONFs are queued
in the front states
- ASCONFs "queued" in the front states should really be
bundled after the COOKIE-ACK, not in front of it
- Fix issue with addresses deleted in the front states from
being sent with ASCONF(DELETE)-- replaced
sctp_asconf_queue_add_sa() with delete specific function
- Comment change in sctp.h the drafts are now RFC's
Approved by: re@freebsd.org (B Mah)
incorrect and should be OFF letting IP fragment
large cookie-echos.
- Rename sysctl variable logging to log_level.
- Fix description of sysctl variable stats.
- Add sysctl variable log to make sctp_log readable via sysctl
mechanism (this is by compile switch and targets non KTR platforms or
when someone wants to do performance wise tracing).
- Removed debug code
Approved by: re@freebsd.org (B Mah)
stream (using EEOR mode). Changed to EINVAL (in sctp_output.c)
- Static analysis comments added
- fix in mobility code to return a value (static analysis found).
- sctp6_notify function made visible instead of
static (this is needed for Panda).
Approved by: re@freebsd.org (B Mah)
code comes from.
- Fix a LOR on Mac OS X: Do not hold an stcb lock when
calling soisconnected for a socket which has the
SS_INCOMP bit set on so_state.
- fix a comment to be non c++ style.
Approved by: re@freebsd.org (B Mah)
jumping to dropunlock to avoid a panic. While here move the calls to
ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain
the lock on inp.
Original patch to avoid panic: pjd
Review of locking adjustments: gnn, sam
Approved by: re (rwatson)
- Resort includes a bit.
- Correct typos and wording problems in comments.
- Rename udpcksum to udp_cksum to be consistent with other UDP-related
configuration variables.
- Remove indirection of udp_notify through local notify variable in
udp_ctlinput(), which is presumably due to copying and pasting from TCP,
where multiple notify routines exist.
Approved by: re (kensmith)
the recent send code, but uio may be NULL on sendfile
calls. Change to use sndlen variable.
- EMSGSIZE is not being returned in non-blocking mode
and needs a small tweak to look if the msg would
ever fit when returning EWOULDBLOCK.
- FWD-TSN has a bug in stream processing which could
cause a panic. This is a follow on to the codenomicon
fix.
- PDAPI level 1 and 2 do not work unless the reader
gets his returned buffer full. Fix so we can break
out when at level 1 or 2.
- Fix fast-handoff features to copy across properly on
accepted sockets
- Fix sctp_peeloff() system call when no true system call
exists to screen arguments for errors. In cases where a
real system call exists the system call itself does this.
- Fix raddr leak in recent add-ip code change for bundled
asconfs (even when non-bundled asconfs are received)
- Make sure ipi_addr lock is held when walking global addr
list. Need to change this lock type to a rwlock().
- Add don't wake flag on both input and output when the
socket is closing.
- When deleting an address verify the interface is correct
before allowing the delete to process. This protects panda
and unnumbered.
- Clean up old sysctl stuff and get rid of the old Open/Net
BSD structures.
- Add a function to watch the ranges in the sysctl sets.
- When appending in the reassembly queue, validate that
the assoc has not gone to about to be freed. If so
(in the middle) abort out. Note this especially effects
MAC I think due to the lock/unlock they do (or with
LOCK testing in place).
- Netstat patch to get rid of warnings.
- Make sure that no data gets queued to inactive/unconfirmed
destinations. This especially effect CMT but also makes a
impact on regular SCTP as well.
- During init collision when we detect seq number out
of sync we need to treat it like Case C and discard
the cookie (no invarient needed here).
- Atomic access to the random store.
- When we declare a vtag good, we need to shove it
into the time wait hash to prevent further use. When
the tag is put into the assoc hash, we need to remove it
from the twait hash (where it will surely be). This prevents
duplicate tag assignments.
- Move decr-ref count to better protect sysctl out of
data.
- ltrace error corrections in sctp6_usrreq.c
- Add hook for interface up/down to be sent to us.
- Make sysctl() exported structures independent of processor
architecture.
- Fix route and src addr cache clearing for delete address case.
- Make sure address marked SCTP_DEL_IP_ADDRESS is never selected
as src addr.
- in icmp handling fixed so we actually look at the icmp codes
to figure out what to do.
- Modified mobility code.
Reception of DELETE IP ADDRESS for a primary destination and
SET PRIMARY for a new primary destination is used for
retransmission trigger to the new primary destination.
Also, in this case, destination of chunks in send_queue are
changed to the new primary destination.
- Fix so that we disallow sending by mbuf to ever have EEOR
mode set upon it.
Approved by: re@freebsd.org (B Mah)
additional flags to many function calls. The flags only
get used in BSD when we compile with lock testing. These
flags allow apple to escape the "giant" lock it holds on
the socket and have more fine-grained locking in the NKE.
It also allows us to test (with witness) the locking used
by apple via a compile switch (manually applied).
Approved by: re@freebsd.org(B Mah)
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change. This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).
In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI. The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.
This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.
Approved by: re (kensmith)
Submitted by: silby
Tested by: rwatson, kris
Problems reported by: peter, kris, others
import. The PF mbuf-tagging support routines changed to link the
allocated tags into the provided mbuf themselves, so the left-over
m_tag_prepend() was trying to add a bogus (usually NULL) tag.
Reviewed by: mlaier
Approved by: re
the last message on the send stream was "null" but still
there, a state we allow, we could get hung and not clean
it up and wait for the shutdown guard timer to clear the
association without a graceful close. Fix this so that
that we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
(so far) accept input of these and cannot yet generate
a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
ABORT in this case.
- According to the Kyoto summit of socket api developers
(Solaris, Linux, BSD). We need to have:
o non-eeor mode messages be atomic - Fixed
o Allow implicit setup of an assoc in 1-2-1 model if
using the sctp_**() send calls - Fixed
o Get rid of HAVE_XXX declarations - Done
o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks just always checksums same for
shutdown-complete.
- inpcb_free front state had a bug where in queue
data could wedge an assoc. We need to just abandon
ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
assemble a response packet which may be larger than
64k. This then would be dropped by IP. Instead make
a "minimum" size for us 64k-2k (we want at least
2k for our initack). If we receive such an init
discard it early without all the processing.
- When we peel off we must increment the tcb ref count
to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
when given faulty data, fixed so can't happen and we
also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
(in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
to null if the iterator is calling the output routine. This
means we would possibly crash when we gather the MTU info.
Fix so we only do the gather where we have a src address
cached.
- we need to be sure to set abort flag on conn state when
we receive an abort.
- peeloff could leak a socket. Moved code so the close will
find the socket if the peeloff fails (uipc_syscalls.c)
Approved by: re@freebsd.org(Ken Smith)
pack a set number correctly.
Submitted by: oleg
o Plug a memory leak.
Submitted by: oleg and Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 1 week
when peer acks the add in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
So that it does not send the "null" data mbuf out and cause
it to get freed twice.
- Fix so auto-asconf sysctl actually effect the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed) submitted
found and reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked >= not just > to conform to socket api
draft this fixes sctp_indata.c in the two places need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
that is not in the association with a valid assoc_id. This meant
you got or set the stcb level values instead of the destination
you thought you were going to get/set. Now validate if the
stcb is non-null and the net is NULL that the sa_family is
set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
at the exact time it was running. rework the worker thread to
use the increment/decrement to prevent this and no longer use
the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
detected that ifa is already pointing to a ifn.
- Fix it so that if someone is so insane that they drop the
send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
had a locking problem where locks were not in place where they
should have been.
- Free association calls were not testing the return value in
sctp_inpcb_free() properly... others should be cast void returns
where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
we should not do the actual free.. but instead let the timer
free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
flag is set, we must NOT process the packet but handle it like
ootb. This is because while freeing an assoc we release the
locks to get all the higher order locks so we can purge all
the hash tables. This leaves a hole if a packet comes in
just at that point. Now sctp_common_input_processing() will
call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
it so we don't have a conflict (I think this is a covertity change).
We made this change AFTER some conversation and looking to make sure
that M_PROTO5 does not have a problem between SCTP and the 802.11
stuff (which is the only other place its used).
- Fixed lock order reversal and missing atomic protection around
locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to a ABORT() with bad checksum with
a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT
send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for macintosh or when
you need better performing during debugging). Note no commands
are here to get the log info, you must just use kgdb.
- The timer code needs to be aware of if it needs to call
sctp_sack_check() to slide the maps and adjust the cum-ack.
This is because it may be out of sync cum-ack wise.
- Added threshold managment logging.
- If the user picked just the right size, that just filled the send
window minus one mtu, we would enter a forever loop not copying and
at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
to 1.
- My rwnd control was not being used to control the rwnd properly (we
did not add and subtract to it :-() this is now fixed so we handle
small messages (1 byte etc) better to bring our rwnd down more
slowly.
Approved by: re@freebsd.org (Bruce Mah)
- Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock.
- Fix panic that may occur when freeing an assoc that has partial
delivery in progress (may dereference null socket pointer when
queuing partial delivery aborted notification)
- Some spacing and comment fixes.
- Fix address add handling to clear cached routes and source addresses
when peer acks the add in case the routing table changes.
Approved by: re@freebsd.org (Bruce Mah)
projected_offset against isn_offset to account for
wrap around.
Reviewed by: gnn, kmacy, silby
Submitted by: yusheng.huang@bluecoat.com
Approved by: re
MFC: 3 days
the use of divert sockets to dead locks. A number of LORs have been reported
between divert and a number of other network subsystems including: IPSEC, Pfil,
multicast, ipfw and others. Other dead locks could occur because of recursive
entry into the IP stack. This change should take care of most if not all of
these issues.
A summary of the changes follow:
- We disallow multicast operations on divert sockets. It really doesn't make
semantic sense to allow this, since typically you would set multicast
parameters on multicast end points.
NOTE: As a part of this change, we actually dis-allow multicast options on
any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family
- We check to see if there are any socket options that have been specified on
the socket, and if there was (which is very un-common and also probably
doesnt make sense to support) we duplicate the mbuf carrying the options.
- We then drop the INP/INFO locks over the call to ip_output(). It should be
noted that since we no longer support multicast operations on divert sockets
and we have duplicated any socket options, we no longer need the reference
to the pcb to be coherent.
- Finally, we replaced the call to ip_input() to use netisr queuing. This
should remove the recursive entry into the IP stack from divert.
By dropping the locks over the call to ip_output() we eliminate all the lock
ordering issues above. By switching over to netisr on the inbound path,
we can no longer recursively enter the ip_input() code via divert.
I have tested this change by using the following command:
ipfwpcap -r 8000 - | tcpdump -r - -nn -v
This should exercise the input and re-injection (outbound) path, which is
very similar to the work load performed by natd(8). Additionally, I have
run some ospf daemons which have a heavy reliance on raw sockets and
multicast.
Approved by: re@ (kensmith)
MFC after: 1 month
LOR: 163
LOR: 181
LOR: 202
LOR: 203
Discussed with: julian, andre et al (on freebsd-net)
In collaboration with: bms [1], rwatson [2]
[1] bms helped out with the multicast decisions
[2] rwatson submitted the original netisr patches and came up with some
of the original ideas on how to combat this issue.
for bakeoff.. using the next sequential ones)
- In cookie processing 1-2-1, we did not increment the stcb
refcnt before releasing the tcb lock. We need to do this
to keep the tcb from being freed by a abort or ?? unlikely
but worth doing. Also get rid of unneed INP_WLOCK.
- extra receive info included the rcvinfo which killed the
padding/alignment. We now redefine all the fields properly
so they both align properly both to 128 bytes.
- A peeled off socket would not close without an error due to
its misguided idea that sctp_disconnect() was not supported
on it. This fixes it so it goes through the proper path.
- When an assoc was being deleted after abort (via a timer) a
small race condition exists where we might take a packet for
the old assoc (since we are waiting for a cleanup timer). This
state especially happens in mac. We now add a state in the asoc
so these can properly handle the packet as OOTB.
Approved by: re@freebsd.org(Ken Smith)
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.
Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)