ICMP unreach, frag needed. Up to now we only looked at the
interface MTU. Make sure to only use the minimum of the two.
In case IPSEC is compiled in, pass the MTU through ip_ipsec_mtu()
to avoid any further conditional maths.
Without this, PMTU was broken in those cases when there was a
route with a lower MTU than the MTU of the outgoing interface.
PR: kern/122338
Tested by: Mark Cammidge mark peralex.com
Reviewed by: silence on net@
MFC after: 2 weeks
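The rule in the PMTU commit above, "use the minimum of the route MTU and the interface MTU", can be sketched in C as follows. This is an illustration only. The function name `effective_mtu()` is hypothetical, and it assumes that a route MTU of 0 means "no route-specific MTU set". The IPsec adjustment done by ip_ipsec_mtu() is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the MTU to report in an ICMP
 * "unreachable, fragmentation needed" message.  Assumes route_mtu == 0
 * means the route carries no MTU of its own.
 */
static uint32_t
effective_mtu(uint32_t route_mtu, uint32_t ifp_mtu)
{
	assert(ifp_mtu > 0);
	if (route_mtu == 0)		/* no route MTU: use the interface MTU */
		return (ifp_mtu);
	/* Otherwise the smaller of the two limits the path. */
	return (route_mtu < ifp_mtu ? route_mtu : ifp_mtu);
}
```

With a 1500-byte interface and a route MTU of 1400, this reports 1400, which is the case the commit says was previously broken.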
was changed in rev. 1.161 of tcp_var.h. All options now test for sufficient
space in the TCP header before being added.
Reported by: Mark Atkinson <atkin901-at-yahoo.com>
Tested by: Mark Atkinson <atkin901-at-yahoo.com>
MFC after: 1 week
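The space test described above amounts to "append an option only if it still fits". A minimal sketch follows, assuming the standard 40-byte TCP option area (a 60-byte maximum header minus the 20-byte base header). `add_option()` and `OPT_SPACE_MAX` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPT_SPACE_MAX 40	/* 60-byte max TCP header - 20-byte base header */

/*
 * Hypothetical helper: append one option if it fits.  Returns the new
 * used length, or the unchanged length when there is not enough room.
 */
static size_t
add_option(unsigned char *buf, size_t used, const unsigned char *opt,
    size_t optlen)
{
	assert(used <= OPT_SPACE_MAX);
	if (optlen > OPT_SPACE_MAX - used)	/* would overflow the header */
		return (used);
	memcpy(buf + used, opt, optlen);
	return (used + optlen);
}
```

Writing the comparison as `optlen > OPT_SPACE_MAX - used` avoids computing `used + optlen`, so the check cannot overflow.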
Removed dead code that assumed that M_TRYWAIT can return NULL; this has not
been true since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
In that case return and continue processing the packet without IPsec.
PR: 121384
MFC after: 5 days
Reported by: Cyrus Rahman (crahman gmail.com)
Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]
the NOPs used are 0x01.
While we could simply pad with EOLs (which are 0x00), rather use an
explicit 0x00 constant there so as not to confuse people with 'EOL padding'.
Put in a comment saying just that.
Problem discussed on: src-committers with andre, silby, dwhite as
follow up to the rev. 1.161 commit of tcp_var.h.
MFC after: 11 days
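Padding the option area to a 32-bit boundary with explicit 0x00 bytes, as described above, might look like the sketch below. `pad_options()` is a hypothetical name. The byte written is numerically the same as EOL, but it is spelled as a literal 0x00 in line with the commit's reasoning, and it is not the NOP value 0x01.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: pad the TCP option area to a multiple of 4 bytes
 * using explicit 0x00 bytes (not NOPs, which are 0x01).  cap is the size
 * of the option buffer.
 */
static size_t
pad_options(unsigned char *opt, size_t len, size_t cap)
{
	while (len % 4 != 0) {
		assert(len < cap);	/* never write past the buffer */
		opt[len++] = 0x00;
	}
	return (len);
}
```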
restrict the use of direct pointers to the content of the
IP packet. These modifications are functionally no-ops and thus
can be merged with no side effects.
IPPORT_EPHEMERALFIRST and IPPORT_EPHEMERALLAST with values
10000 and 65535 respectively.
The rationale is that this makes the attacker's life more
difficult if he/she wants to guess the ephemeral port range, and
also lowers the probability of a port collision (described in
draft-ietf-tsvwg-port-randomization-01.txt).
While there, remove code duplication in in_pcbbind_setup().
Submitted by: Fernando Gont <fernando at gont.com.ar>
Approved by: njl (mentor)
Reviewed by: silby, bms
Discussed on: freebsd-net
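Mapping a random value onto the new default range 10000..65535 is simple arithmetic, sketched below. This is not the in_pcbbind_setup() code. `pick_ephemeral()` is a hypothetical name, the caller is assumed to supply the 32-bit random value, and the collision check against ports already in use is left out.

```c
#include <assert.h>
#include <stdint.h>

#define EPHEMERAL_FIRST 10000	/* new default from the commit message */
#define EPHEMERAL_LAST  65535

/*
 * Hypothetical helper: map a caller-supplied 32-bit random value onto
 * [first, last].  The modulo adds a small bias, ignored in this sketch.
 */
static uint16_t
pick_ephemeral(uint32_t rnd, uint16_t first, uint16_t last)
{
	uint32_t span;

	assert(first <= last);
	span = (uint32_t)last - first + 1;	/* 55536 ports for the defaults */
	return ((uint16_t)(first + rnd % span));
}
```

Widening the range from the old default raises the number of candidate ports. That is what makes both guessing and accidental collisions less likely.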
- Move the assignment of the socket down to just before we first need it.
No need to do it at the beginning and then drop out of the function
via one of the returns before using it 100 lines further down.
- Use t_maxopd, which was already assigned the "tcp_mssdflt" for the correct
AF, instead of another #ifdef ? : #endif block doing the same.
- Remove an unneeded (duplicate) assignment of mss to t_maxseg just before
we possibly change mss and re-do the assignment without using t_maxseg
in between.
Reviewed by: silby
No objections: net@ (silence)
MFC after: 5 days
the limit in bytes) hard-coded into both the kernel and userland.
Make both of these limits sysctls, so it is easy to change them.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.
(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a delay-bandwidth
product (DBP) of 12ms.)
Note these sysctls in the man page and warn against increasing them
without thinking first.
MFC after: 3 weeks
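The 12ms figure above follows from 100 packets x 1500 bytes x 8 bits / 100Mbit/s = 0.012s. The sketch below reproduces that arithmetic. It assumes full-size 1500-byte packets (the commit does not state a packet size), and `queue_dbp_ms()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical helper: milliseconds of delay-bandwidth product that a
 * queue of `packets` packets of `pkt_bytes` bytes covers on a link of
 * `link_bps` bits per second.
 */
static double
queue_dbp_ms(unsigned packets, unsigned pkt_bytes, double link_bps)
{
	assert(link_bps > 0);
	return (packets * pkt_bytes * 8.0 / link_bps * 1000.0);
}
```

`queue_dbp_ms(100, 1500, 100e6)` evaluates to 12 (up to floating-point rounding). That is why the fixed 100-packet limit is small for a 100Mbps test.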
the same order that FreeBSD 6 and before did. Doug
White and the other bloodhounds at ISC discovered that
while FreeBSD 7's ordering of options was more efficient,
it caused some cable modem routers to ignore the
SYN-ACKs ordered in this fashion.
The placement of sackOK after the timestamp option seems
to be the critical difference:
FreeBSD 6:
<mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>
FreeBSD 7.0:
<mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>
FreeBSD 7.0 + this change:
<mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol>
MFC after: 1 week
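The restored FreeBSD 6 layout shown above can be built byte by byte as in the sketch below. It uses the standard option kinds (EOL 0, NOP 1, MSS 2, window scale 3, SACK permitted 4, timestamp 8). `build_syn_options()` and `put32()` are hypothetical names, not the tcp_addoptions() code. The sketch emits exactly the sequence printed for "FreeBSD 7.0 + this change": MSS, NOP, wscale, NOP, NOP, timestamp, then sackOK, padded to 24 bytes with zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPT_EOL = 0, OPT_NOP = 1, OPT_MSS = 2, OPT_WS = 3,
       OPT_SACKOK = 4, OPT_TS = 8 };

/* Store v in network (big-endian) byte order. */
static void
put32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Hypothetical helper: lay out SYN-ACK options in the FreeBSD 6 order. */
static size_t
build_syn_options(unsigned char *o, size_t cap, uint16_t mss,
    uint8_t wscale, uint32_t tsval)
{
	size_t n = 0;

	assert(cap >= 24);
	o[n++] = OPT_MSS; o[n++] = 4;
	o[n++] = (unsigned char)(mss >> 8); o[n++] = (unsigned char)mss;
	o[n++] = OPT_NOP;
	o[n++] = OPT_WS; o[n++] = 3; o[n++] = wscale;
	o[n++] = OPT_NOP; o[n++] = OPT_NOP;
	o[n++] = OPT_TS; o[n++] = 10;
	put32(o + n, tsval); n += 4;
	put32(o + n, 0); n += 4;		/* echo reply is 0 on a SYN-ACK here */
	o[n++] = OPT_SACKOK; o[n++] = 2;	/* sackOK after the timestamp */
	while (n % 4 != 0)
		o[n++] = OPT_EOL;		/* end of list plus zero padding */
	return (n);
}
```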
obtained from OpenBSD with an algorithm suggested
by Amit Klein. The OpenBSD algorithm has a few
flaws; see Amit's paper for more information.
For a description of how this algorithm works,
please see the comments within the code.
Note that this commit does not yet enable random IP ID
generation by default. There are still some concerns
that doing so will adversely affect performance.
Reviewed by: rwatson
MFC After: 2 weeks
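The commit defers the details to the code comments, so the sketch below shows only one generic way to issue random IP IDs that are not reused within a window of recent IDs: a ring buffer of recently issued IDs plus a 65536-bit "in use" set. It is not claimed to be the committed algorithm. The window size `HIST`, all names, and the xorshift32 generator are assumptions. xorshift32 is not cryptographically secure and only stands in for a real random source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HIST 8192		/* assumed no-reuse window */

struct ipid_state {
	uint32_t rng;			/* xorshift32 state, nonzero */
	uint16_t ring[HIST];		/* recently issued IDs */
	unsigned pos;			/* next ring slot (oldest when full) */
	unsigned filled;		/* number of valid ring entries */
	uint8_t  used[65536 / 8];	/* bit set: ID currently in the ring */
};

/* Placeholder PRNG, NOT suitable for security use. */
static uint32_t
xorshift32(uint32_t *s)
{
	uint32_t x = *s;

	x ^= x << 13; x ^= x >> 17; x ^= x << 5;
	return (*s = x);
}

static void
ipid_init(struct ipid_state *st, uint32_t seed)
{
	memset(st, 0, sizeof(*st));
	st->rng = seed ? seed : 1;
}

static uint16_t
ipid_next(struct ipid_state *st)
{
	uint16_t id;

	/* Redraw until the ID was not issued in the last HIST calls. */
	do {
		id = (uint16_t)xorshift32(&st->rng);
	} while (st->used[id >> 3] & (1u << (id & 7)));

	if (st->filled == HIST) {	/* window full: release the oldest ID */
		uint16_t old = st->ring[st->pos];
		st->used[old >> 3] &= (uint8_t)~(1u << (old & 7));
	} else
		st->filled++;
	st->ring[st->pos] = id;
	st->used[id >> 3] |= (uint8_t)(1u << (id & 7));
	st->pos = (st->pos + 1) % HIST;
	return (id);
}
```

The bit set makes the "recently used?" test O(1). The ring tells us which ID to release once the window is full.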
ipsec*_set_policy and do the privilege check only if needed.
Try to make the two ip*_ctloutput code blocks calling ipsec*_set_policy
more alike.
Reviewed by: rwatson
read socket buffers in shutdown() and close():
- Call socantrcvmore() before sblock() to dislodge any threads that
might be sleeping (potentially indefinitely) while holding sblock(),
such as a thread blocked in recv().
- Flag the sblock() call as non-interruptible so that a signal
delivered to the thread calling sorflush() doesn't cause sblock() to
fail. The sblock() is required to ensure that all other socket
consumer threads have, in fact, left, and do not enter, the socket
buffer until we're done flushing it.
To implement the latter, change the 'flags' argument to sblock() to
accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
flag. When SBL_NOINTR is set, it forces a non-interruptible sx
acquisition, regardless of whether SB_NOINTR is set on the socket
buffer; without this change it would be possible for
another thread to clear SB_NOINTR between when the socket buffer mutex
is released and sblock() is invoked.
Reviewed by: bz, kmacy
Reported by: Jos Backus <jos at catnook dot com>
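The decision that the new sblock() flags drive can be written as a small pure function. In the sketch below the numeric flag values and the `sblock_mode` enum are hypothetical; only the flag names come from the commit message. The point it shows is that SBL_NOINTR comes from the caller's argument, so another thread clearing SB_NOINTR on the socket buffer cannot turn the caller's sleep into an interruptible one.

```c
#include <assert.h>

/* Hypothetical values; the real definitions live in the kernel headers. */
#define SBL_WAIT	0x01	/* caller may sleep for the lock */
#define SBL_NOINTR	0x02	/* that sleep must not be interrupted */
#define SB_NOINTR	0x10	/* per-socket-buffer "don't interrupt" flag */

enum sblock_mode { SBLOCK_TRY, SBLOCK_SLEEP_INTR, SBLOCK_SLEEP_NOINTR };

/* How the sx lock would be acquired for a given flags/sb_flags pair. */
static enum sblock_mode
sblock_mode(int flags, int sb_flags)
{
	assert((flags & ~(SBL_WAIT | SBL_NOINTR)) == 0);
	if ((flags & SBL_WAIT) == 0)
		return (SBLOCK_TRY);		/* non-blocking try-lock */
	if ((flags & SBL_NOINTR) || (sb_flags & SB_NOINTR))
		return (SBLOCK_SLEEP_NOINTR);	/* signals cannot abort */
	return (SBLOCK_SLEEP_INTR);		/* signal-interruptible sleep */
}
```

sorflush() would pass `SBL_WAIT | SBL_NOINTR` and get a non-interruptible acquisition whatever the socket buffer's flags say.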
exposing them to all consumers of ip_fw.h. These structures are
used in both ipfw(8) and ipfw(4), but are not part of the user<->kernel
interface for other applications to use; rather, they are shared
implementation.
MFC after: 3 days
Reported by: Paul Vixie <paul at vix dot com>
Introduce a new privilege allowing callers to set certain IP header options
(hop-by-hop, routing headers).
Leave a few comments to be addressed later.
Reviewed by: rwatson (older version, before addressing his comments)
while in principle a good idea, opened us up to a race inherent to
the syncache's direct insertion of incoming TCP connections into the
"completed connection" listen queue, as it transpires that the socket
is inserted before the inpcb is fully filled in by syncache_expand().
The bug manifested with the occasional returning of 0.0.0.0:0 in the
address returned by the accept() system call, which occurred if accept
managed to execute tcp_usr_accept() before syncache_expand() had copied
the endpoint addresses into inpcb connection state.
Re-add tcbinfo locking around the address copyout, which has the effect
of delaying the copy until syncache_expand() has finished running, as
it is run while the tcbinfo lock is held. This is undesirable in that
it increases contention on tcbinfo further, but a more significant
change will be required to how the syncache inserts new sockets in
order to fix this and keep more granular locking here. In particular,
either more state needs to be passed into sonewconn() so that
pru_attach() can fill in the fields *before* the socket is inserted, or
the socket needs to be inserted in the incomplete connection queue
until it is actually ready to be used.
Reported by: glebius (and kris)
Tested by: glebius
drop the lock and then re-acquire it, revalidating TCP connection state
assumptions when we do so. This avoids a potential lock order reversal
(and potential deadlock, although none have been reported) due to the
inpcb lock being held over a page fault.
MFC after: 1 week
PR: 102752
Reviewed by: bz
Reported by: Václav Haisman <v dot haisman at sh dot cvut dot cz>
of two compares against 0. The negative effect of cache flushing
is probably more than the gain by not doing the two compares (the
value is almost certainly in register or at worst, cache).
Note that the uses of m_freem() are in error cases and m_freem()
handles NULL anyhow. So fast-path really isn't changed much at all.
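The remark above relies on m_freem() accepting NULL, which lets error paths call it without a guard. The sketch below shows the same convention on a hypothetical userland chain type (`struct buf` and `chain_free()` are stand-ins, not the mbuf API). It returns the number of buffers freed only so that its behaviour can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain. */
struct buf {
	struct buf *next;
};

/*
 * Free a whole chain.  As with m_freem(), passing NULL is a harmless
 * no-op, so callers in error paths need no separate NULL check.
 */
static int
chain_free(struct buf *b)
{
	int n = 0;

	while (b != NULL) {
		struct buf *next = b->next;
		free(b);
		b = next;
		n++;
	}
	return (n);
}
```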
free the MAC label on the inpcb before freeing the inpcb.
MFC after: 3 days
Submitted by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
When system ticks are positive, for entries in the cache
bucket, syncache_timer() ran on every tick (doing nothing
useful) instead of at the intended 3, 6, 12, and 24 seconds
later (when it's time to retransmit SYN,ACK).
When ticks are negative, syncache_timer() was scheduled
too far into the future (up to ~25 days on systems with
HZ=1000), no SYN,ACK retransmits were attempted at all,
and syncache entries added in that period that corresponded
to non-established connections stayed there forever.
Only HEAD and RELENG_7 are affected.
Reviewed by: silby, kmacy (earlier version)
Submitted by: Maxim Dounin, ru
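The bug above comes from treating the sign of `ticks` as meaningful. The usual remedy is to compare tick values only through their difference. The sketch below shows that general technique and does not reproduce the actual syncache fix. It does the subtraction in unsigned 32-bit arithmetic, which wraps without undefined behaviour, then reinterprets the result as signed. That reinterpretation is implementation-defined before C23 but is two's complement on every compiler FreeBSD supports. `ticks_until()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: ticks remaining until `deadline`, correct as long
 * as the two values are less than 2^31 ticks apart, whether or not the
 * counter has wrapped past INT32_MAX (turned "negative") or past zero.
 */
static int32_t
ticks_until(uint32_t deadline, uint32_t now)
{
	return ((int32_t)(deadline - now));
}
```

A deadline 3 seconds out at HZ=1000 is `now + 3000`. Even when `now` sits just below INT32_MAX, so that the deadline wraps negative, `ticks_until()` still yields 3000 rather than a huge or negative delay.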
- Rename output routines tcp_gen_* -> tcp_output_*.
- Rename notification routines that turn into no-ops in the absence of TOE
from tcp_gen_* -> tcp_offload_*.
- Fix some minor comment nits.
- Add a /* FALLTHROUGH */
Reviewed by: Sam Leffler, Robert Watson, and Mike Silbersack