Commit Graph

2766 Commits

Author SHA1 Message Date
Andre Oppermann
9eab54debf When we run into the syncache entry limits syncache_add() tries
to free the oldest entry in the current bucket row.  The global
entry limit may be smaller than the bucket rows and their limit
combined however.  Thus only try to free a syncache entry if we
found one in this bucket row.

Reported by:	kris
2007-04-17 15:25:14 +00:00
Robert Watson
c9791cfb3e Shorten text string for ip_fw2 dynamic rules zone by removing the word
"zone", which is generally not present in zone names.  This reduces the
incidence of line-wrapping in "vmstat -z " using 80-column displays.

MFC after:	3 days
2007-04-17 09:28:36 +00:00
Robert Watson
215c8d75b8 Remove unused variable tcbinfo_mtx. 2007-04-15 21:03:23 +00:00
Randall Stewart
f1d6e6dc71 Fix stupid syntax error - Pointy hat to me :-( 2007-04-15 13:03:14 +00:00
Randall Stewart
478d3f0901 - Add more comments to sctps_stats struture in sctp_uio.h
- Fix bug that prevented EEOR mode from working
  and simplified the can_we_split code in the process.
- Reduce lock contention for the tcb_send_lock. I did
  this especially for EEOR mode, still need to look at
  why I need a lock when removing from the tailq and the
  ->next is NOT null. A lock fixes it but it implies a
  bug yet exists.
- Activated Andre's proposed changes to better use the mbuf
  infrastructure.
- Fixed places that were not using the aloc macro's to take
  advantage of the per assoc cache.
- Adds ifdef fix so any logging will enable stat_logging to
  get the right data structures in place (suggested by Max Laier).
2007-04-15 11:58:26 +00:00
Max Laier
d0cf96b407 Fix a typeo - unbreak the build. 2007-04-14 18:27:34 +00:00
Randall Stewart
c105859eee - fix source address selection when picking an acceptable address
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
  if we failed to get init/or/cookie to bring up an assoc. Change
  so we don't just give a generic "comm lost" but look at actual
  states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
  compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truely portable
  for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
  during window probes on the sent queue. This would then
  cause an out-of-order problem and assure that the flight
  size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
  overruns, flight size off as well as false retransmissions.g
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
  was active, fix to make strict sacks a bit smarter about what
  the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his
  commit of at least m_last().. need to adjust for 6.2 as
  well.. since m_last won't exist.
Reviewed by:	gnn
2007-04-14 09:44:09 +00:00
Ruslan Ermilov
7480de4305 Make "struct tcp_timer" visible only to the kernel, and unbreak world. 2007-04-11 14:08:42 +00:00
Andre Oppermann
b8152ba793 Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
Robert Watson
6493245ded Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser
checks to see whether bind() can reuse a port/address combination while
it's already in use (for some definition of use).
2007-04-10 15:58:38 +00:00
Paolo Pisati
c326cd0e62 Prevent the usage of an uninitialized variable: do not accept
StartMediaTx message before an OpnRcvChnAck message was received.

Reviewed by:	glebius
Approved by:	glebius (mentor)
MFC after:      3 days
Found with:	Coverity Prevent(tm)
CID:		498
2007-04-07 09:52:36 +00:00
Paolo Pisati
f4296f2246 Silence Coverity about an unused variable.
Reviewed by: 	glebius
Approved by: 	glebius (mentor)
MFC after: 	3 days
CID: 		538
2007-04-07 09:47:39 +00:00
Andre Oppermann
995a77176f Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some
further INP_INFO_WLOCK_ASSERT() while there.
2007-04-04 18:30:16 +00:00
Andre Oppermann
0c38fd0a7a Move last tcpcb initialization for the inbound connection case from
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.

The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
2007-04-04 16:13:45 +00:00
Andre Oppermann
beaa515e95 Some local and style(9) cleanups. 2007-04-04 15:30:31 +00:00
Andre Oppermann
5dd9dfefd6 Retire unused TCP_SACK_DEBUG. 2007-04-04 14:44:15 +00:00
Andre Oppermann
b728e90260 In tcp_dooptions() skip over SACK options if it is a SYN segment. 2007-04-04 14:39:49 +00:00
Alexander Kabaev
edb2e5dca3 Include string.h for non-kernel builds to get proper memcpy prototype. 2007-04-04 03:16:59 +00:00
Alexander Kabaev
d8164209b3 Include string.h for non-kernel builds to get proper strcpy, strlen
prototypes.
2007-04-04 03:14:15 +00:00
Alexander Kabaev
9160afee7c Do not assign result of (char *) cast to u_char * variable. 2007-04-04 03:10:42 +00:00
Julian Elischer
1bd69ee131 Since we switched to using monatomically increasing timestamps,
they have been reported back to the userland as being in 1970.
Add boot time to the timestamp to give the time in the scale of the 'current'
real timescale.  Not perfect if you change the time a lot but good enough
to keep all the rules correct relative to each other correct in terms
of time relative to "now".
2007-04-03 22:45:50 +00:00
Randall Stewart
bff64a4db3 - fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
  packet (1000 addresses).
- flight size correcting updated to include one message only
  and to handle case where the peer does not cumack the
  next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic
  since we tried to unlock the destroyed mutex. Fixes
  so we properly exit when we need to destroy an assoc.
  (Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms
  and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by:	gnn
2007-04-03 11:15:32 +00:00
Randall Stewart
5e54f665f0 - Found bug in min split point bundling which caused
incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
  both how big a msg must be as well as how much needs
  to be left over.
- With our new algo in place, we need to implicitly
  set "end of msg" on the sp-> structure otherwise we
  end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
  header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
  will kill the assoc via the kill timer and send an
  abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
  us from fragmenting a small complete message when
  we needed to.
- Window probes were not marked back to unsent and
  flight adjusted when a sack came in with no
  window change or accepting of the probe data.
  We now fix this with having a mark on the net and
  the chunk so we can clear it out when the sack arrives
  forcing it to retran just like it was "new" this
  improves the handling of window probes, which were
  dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange
2007-03-31 11:47:30 +00:00
Bruce M Simpson
f7e083af90 Fix a bug in IPv4 address configuration exposed by refcounting.
* Join the IPv4 all-hosts multicast group 224.0.0.1 once only;
   that is, when an IPv4 address is first configured on an interface.
 * Do not join it for subsequent IPv4 addresses as this violates IGMP.
 * Be sure to leave the group when all IPv4 addresses have been removed
   from the interface.
 * Add two DIAGNOSTIC printfs related to the issue.

Further care and attention is needed in this area; it is suggested that
netinet's attachment to the ifnet structure be compartmentalized and
non-implicit.

Bug found by:	andre
MFC after:	1 month
2007-03-29 21:39:22 +00:00
Andre Oppermann
1929eae1cc When blackholing do a 'dropunlock' in the new world order to prevent the
INP_INFO_LOCK from leaking.

Reported by:	ache
Found by:	rwatson
2007-03-28 12:58:13 +00:00
Robert Watson
77c78838f0 Remove stale comment about not enabling inpcb and inpcbinfo lock assertions
when IPv6 is enabled.

MFC after:	3 days
2007-03-28 00:50:20 +00:00
Andre Oppermann
07b64b901a In tcp_sack_doack() remove too tight KASSERT() added in last revision. This
function may be called without any TCP SACK option blocks present.  Protect
iteration over SACK option blocks by checking for SACK options present flag
first.

Bug reported by:	wkoszek, keramida, Nicolas Blais
2007-03-25 23:27:26 +00:00
Robert Watson
30916a2d1d Replace a comment about RSVP/mrouting with a different but similar comment
explaining that some more locking is needed.  The routing pieces are done,
but there is an interlocking issue between optionally compiled code and
mandatory code.

Spotted by:	kris
2007-03-25 21:49:50 +00:00
Maxim Konovalov
14739780bd o Use a define for a buffer size.
Prodded by:	db

o Add missed vars for TCPDEBUG in tcp_do_segment().

Prodded by:	tinderbox
2007-03-24 22:15:02 +00:00
Andre Oppermann
302ce8d690 Split tcp_input() into its two functional parts:
o tcp_input() now handles TCP segment sanity checks and preparations
   including the INPCB lookup and syncache.
 o tcp_do_segment() handles all data and ACK processing and is IPv4/v6
   agnostic.

Change all KASSERT() messages to ("%s: ", __func__).

The changes in this commit are primarily of mechanical nature and no
functional changes besides the function split are made.

Discussed with:	rwatson
2007-03-23 20:16:50 +00:00
Andre Oppermann
4dfdffe9e2 Tidy up some code to conform better to surroundings and style(9), 0 = NULL
and space/tab.
2007-03-23 19:11:22 +00:00
Andre Oppermann
fc30a25199 Bring SACK option handling in tcp_dooptions() in line with all other
options and ajust users accordingly.
2007-03-23 18:33:21 +00:00
Bruce M Simpson
73ec8173eb Purge two redundant case labels. 2007-03-23 09:43:36 +00:00
Gleb Smirnoff
1daaa65d3f Remove global list of all llinfo_arp entries and use a callout per
instance expiry of the ARP entries. Since we no longer abuse the IPv4
radix head lock, we can now enter arp_rtrequest() with a lock held on
an arbitrary rt_entry.

Reviewed by:	bms
2007-03-22 10:37:53 +00:00
Andre Oppermann
ad3f9ab320 ANSIfy function declarations and remove register keywords for variables.
Consistently apply style to all function declarations.
2007-03-21 19:37:55 +00:00
Andre Oppermann
f7608d9e7f Match up SYSCTL declarations in style. 2007-03-21 19:34:12 +00:00
Andre Oppermann
eec9d82d8e Subtract optlen in the maximum length check for TSO and finally avoid
slightly oversized TSO mbuf chains.

Submitted by:	kmacy
2007-03-21 19:04:07 +00:00
Andre Oppermann
b10fbdeafa Tidy up IPFIREWALL_FORWARD sections and comments. 2007-03-21 18:56:03 +00:00
Andre Oppermann
794235b737 Update and clarify comments in first section of tcp_input(). 2007-03-21 18:52:58 +00:00
Andre Oppermann
db33b3e6a7 Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and remove
old dead T/TCP code.
2007-03-21 18:49:43 +00:00
Andre Oppermann
574b696407 Tidy up tcp_log_in_vain and blackhole. 2007-03-21 18:36:49 +00:00
Andre Oppermann
85c497918c Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it
doesn't impede normal operation negatively and is only a few lines of
code.  It's close relatives blackhole and log_in_vain aren't options
either.
2007-03-21 18:25:28 +00:00
Andre Oppermann
e406f5a1c9 Remove tcp_minmssoverload DoS detection logic. The problem it tried to
protect us from wasn't really there and it only bloats the code.  Should
the problem surface in the future we can simply resurrect it from cvs
history.
2007-03-21 18:05:54 +00:00
Bruce M Simpson
c7547d1aaf Increase default size of raw IP send and receive buffers to the same as
udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB)
are unnecessarily fragmented.

A common use case for this is OSPF link-state database synchronization
during adjacency bringup on a high speed network with a large MTU.

It is not possible to auto-tune this setting until a socket is bound to
a given interface, and because the laddr part of the inpcb tuple may be
overridden, it makes no sense to do so. Applications may request a larger
socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options.

Certain applications such as Quagga ospfd do not probe for interface MTU
and therefore do not increase SO_SENDBUF in this use case.
XORP is not affected by this problem as it preemptively uses SO_SENDBUF
and SO_RECVBUF to account for any possible additional latency in XRL IPC.

PR:		kern/108375
Requested by:	Vladimir Ivanov
MFC after:	1 week
2007-03-20 13:15:20 +00:00
Randall Stewart
62c1ff9c48 - window update sacks sent incorrectly after
shutdown which caused extra abort from peer.
- RTT time calculation was not being done in
  express sack handling since it refered to an unused
  variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.
2007-03-20 10:23:11 +00:00
Bruce M Simpson
ec002fee99 Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from:	p4 branch bms_netdev
Reviewed by:	andre
Sponsored by:	Garance A Drosehn
Idea from:	NetBSD
MFC after:	1 month
2007-03-20 00:36:10 +00:00
Andre Oppermann
6489fe6553 Match up SYSCTL declaration style. 2007-03-19 19:00:51 +00:00
Andre Oppermann
8b8ed7a78e Match up SYSCTL_INT declarations in style. 2007-03-19 18:42:27 +00:00
Andre Oppermann
4e02375908 Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.

Adjust tcp_output() make use of it.

Tested by:	gallatin
2007-03-19 18:35:13 +00:00
Randall Stewart
6a27c37636 Adds a hash table to speed local address lookup
on a per VRF basis (BSD has only one VRF currently).
Hash table is sized to 16 but may need to be adjusted
for machines with large numbers of addresses.
Reviewed by:	gnn
2007-03-19 11:11:16 +00:00