Commit Graph

3231 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
8e5c87f4b6 Fix typo and while here another one.
Reviewed by:	keramida
Reported by:	keramida
MFC after:	2 months (with r184720)
2008-11-06 16:30:20 +00:00
Bjoern A. Zeeb
91d6cfa6b1 Fix a bug introduced with r182851 splitting tcp_mss() into
tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could
re-use the same code.

Move the TSO logic back to tcp_mss() and out of tcp_mss_update().
We tried to avoid that initially but if we are called from
tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb
there, called into tcp_mtudisc() and tcp_mss_update() which
then would reenable TSO on the tcpcb based on TSO capabilities
of the interface as learnt in tcp_maxmtu/6().
So if TSO was enabled on the (possibly new) outgoing interface
it was turned back on, which led to an endless loop between
tcp_output() and tcp_mtudisc() until we overflowed the stack.

Reported by:	kmacy
MFC after:	2 months (along with r182851)
2008-11-06 13:25:59 +00:00
Bjoern A. Zeeb
4b3f4d3818 Adapt the comment for tcp_maxmtu(); we are returning a number
not a pointer. While here update the rest of the comment to
better match what we have these days.

MFC after:	2 months
2008-11-06 12:59:00 +00:00
Bjoern A. Zeeb
6f01cac68a Fix a bug introduced with r182851 splitting tcp_mss() into
tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could
re-use the same code.

In case we return early and got a metricptr to pass the hostcache
info back to the caller we need to initialize the data to a defined
state (zero it) as tcp_hc_get() would do if there was no hit.
Without that the caller would check on random stack garbage which
could lead to undefined results.

This only affected tcp_mss() if there was no routing entry for the peer,
tcp_mtudisc() was not affected.

MFC after:	2 months (along with r182851)
2008-11-06 12:33:33 +00:00
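The early-return fix described in 6f01cac68a can be illustrated with a minimal sketch. This is not the kernel code: `struct metrics_sketch`, `lookup_sketch()` and the value 1500 are hypothetical stand-ins. It only shows the pattern of zeroing a caller-supplied output structure before returning early, so the caller never reads stack garbage.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the hostcache metrics handed back to the caller. */
struct metrics_sketch {
	unsigned long mtu;
	unsigned long rtt;
};

/*
 * Returns 0 on success and -1 when there is no route.  On the early
 * return the output structure is zeroed, so the caller sees a defined
 * "no hit" state instead of whatever happened to be on its stack.
 */
static int
lookup_sketch(int have_route, struct metrics_sketch *out)
{
	assert(out != NULL);
	if (!have_route) {
		memset(out, 0, sizeof(*out));
		return (-1);
	}
	out->mtu = 1500;	/* illustrative value only */
	out->rtt = 0;
	return (0);
}
```

A caller can then test the fields after a failed lookup without first checking whether they were ever written.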
Oleg Bulyzhin
02d09f7901 Type of q_time (start of queue idle time) has changed: uint32_t -> uint64_t.
This should fix q_time overflow, which happens after 2^32/(86400*hz) days of
uptime (~50 days for hz = 1000).
q_time overflow causes the following:
- traffic shaping may not work in 'fast' mode (not enabled by default).
- incorrect average queue length calculation in RED/GRED algorithm.

NB: due to ABI change this change is not applicable to stable.

PR:		kern/128401
2008-10-28 14:14:57 +00:00
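The overflow arithmetic quoted in 02d09f7901 can be checked with a small sketch. The helper name `days_until_wrap()` is hypothetical; it assumes a counter that advances hz times per second and evaluates 2^bits/(86400*hz).

```c
#include <assert.h>

/*
 * Days of uptime before a tick counter of the given width wraps,
 * assuming it advances hz times per second.
 */
static double
days_until_wrap(unsigned bits, unsigned hz)
{
	double ticks = 1.0;
	unsigned i;

	assert(bits > 0 && bits <= 64 && hz > 0);
	for (i = 0; i < bits; i++)
		ticks *= 2.0;		/* ticks = 2^bits */
	return (ticks / (86400.0 * hz));
}
```

With bits = 32 and hz = 1000 this gives a little under 50 days, matching the figure in the commit message; with bits = 64 the wrap is far beyond any realistic uptime.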
Randall Stewart
73adc48f49 More issues with pre-blocking:
a) Need for EEOR mode to take the min of the socket buffer size and the
   add-more threshold, otherwise if you are so silly as to set a send
   buf size less than the add-more you could block forever in EEOR mode.

b) We were incorrectly using the sysctl vs the calculated value. This
   causes us to block forever if the add-more threshold is larger than
   the socket buffer size.
2008-10-27 14:49:12 +00:00
Randall Stewart
35e4161b1f Two inter-related bugs.
- If we send EXACTLY the size left in the send buffer
  and then send again, we end up with exactly 0 bytes and
  don't hit the pre-block code to wait for more space.
- If we fall into the loop with our max_len == 0 (the bug
  above) we then call in to copy out the data, set the length
  of the waiting-to-transmit data to 0 and call the mbuf copy routine,
  for which 0 means copy all the data to the mbuf chain, which it
  does. This then leaves a "stuck" message on the stream queue with
  its size exactly 0 bytes but all the data there and thus nothing
  left in the uio structure. We then reach a stuck forever state,
  never being able to send data.
2008-10-27 14:01:23 +00:00
Randall Stewart
a4c651183e Get rid of ifdef for vimage on version 8 comparison. Now the
scrubbing program properly takes care of this.
2008-10-27 13:54:54 +00:00
Randall Stewart
83416c885d Invariants changes that make more sense.
2008-10-27 13:53:31 +00:00
Robert Watson
dd8ac7f990 In both dropwithreset paths in tcp_input.c, drop the tcbinfo lock
sooner to decomplicate locking and eliminate the need for a rather
chatty comment about why we have to handle the global lock in a
special way for the benefit of ipfw and pf cred rules.

MFC after:	3 days
2008-10-26 22:03:52 +00:00
Robert Watson
4c95fd23d6 Remove endearing but syntactically unnecessary "return;" statements
directly before the final closing brackets of some TCP functions.

MFC after:	3 days
2008-10-26 19:33:22 +00:00
Bjoern A. Zeeb
460473a071 Style changes only:
- Consistently add parentheses to return statements.
- Use NULL instead of 0 when comparing pointers, also avoiding
  unnecessary casts.
- Do not use pointers as booleans.

Reviewed by:	rwatson (earlier version)
MFC after:	2 months
2008-10-26 19:17:25 +00:00
Dag-Erling Smørgrav
e11e3f187d Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.
2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Bjoern A. Zeeb
7e1bc2729c Update a comment which to my reading had been misplaced in rev. 1.12
already (but probably had been way above as the code was there twice)
and describe what was last changed in rev. 1.199 there (which now is
in sync with in6_src.c r184096).

Pointed at by:	mlaier
MFC after:	2 months
2008-10-20 18:56:00 +00:00
Bjoern A. Zeeb
dc3c09c89f Bring over the change switching from using sequential to random
ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143
(initially from OpenBSD) and follow-up commits during the last four and
a half years including rev. 1.157, 1.162 and 1.199.
This now is relying on the same infrastructure as has been implemented
in in_pcb.c since rev. 1.199.

Reviewed by:	silby, rpaulo, mlaier
MFC after:	2 months
2008-10-20 18:43:59 +00:00
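As a rough illustration of the difference between sequential and random ephemeral port allocation mentioned in dc3c09c89f, the sketch below maps a random number onto a port range. It is not the in_pcb.c or in6_src.c code: `pick_port()` is a hypothetical helper, the random value is supplied by the caller, and the simple modulo introduces a slight bias that a real implementation may or may not accept.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a caller-supplied random value onto the inclusive port range
 * [first, last].  A sequential allocator would instead return the
 * previous port plus one.
 */
static uint16_t
pick_port(uint16_t first, uint16_t last, uint32_t rnd)
{
	uint32_t count;

	assert(first <= last);
	count = (uint32_t)last - first + 1;
	return ((uint16_t)(first + rnd % count));
}
```

A caller would still have to check that the chosen port is free and retry otherwise; that lookup is outside the scope of this sketch.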
Randall Stewart
1b9f62a044 The flags value was not always being copied out in the recv routine like it
should be.
Obtained from:	Michael Tuexen
2008-10-18 15:56:52 +00:00
Randall Stewart
ac29704161 New sockets (accepted) were not inheriting the proper snd/rcv buffer value.
Obtained from:	 Michael Tuexen
2008-10-18 15:56:12 +00:00
Randall Stewart
1862b24533 - Peer's rwnd is now available for the MIB.
Obtained from:	Michael Tuexen
2008-10-18 15:55:15 +00:00
Randall Stewart
fc69c30240 - Adapt layer indication was always being given (it should only
  be given when the user has enabled it). (Michael Tuexen)
- Sack Immediately was not being set properly on the actual chunk, it
  was only put in the rcvd_flags which is incorrect. (Michael Tuexen)
- Added an ifndef userspace to one of the already present macros for
  inet (Brad Penoff)
Obtained from:	Michael Tuexen and Brad Penoff
MFC after:	4 weeks
2008-10-18 15:54:25 +00:00
Randall Stewart
fcea7c2ed3 Reported by Yehuda Weinraub (yehudasa@gamil.com) - CRC32C algorithm
uses incorrect init_bytes value. It SHOULD have the number
of bytes to get to a 4 byte boundary.

PR:	128134
MFC after:	4 weeks
2008-10-18 15:53:31 +00:00
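The value the fcea7c2ed3 message says init_bytes should hold, the number of bytes needed to reach a 4-byte boundary, can be sketched as follows. The helper name `bytes_to_align4()` is hypothetical and this is not the SCTP CRC32C code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of leading bytes to process one at a time before the
 * address is 4-byte aligned (0 if it already is).
 */
static size_t
bytes_to_align4(uintptr_t addr)
{
	size_t n = (size_t)((4 - (addr & 3)) & 3);

	assert(((addr + n) & 3) == 0);	/* postcondition: aligned */
	return (n);
}
```

A buffer shorter than the returned count would need additional clamping by the caller, which this sketch leaves out.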
Bjoern A. Zeeb
f08ef6c595 Add cr_canseeinpcb() doing checks using the cached socket
credentials from inp_cred which is also available after the
socket is gone.
Switch cr_canseesocket consumers to cr_canseeinpcb.
This removes an extra acquisition of the socket lock.

Reviewed by:	rwatson
MFC after:	3 months (set timer; decide then)
2008-10-17 16:26:16 +00:00
Marko Zec
3ff0b2135b Remove a useless global static variable.
Approved by:	bz (ad-hoc mentor)
2008-10-16 12:31:03 +00:00
Maxim Konovalov
0279bb29a0 o Remove unnecessary parentheses and restore indentation.
Prodded by:	mlaier
2008-10-14 17:47:29 +00:00
Maxim Konovalov
8e6c0f8cfd o Reformat ipfw nat get|setsockopt code to make it look more
style(9) compliant.  No functional changes.
2008-10-14 12:26:55 +00:00
Robert Watson
1f6ef666b5 Fix content and spelling of comment on _ipfw_insn.len -- a count of
32-bit words, not 32-byte words.

MFC after:	3 days
2008-10-10 14:33:47 +00:00
Robert Watson
6c8286e42d Don't pass curthread to sbreserve_locked() in tcp_do_segment(), as the
netisr or ithread's socket buffer size limit is not the right limit to
use.  Instead, pass NULL as the other two calls to sbreserve_locked()
in the TCP input path (tcp_mss()) do.

In practice, this is a no-op, as ithreads and the netisr run without a
process limit on socket buffer use, and a NULL thread pointer leads to
not using the process's limit, if any.  However, if tcp_input() is
called in other contexts that do have limits, this may prevent the
incorrect limit from being used.

MFC after:	3 days
2008-10-07 09:41:07 +00:00
Bjoern A. Zeeb
c6ddb94cf2 Remove an INP_RUNLOCK() missed in SVN r183606, cvs rev. 1.195 raw_ip.c
when transitioning from so_cred to inp_cred.

MFC after:	6 weeks
2008-10-04 16:48:09 +00:00
Bjoern A. Zeeb
86d02c5c63 Cache so_cred as inp_cred in the inpcb.
This means that inp_cred is always there, even after the socket
has gone away. It also means that it is constant for the lifetime
of the inp.
Both facts lead to simpler code and possibly less locking.

Suggested by:	rwatson
Reviewed by:	rwatson
MFC after:	6 weeks
X-MFC Note:	use an inp_pspare for inp_cred
2008-10-04 15:06:34 +00:00
Bjoern A. Zeeb
0895aec30c Implement IPv4 source address selection for unbound sockets.
For the jail case we are already looping over the interface addresses
before falling back to the only IP address of a jail in case of no
match. This is in preparation for the upcoming multi-IPv4/v6/no-IP
jail patch this change was developed with initially.

This also changes the semantics of selecting the IP for processes within
a jail as it now uses the same logic as outside the jail (with additional
checks) but no longer is on a mutually exclusive code path.

Benchmarks showed no difference at 95.0% confidence for either the
plain or the jail case (even with the additional overhead).  See:
http://lists.freebsd.org/pipermail/freebsd-net/2008-September/019531.html

Inspired by a patch from:	Yahoo! (partially)
Tested by:			latest multi-IP jail patch users (implicitly)
Discussed with:			rwatson (general things around this)
Reviewed by:			mostly silence (feedback from bms)
Help with benchmarking from:	kris
MFC after:			2 months
2008-10-03 12:21:21 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Robert Watson
c0a211c51f Expand comments relating various detach/free/drop inpcb routines.
MFC after:	3 days
2008-09-29 13:50:17 +00:00
Robert Watson
fc18af966f Fix typo in comment.
MFC after:	3 days
2008-09-29 13:48:48 +00:00
Robert Watson
47505890d6 When an inpcb doesn't have a socket but the inpcb is passed to ipfw
in the transmit path, such as TCPS_TIMEWAIT, fail the credential
extraction immediately rather than acquiring locks and looking up
the inpcb on the global lists in order to reach the conclusion that
the credential extraction has failed.

This is more efficient, but more importantly, it avoids lock
recursion on the inpcbinfo, which is no longer allowed with rwlocks.
This appears to have been responsible for at least two reported
panics.

MFC after:	3 days
Reported by:	ganbold
2008-09-27 19:28:28 +00:00
Robert Watson
d83412e791 Rather than shadowing global variable 'lookup' in check_uidgid(), rename
it to ugid_lookupp.  This should make debugging issues with ipfw uid
rules easier.

MFC after:	3 days
2008-09-27 10:14:02 +00:00
Ed Maste
d2035ffb7a Move CTASSERT from header file to source file, per implementation note now
in the CTASSERT man page.

Submitted by:	Ryan Stone
2008-09-26 18:30:11 +00:00
Robert Watson
014ea782b1 As a follow-on to r183323, correct another case where ip_output() was
called without an inpcb pointer despite holding the tcbinfo global
lock, which led to a deadlock or panic when ipfw tried to further
acquire it recursively.

Reported by:    Stefan Ehmann <shoesoft at gmx dot net>
MFC after:      3 days
2008-09-25 17:26:54 +00:00
Robert Watson
a0ca087183 When dropping a packet and issuing a reset during TCP segment handling,
unconditionally drop the tcbinfo lock (after all, we assert it lines
before), but call tcp_dropwithreset() under both inpcb and inpcbinfo
locks only if we pass in a tcpcb.  Otherwise, if the pointer is NULL,
firewall code may later recurse the global tcbinfo lock trying to look
up an inpcb.

This is an instance where a layering violation leads not only
potentially to code reentrance and recursion, but also to lock
recursion, and was revealed by the conversion to rwlocks because
acquiring a read lock on an rwlock already held with a write lock is
forbidden.  When these locks were mutexes, they simply recursed.

Reported by:	Stefan Ehmann <shoesoft at gmx dot net>
MFC after:	3 days
2008-09-24 11:07:03 +00:00
Roman Kurakin
f7b5554eb7 Export IPFW_TABLES_MAX value for compiled in defaults.
2008-09-21 20:42:42 +00:00
Roman Kurakin
6b057f1b5e Export IPFW_TABLES_MAX via sysctl. Part of PR: 127058.
PR:		127058
2008-09-14 09:24:12 +00:00
Julian Elischer
de34ad3f4b oops commit the version that compiles
2008-09-14 08:24:45 +00:00
Julian Elischer
93fcb5a28d Revert a part of the MRT commit that proved un-needed.
rt_check() in its original form proved to be sufficient and
rt_check_fib() can go away (as can its evil twin in_rt_check()).

I believe this does NOT address the crashes people have been seeing
in rt_check.

MFC after:	1 week
2008-09-14 08:19:48 +00:00
Roman Kurakin
eb29d14ccb Make the comment for the default rule number more clear.
Submitted by:	yar@
2008-09-14 06:14:06 +00:00
Bjoern A. Zeeb
3418daf2f1 Implement IPv6 support for TCP MD5 Signature Option (RFC 2385)
the same way it has been implemented for IPv4.

Reviewed by:	bms (skimmed)
Tested by:	Nick Hilliard (nick netability.ie) (with more changes)
MFC after:	2 months
2008-09-13 17:26:46 +00:00
Bjoern A. Zeeb
c10eb6d10a Work around an integer division resulting in 0 and thus the
congestion window not being incremented, if cwnd > maxseg^2.
As suggested in RFC2581 increment the cwnd by 1 in this case.

See http://caia.swin.edu.au/reports/080829A/CAIA-TR-080829A.pdf
for more details.

Submitted by:	Alana Huebner, Lawrence Stewart,
		Grenville Armitage (caia.swin.edu.au)
Reviewed by:	dwmalone, gnn, rpaulo
MFC After:	3 days
2008-09-09 07:35:21 +00:00
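The integer-division problem described in c10eb6d10a can be shown with a minimal sketch. The function `cwnd_increment()` is hypothetical and is not the kernel's congestion-avoidance code; it assumes the increment is computed as maxseg*maxseg/cwnd and applies the one-byte floor the commit message attributes to RFC 2581.

```c
#include <assert.h>

/*
 * Congestion-avoidance increment in bytes: maxseg * maxseg / cwnd,
 * but never 0.  With integer division the quotient is 0 as soon as
 * cwnd > maxseg^2, which would stop the window from growing.
 */
static unsigned long
cwnd_increment(unsigned long cwnd, unsigned long maxseg)
{
	unsigned long incr;

	assert(cwnd > 0);
	incr = maxseg * maxseg / cwnd;
	if (incr == 0)
		incr = 1;
	return (incr);
}
```

For a 1460-byte segment size the floor only comes into play once cwnd exceeds 2131600 bytes, which is why the problem shows up on paths with very large windows.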
Bjoern A. Zeeb
00db174bc2 To my reading there are no real consumers of ip6_plen (IPv6
Payload Length) as set in tcpip_fillheaders().
ip6_output() will calculate it based on the length from the
mbuf packet header itself.
So initialize the value in tcpip_fillheaders() in correct
(network) byte order.

With the above change, to my reading, all places calling tcp_trace()
pass in the ip6 header via ipgen as serialized in the mbuf and with
ip6_plen in network byte order.
Thus convert the IPv6 payload length to host byte order before printing.

MFC after:	2 months
2008-09-07 20:44:45 +00:00
Bjoern A. Zeeb
3cee92e074 Split tcp_mss() into tcp_mss() and tcp_mss_update() where the former
calls the latter.

Merge tcp_mss_update() with code from tcp_mtudisc() basically
doing the same thing.

This gives us one central place where we calculate and check mss values
to update t_maxopd (maximum mss + options length) instead of two slightly
different but almost equal implementations to maintain.

PR:		kern/118455
Reviewed by:	silby (back in March)
MFC after:	2 months
2008-09-07 18:50:25 +00:00
Bjoern A. Zeeb
ebe5426934 V_irtualize SVN r182846 tcp_mssdflt/tcp_v6mssdflt procedure based
sysctl implementations for VIMAGE the same way we did elsewhere:
update the implementation but leave the globals and the SYSCTL
statement untouched.
2008-09-07 15:20:21 +00:00
Bjoern A. Zeeb
4cdf3bedf3 Convert SYSCTL_INTs for tcp_mssdflt and tcp_v6mssdflt to
SYSCTL_PROCs and check that the default mss for neither v4 nor
v6 goes below the minimum MSS constant (216).

This prevents people from shooting themselves in the foot.

PR:		kern/118455 (remotely related)
Reviewed by:	silby (as part of a larger patch in March)
MFC after:	2 months
2008-09-07 14:44:55 +00:00
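The lower-bound check that 4cdf3bedf3 adds can be sketched in userspace as a setter that rejects values below the minimum. This is not the SYSCTL_PROC handler itself: `set_mssdflt()` and `SKETCH_MINMSS` are hypothetical names, with 216 taken from the commit message, and returning EINVAL is an assumption about how such a rejection would be reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MINMSS	216	/* minimum MSS constant from the commit message */

/*
 * Accept a new default MSS only if it is not below the minimum.
 * Returns 0 on success and EINVAL otherwise, leaving *cur untouched.
 */
static int
set_mssdflt(int *cur, int newval)
{
	assert(cur != NULL);
	if (newval < SKETCH_MINMSS)
		return (EINVAL);
	*cur = newval;
	return (0);
}
```

The point of routing the update through a procedure rather than a plain integer sysctl is that the check runs before the global changes, so a bad value never becomes visible to the TCP code.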
Bjoern A. Zeeb
c4982fae59 Add a second KASSERT checking for len >= 0 in the tcp output path.
This is different to the first one (as len gets updated between those
two) and would have caught various edge cases (read bugs) at a well
defined place I had been debugging the last months instead of
triggering (random) panics further down the call graph.

MFC after:	2 months
2008-09-07 11:38:30 +00:00