Commit Graph

2245 Commits

Author SHA1 Message Date
Gleb Smirnoff
3a84d72a78 Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into smth will soon be removed.

Requested by:	brooks
2005-03-01 10:59:14 +00:00
Gleb Smirnoff
4358dfc32f Remove debugging printf.
Reviewed by:	mlaier
2005-03-01 09:31:36 +00:00
Yaroslav Tykhiy
630481bb92 Support running carp(4) over a vlan(4) parent interface.
Encouraged by:	glebius
2005-02-28 16:19:11 +00:00
Gleb Smirnoff
1d0a237660 Remove unused field from carp softc.
OK'ed by:	mcbride@OpenBSD
2005-02-28 11:57:03 +00:00
Gleb Smirnoff
3e07def4cd Fix tcpdump(8) on carp(4) interface:
- Use our loop DLT type, not OpenBSD. [1]
- The fields that are converted to network byte order are not 32-bit
  fields but 16-bit fields, so htons should be used in htonl. [1]
- Secondly, ip_input changes ip->ip_len into its value without
  the ip-header length. So, restore the length to make bpf happy. [1]
- Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't
  understand uint8_t af identifier.

Submitted by:	Frank Volf [1]
2005-02-28 11:54:36 +00:00
Paul Saab
8291294024 If the receiver sends an ack that is out of [snd_una, snd_max],
ignore the sack options in that segment. Else we'd end up
corrupting the scoreboard.

Found by:	Raja Mukerji (raja at moselle dot com)
Submitted by:	Mohan Srinivasan
2005-02-27 20:39:04 +00:00
Max Laier
a4e5390551 Unbreak the build. carp_iamatch6 and carp_macmatch6 are not supposed to be
static as they are used elsewhere.
2005-02-27 11:32:26 +00:00
Gleb Smirnoff
e8c34a71eb Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.
Obtained from:	OpenBSD
2005-02-26 13:55:07 +00:00
Gleb Smirnoff
5c1f0f6de5 Staticize local functions. 2005-02-26 10:33:14 +00:00
Gleb Smirnoff
88bf82a62e New lines when logging. 2005-02-25 11:26:39 +00:00
Gleb Smirnoff
947b7cf3c6 Embrace macros with do {} while (0)
Submitted by:	maxim
2005-02-25 10:49:47 +00:00
Gleb Smirnoff
39aeaa0eb5 Call carp_carpdev_state() from carp_set_addr6(). See log for rev 1.4.
Sponsored by:	Rambler
2005-02-25 10:12:11 +00:00
Gleb Smirnoff
1e9e65729b Improve logging:
- Simplify CARP_LOG() and making it working (we don't have addlog in FreeBSD).
 - Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when
   net.inet.carp.log > 1
 - Use CARP_DEBUG to log state changes of carp interfaces.

After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc
argument. Remove it.

Sponsored by:	Rambler
2005-02-25 10:09:44 +00:00
Gleb Smirnoff
6fba4c0bae Fix problem when master comes up with one interface down, and preempts
mastering on all other interfaces:

- call carp_carpdev_state() on initialize instead of just setting to INIT
- in carp_carpdev_state() check that interface is UP, instead of checking
  that it is not DOWN, because a rebooted machine may have interface in
  UNKNOWN state.

Sponsored by:	Rambler
Obtained from:	OpenBSD (partially)
2005-02-24 09:05:28 +00:00
Sam Leffler
db77984c5b fix potential invalid index into ip_protox array
Noticed by:	Coverity Prevent analysis tool
2005-02-23 00:38:12 +00:00
Maxime Henrion
2368737719 Unbreak CARP build on 64-bit architectures.
Tested on:	sparc64
2005-02-23 00:20:33 +00:00
Andre Oppermann
099dd0430b Bring back the full packet destination manipulation for 'ipfw fwd'
with the kernel compile time option:

 options IPFIREWALL_FORWARD_EXTENDED

This option has to be specified in addition to IPFIRWALL_FORWARD.

With this option even packets targeted for an IP address local
to the host can be redirected.  All restrictions to ensure proper
behaviour for locally generated packets are turned off.  Firewall
rules have to be carefully crafted to make sure that things like
PMTU discovery do not break.

Document the two kernel options.

PR:		kern/71910
PR:		kern/73129
MFC after:	1 week
2005-02-22 17:40:40 +00:00
Gleb Smirnoff
67df421496 Remove promisc counter from parent interface in carp_clone_destroy(),
so that parent interface is not left in promiscous mode after carp
interface is destroyed.

This is not perfect, since promisc counter is added when carp
interface is assigned an IP address. However, when address is removed
parent interface is still in promiscuous mode. Only removal of
carp interface removes promisc from parent. Same way in OpenBSD.

Sponsored by:	Rambler
2005-02-22 16:24:55 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00
Gleb Smirnoff
797127a9bf We can make code simplier after last change.
Noticed by:	Andrew Thompson
2005-02-22 08:35:24 +00:00
Gleb Smirnoff
3a1757b9c0 In in_pcbconnect_setup() jailed sockets are treated specially: if local
address is not supplied, then jail IP is choosed and in_pcbbind() is called.
Since udp_output() does not save local addr after call to in_pcbconnect_setup(),
in_pcbbind() is called for each packet, and this is incorrect.

So, we shall treat jailed sockets specially in udp_output(), we will save
their local address.

This fixes a long standing bug with broken sendto() system call in jails.

PR:		kern/26506
Reviewed by:	rwatson
MFC after:	2 weeks
2005-02-22 07:50:02 +00:00
Gleb Smirnoff
914d092f5d In in_pcbconnect_setup() remove a check that route points at
loopback interface. Nobody have explained me sense of this check.
It breaks connect() system call to a destination address which is
loopback routed (e.g. blackholed).

Reviewed by:	silence on net@
MFC after:	2 weeks
2005-02-22 07:39:15 +00:00
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
Paul Saab
7643c37cf2 Remove 2 (SACK) fields from the tcpcb. These are only used by a
function that is called from tcp_input(), so they oughta be passed on
the stack instead of stuck in the tcpcb.

Submitted by:	Mohan Srinivasan
2005-02-17 23:04:56 +00:00
Paul Saab
7776346f83 Fix for a SACK (receiver) bug where incorrect SACK blocks are
reported to the sender - in the case where the sender sends data
outside the window (as WinXP does :().

Reported by:	Sam Jensen <sam at wand dot net dot nz>
Submitted by:	Mohan Srinivasan
2005-02-16 01:46:17 +00:00
Paul Saab
8db456bf17 - Retransmit just one segment on initiation of SACK recovery.
Remove the SACK "initburst" sysctl.
- Fix bugs in SACK dupack and partialack handling that can cause
  large bursts while in SACK recovery.

Submitted by:	Mohan Srinivasan
2005-02-14 21:01:08 +00:00
Maxim Konovalov
9945c0e21f o Add handling of an IPv4-mapped IPv6 address.
o Use SYSCTL_IN() macro instead of direct call of copyin(9).

Submitted by:	ume

o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where
most of tcp sysctls live.
o There are net.inet[6].tcp[6].getcred sysctls already, no needs in
a separate struct tcp_ident_mapping.

Suggested by:	ume
2005-02-14 07:37:51 +00:00
Gleb Smirnoff
1af305441d Jump to common action checks after doing specific once. This fixes adding
of divert rules, which I break in previous commit.

Pointy hat to:	glebius
2005-02-06 11:13:59 +00:00
Maxim Konovalov
212a79b010 o Implement net.inet.tcp.drop sysctl and userland part, tcpdrop(8)
utility:

    The tcpdrop command drops the TCP connection specified by the
    local address laddr, port lport and the foreign address faddr,
    port fport.

Obtained from:	OpenBSD
Reviewed by:	rwatson (locking), ru (man page), -current
MFC after:	1 month
2005-02-06 10:47:12 +00:00
Gleb Smirnoff
670742a102 Add a ng_ipfw node, implementing a quick and simple interface between
ipfw(4) and netgraph(4) facilities.

Reviewed by:	andre, brooks, julian
2005-02-05 12:06:33 +00:00
Hajimu UMEMOTO
6d0a982bdf teach scope of IPv6 address to net.inet6.tcp6.getcred.
MFC after:	1 week
2005-02-04 14:43:05 +00:00
Robert Watson
06456da2c6 Update an additional reference to the rate of ISN tick callouts that was
missed in tcp_subr.c:1.216: projected_offset must also reflect how often
the tcp_isn_tick() callout will fire.

MFC after:	2 weeks
Submitted by:	silby
2005-01-31 01:35:01 +00:00
Christian S.J. Peron
0ba04c87b3 Change the state allocator from using regular malloc to using
a UMA zone instead. This should eliminate a bit of the locking
overhead associated with with malloc and reduce the memory
consumption associated with each new state.

Reviewed by:	rwatson, andre
Silence on:	ipfw@
MFC after:	1 week
2005-01-31 00:48:39 +00:00
Robert Watson
54082796aa Have tcp_isn_tick() fire 100 times a second, rather than HZ times a
second; since the default hz has changed to 1000 times a second,
this resulted in unecessary work being performed.

MFC after:		2 weeks
Discussed with:		phk, cperciva
General head nod:	silby
2005-01-30 23:30:28 +00:00
Robert Watson
024105493d Prefer (NULL) spelling of (0) for pointers.
MFC after:	3 days
2005-01-30 19:29:47 +00:00
Robert Watson
77c16eed7c Remove clause three from tcp_syncache.c license per permission of
McAfee.  Update copyright to McAfee from NETA.
2005-01-30 19:28:27 +00:00
Alan Cox
7258e9687b Correctly move the packet header in ip_insertoptions().
Reported by: Anupam Chanda
Reviewed by: sam@
MFC after: 2 weeks
2005-01-23 19:43:46 +00:00
Ruslan Ermilov
24a0682c64 Sort sections. 2005-01-20 09:17:07 +00:00
Gleb Smirnoff
28935658c4 - Reduce number of arguments passed to dummynet_io(), we already have cookie
in struct ip_fw_args itself.
- Remove redundant &= 0xffff from dummynet_io().
2005-01-16 11:13:18 +00:00
Gleb Smirnoff
6c69a7c30b o Clean up interface between ip_fw_chk() and its callers:
- ip_fw_chk() returns action as function return value. Field retval is
  removed from args structure. Action is not flag any more. It is one
  of integer constants.
- Any action-specific cookies are returned either in new "cookie" field
  in args structure (dummynet, future netgraph glue), or in mbuf tag
  attached to packet (divert, tee, some future action).

o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}()
  to a switch structure, so that the functions are more readable, and a future
  actions can be added with less modifications.

Approved by:	andre
MFC after:	2 months
2005-01-14 09:00:46 +00:00
Paul Saab
8d03f2b53b Fix a TCP SACK related crash resulting from incorrect computation
of len in tcp_output(), in the case where the FIN has already been
transmitted. The mis-computation of len is because of a gcc
optimization issue, which this change works around.

Submitted by:	Mohan Srinivasan
2005-01-12 21:40:51 +00:00
Brian Somers
2a4cd52421 include "alias.h", not <alias.h>
MFC after:	3 days
2005-01-10 10:54:06 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Mike Silbersack
a69968ee4e Add a sysctl (net.inet.tcp.insecure_rst) which allows one to specify
that the RFC 793 specification for accepting RST packets should be
following.  When followed, this makes one vulnerable to the attacks
described in "slipping in the window", but it may be necessary in
some odd circumstances.
2005-01-03 07:08:37 +00:00
Mike Silbersack
5f311da2cc Port randomization leads to extremely fast port reuse at high
connection rates, which is causing problems for some users.

To retain the security advantage of random ports and ensure
correct operation for high connection rate users, disable
port randomization during periods of high connection rates.

Whenever the connection rate exceeds randomcps (10 by default),
randomization will be disabled for randomtime (45 by default)
seconds.  These thresholds may be tuned via sysctl.

Many thanks to Igor Sysoev, who proved the necessity of this
change and tested many preliminary versions of the patch.

MFC After:	20 seconds
2005-01-02 01:50:57 +00:00
Robert Watson
74d4630b71 Remove an errant blank line apparently introduced in
ip_output.c:1.194.
2004-12-25 22:59:42 +00:00
Robert Watson
42cf3289c3 In the dropafterack case of tcp_input(), it's OK to release the TCP
pcbinfo lock before calling tcp_output(), as holding just the inpcb
lock is sufficient to prevent garbage collection.
2004-12-25 22:26:13 +00:00
Robert Watson
e0bef1cb35 Revert parts of tcp_input.c:1.255 associated with the header predicted
cases for tcp_input():

While it is true that the pcbinfo lock provides a pseudo-reference to
inpcbs, both the inpcb and pcbinfo locks are required to free an
un-referenced inpcb.  As such, we can release the pcbinfo lock as
long as the inpcb remains locked with the confidence that it will not
be garbage-collected.  This leads to a less conservative locking
strategy that should reduce contention on the TCP pcbinfo lock.

Discussed with: sam
2004-12-25 22:23:13 +00:00
Robert Watson
452d9f5b1c Attempt to consistently use () around return values in calls to
return() in newer code (sysctl, ISN, timewait).
2004-12-23 01:34:26 +00:00
Robert Watson
06da46b241 Remove an XXXRW comment relating to whether or not the TCP timers are
MPSAFE: they are now believed to be.

Correct a typo in a second comment.

MFC after:	2 weeks
2004-12-23 01:27:13 +00:00
Robert Watson
db0aae38b6 Remove the now unused tcp_canceltimers() function. tcpcb timers are
now stopped as part of tcp_discardcb().

MFC after:	2 weeks
2004-12-23 01:25:59 +00:00
Robert Watson
950ab1e470 Remove an annotation of a minor race relating to the update of
multiple MIB entries using sysctl in short order, which might
result in unexpected values for tcp_maxidle being generated by
tcp_slowtimo.  In practice, this will not happen, or at least,
doesn't require an explicit comment.

MFC after:	2 weeks
2004-12-23 01:21:54 +00:00
Gleb Smirnoff
5e5da86597 In certain cases ip_output() can free our route, so check
for its presence before RTFREE().

Noticed by:	ru
2004-12-10 07:51:14 +00:00
Gleb Smirnoff
d2a09f901a Revert last change.
Andre:
  First lets get major new features into the kernel in a clean and nice way,
  and then start optimizing. In this case we don't have any obfusication that
  makes later profiling and/or optimizing difficult in any way.

Requested by:	csjp, sam
2004-12-10 07:47:17 +00:00
Christian S.J. Peron
fbf2edb6e4 This commit adds a shared locking mechanism very similar to the
mechanism used by pfil.  This shared locking mechanism will remove
a nasty lock order reversal which occurs when ucred based rules
are used which results in hard locks while mpsafenet=1.

So this removes the debug.mpsafenet=0 requirement when using
ucred based rules with IPFW.

It should be noted that this locking mechanism does not guarantee
fairness between read and write locks, and that it will favor
firewall chain readers over writers. This seemed acceptable since
write operations to firewall chains protected by this lock tend to
be less frequent than reads.

Reviewed by:	andre, rwatson
Tested by:	myself, seanc
Silence on:	ipfw@
MFC after:	1 month
2004-12-10 02:17:18 +00:00
Gleb Smirnoff
f5a19d3909 Check that DUMMYNET_LOADED before seeking dummynet m_tag.
Reviewed by:	andre
MFC after:	1 week
2004-12-09 16:41:47 +00:00
Max Laier
067a8bab8a More fixing of multiple addresses in the same prefix. This time do not try
to arp resolve "secondary" local addresses.

Found and submitted by:	ru
With additions from:	OpenBSD (rev. 1.47)
Reviewed by:		ru
2004-12-09 00:12:41 +00:00
Ruslan Ermilov
5cae05ad33 Time out routes created by redirect. 2004-12-06 22:27:22 +00:00
Gleb Smirnoff
98335aa976 - Make route cacheing optional, configurable via IFF_LINK0 flag.
- Turn it off by default.

Requested by:	many
Reviewed by:	andre
Approved by:	julian (mentor)
MFC after:	3 days
2004-12-06 19:02:43 +00:00
Robert Watson
79a9e59c89 Assert the tcptw inpcb lock in tcp_timer_2msl_reset(), as fields in
the tcptw undergo non-atomic read-modify-writes.

MFC after:	2 weeks
2004-12-05 22:47:29 +00:00
Robert Watson
b9155d92b2 Assert inpcb lock in:
tcpip_fillheaders()
  tcp_discardcb()
  tcp_close()
  tcp_notify()
  tcp_new_isn()
  tcp_xmit_bandwidth_limit()

Fix a locking comment in tcp_twstart(): the pcbinfo will be locked (and
is asserted).

MFC after:	2 weeks
2004-12-05 22:27:53 +00:00
Robert Watson
6fbed4af22 Minor grammer fix in comment. 2004-12-05 22:20:59 +00:00
Robert Watson
89924e5865 Pass the inpcb reference into ip_getmoptions() rather than just the
inp->inp_moptions pointer, so that ip_getmoptions() can perform
necessary locking when doing non-atomic reads.

Lock the inpcb by default to copy any data to local variables, then
unlock before performing sooptcopyout().

MFC after:	2 weeks
2004-12-05 22:08:37 +00:00
Robert Watson
92c71ab30b Define INP_UNLOCK_ASSERT() to assert that an inpcb is unlocked.
MFC after:	2 weeks
2004-12-05 22:07:14 +00:00
Robert Watson
5c918b56d8 Push the inpcb argument into ip_setmoptions() when setting IP multicast
socket options, so that it is available for locking.
2004-12-05 21:38:33 +00:00
Robert Watson
993d9505d4 Start working through inpcb locking for ip_ctloutput() by cleaning up
modifications to the inpcb IP options mbuf:

- Lock the inpcb before passing it into ip_pcbopts() in order to prevent
  simulatenous reads and read-modify-writes that could result in races.
- Pass the inpcb reference into ip_pcbopts() instead of the option chain
  pointer in the inpcb.
- Assert the inpcb lock in ip_pcbots.
- Convert one or two uses of a pointer as a boolean or an integer
  comparison to a comparison with NULL for readability.
2004-12-05 19:11:09 +00:00
Paul Saab
7d5ed1ceea Fixes a bug in SACK causing us to send data beyond the receive window.
Found by: Pawel Worach and Daniel Hartmeier
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
2004-11-29 18:47:27 +00:00
Robert Watson
2be3bf2244 Assert the inpcb lock in tcp_xmit_timer() as it performs read-modify-
write of various time/rtt-related fields in the tcpcb.
2004-11-28 11:06:22 +00:00
Robert Watson
18ad5842c5 Expand coverage of the receive socket buffer lock when handling urgent
pointer updates: test available space while holding the socket buffer
mutex, and continue to hold until until the pointer update has been
performed.

MFC after:	2 weeks
2004-11-28 11:01:31 +00:00
Robert Watson
c8443a1dc0 Do export the advertised receive window via the tcpi_rcv_space field of
struct tcp_info.
2004-11-27 20:20:11 +00:00
Robert Watson
b8af5dfa81 Implement parts of the TCP_INFO socket option as found in Linux 2.6.
This socket option allows processes query a TCP socket for some low
level transmission details, such as the current send, bandwidth, and
congestion windows.  Linux provides a 'struct tcpinfo' structure
containing various variables, rather than separate socket options;
this makes the API somewhat fragile as it makes it dificult to add
new entries of interest as requirements and implementation evolve.
As such, I've included a large pad at the end of the structure.
Right now, relatively few of the Linux API fields are filled in, and
some contain no logical equivilent on FreeBSD.  I've include __'d
entries in the structure to make it easier to figure ou what is and
isn't omitted.  This API/ABI should be considered unstable for the
time being.
2004-11-26 18:58:46 +00:00
Mike Silbersack
6a220ed80a Fix a problem where our TCP stack would ignore RST packets if the receive
window was 0 bytes in size.  This may have been the cause of unsolved
"connection not closing" reports over the years.

Thanks to Michiel Boland for providing the fix and providing a concise
test program for the problem.

Submitted by:	Michiel Boland
MFC after:	2 weeks
2004-11-25 19:04:20 +00:00
Robert Watson
de30ea131f In tcp_reass(), assert the inpcb lock on the passed tcpcb, since the
contents of the tcpcb are read and modified in volume.

In tcp_input(), replace th comparison with 0 with a comparison with
NULL.

At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in
tcp_input(), assert 'headlocked'.  Try to improve consistency between
various assertions regarding headlocked to be more informative.

MFC after:	2 weeks
2004-11-23 23:41:20 +00:00
Robert Watson
cce83ffb5a tcp_timewait() performs multiple non-atomic reads on the tcptw
structure, so assert the inpcb lock associated with the tcptw.
Also assert the tcbinfo lock, as tcp_timewait() may call
tcp_twclose() or tcp_2msl_rest(), which require it.  Since
tcp_timewait() is already called with that lock from tcp_input(),
this doesn't change current locking, merely documents reasons for
it.

In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_rest()
is called, which requires that lock.

In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop()
is called, which requires that lock.

Document the locking strategy for the time wait queues in tcp_timer.c,
which consists of protecting the time wait queues in the same manner
as the tcbinfo structure (using the tcbinfo lock).

In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait
queues are modified.

In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait
queues may be modified.

In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait
queues may be modified.

MFC after:	2 weeks
2004-11-23 17:21:30 +00:00
Robert Watson
b42ff86e73 De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible
but unlikely races that could be corrected by having tcp_keepcnt
and tcp_keepintvl modifications go through handler functions via
sysctl, but probably is not worth doing.  Updates to multiple
sysctls within evaluation of a single addition are unlikely.

Annotate that tcp_canceltimers() is currently unused.

De-spl tcp_timer_delack().

De-spl tcp_timer_2msl().

MFC after:	2 weeks
2004-11-23 16:45:07 +00:00
Robert Watson
7258e91f0f Assert the inpcb lock in tcp_twstart(), which does both read-modify-write
on the tcpcb, but also calls into tcp_close() and tcp_twrespond().

Annotate that tcp_twrecycleable() requires the inpcb lock because it does
a series of non-atomic reads of the tcpcb, but is currently called
without the inpcb lock by the caller.  This is a bug.

Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write
of the timewait structure/inpcb, and calls in_pcbdetach() which requires
the lock.

Assert the inpcb lock in tcp_twrespond(), as it performs multiple
non-atomic reads of the tcptw and inpcb structures, as well as calling
mac_create_mbuf_from_inpcb(), tcpip_fillheaders(), which require the
inpcb lock.

MFC after:	2 weeks
2004-11-23 16:23:13 +00:00
Robert Watson
8263bab34d Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(),
and tcp_drop(), due to read-modify-write of TCP state variables.

MFC after:	2 weeks
2004-11-23 16:06:15 +00:00
Robert Watson
8438db0f59 Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock
protects access to the ISN state variables.

Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize
timer-driven isn bumping.

Staticize internal ISN variables since they're not used outside of
tcp_subr.c.

MFC after:	2 weeks
2004-11-23 15:59:43 +00:00
Robert Watson
ca127a3e80 Remove "Unlocked read" annotations associated with previously unlocked
use of socket buffer fields in the TCP input code.  These references
are now protected by use of the receive socket buffer lock.

MFC after:	1 week
2004-11-22 13:16:27 +00:00
Robert Watson
98734750b4 s/send/sent/ in comment describing TCPS_SYN_RECEIVED. 2004-11-21 14:38:04 +00:00
Gleb Smirnoff
c1384b5ae2 - Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag
from divert sockets.
- Remove div_disconnect() method, since it shouldn't be called now.
- Remove div_abort() method. It was never called directly, since protocol
  doesn't have listen queue. It was called only from div_disconnect(),
  which is removed now.

Reviewed by:	rwatson, maxim
Approved by:	julian (mentor)
MT5 after:	1 week
MT4 after:	1 month
2004-11-18 13:49:18 +00:00
Max Laier
9a6a6eeba2 Fix host route addition for more than one address to a loopback interface
after allowing more than one address with the same prefix.

Reported by:	Vladimir Grebenschikov <vova NO fbsd SPAM ru>
Submitted by:	ru (also NetBSD rev. 1.83)
Pointyhat to:	mlaier
2004-11-17 23:14:03 +00:00
Max Laier
81d96ce8a4 Merge copyright notices.
Requested by:	njl
2004-11-13 17:05:40 +00:00
Gleb Smirnoff
ea0bd57615 Fix ng_ksocket(4) operation as a divert socket, which is pretty useful
and has been broken twice:

- in the beginning of div_output() replace KASSERT with assignment, as
  it was in rev. 1.83. [1] [to be MFCed]
- refactor changes introduced in rev. 1.100: do not prepend a new tag
  unconditionally. Before doing this check whether we have one. [2]

A small note for all hacking in this area:
when divert socket is not a real userland, but ng_ksocket(4), we receive
_the same_ mbufs, that we transmitted to socket. These mbufs have rcvif,
the tags we've put on them. And we should treat them correctly.

Discussed with:	mlaier [1]
Silence from:	green [2]
Reviewed by:	maxim
Approved by:	julian (mentor)
MFC after:	1 week
2004-11-12 22:17:42 +00:00
Max Laier
48321abefe Change the way we automatically add prefix routes when adding a new address.
This makes it possible to have more than one address with the same prefix.
The first address added is used for the route. On deletion of an address
with IFA_ROUTE set, we try to find a "fallback" address and hand over the
route if possible.
I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to
in_ifscrub as it must be considered KAPI as it is not static in in.c. I will
clean this after the MFC.

Discussed on:	arch, net
Tested by:	many testers of the CARP patches
Nits from:	ru, Andrea Campi <andrea+freebsd_arch webcom it>
Obtained from:	WIDE via OpenBSD
MFC after:	1 month
2004-11-12 20:53:51 +00:00
Poul-Henning Kamp
e21e4c19c9 Add missing '='
Spotted by:	obrien
2004-11-11 19:02:01 +00:00
Andre Oppermann
5e7b233055 Fix a double-free in the 'hlen > m->m_len' sanity check.
Bug report by:	<james@towardex.com>
MFC after:	2 weeks
2004-11-09 09:40:32 +00:00
SUZUKI Shinsuke
3d54848fc2 support TCP-MD5(IPv4) in KAME-IPSEC, too.
MFC after: 3 week
2004-11-08 18:49:51 +00:00
Poul-Henning Kamp
756d52a195 Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
Robert Watson
d6915262af Do some re-sorting of TCP pcbinfo locking and assertions: make sure to
retain the pcbinfo lock until we're done using a pcb in the in-bound
path, as the pcbinfo lock acts as a pseuo-reference to prevent the pcb
from potentially being recycled.  Clean up assertions and make sure to
assert that the pcbinfo is locked at the head of code subsections where
it is needed.  Free the mbuf at the end of tcp_input after releasing
any held locks to reduce the time the locks are held.

MFC after:	3 weeks
2004-11-07 19:19:35 +00:00
Andre Oppermann
e9a4cd2426 Fix a double-free in the 'm->m_len < sizeof (struct ip)' sanity check.
Bug report by:	<james@towardex.com>
MFC after:	2 weeks
2004-11-06 10:47:36 +00:00
Poul-Henning Kamp
c83c1318f5 Hide udp_in6 behind #ifdef INET6 2004-11-04 07:14:03 +00:00
Bruce M Simpson
38f061057b When performing IP fast forwarding, immediately drop traffic which is
destined for a blackhole route.

This also means that blackhole routes do not need to be bound to lo(4)
or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case.

Submitted by:	james at towardex dot com
Sponsored by:	eXtensible Open Router Project <URL:http://www.xorp.org/>
MFC after:	3 weeks
2004-11-04 02:14:38 +00:00
Robert Watson
d4b509bd7f Until this change, the UDP input code used global variables udp_in,
udp_in6, and udp_ip6 to pass socket address state between udp_input(),
udp_append(), and soappendaddr_locked().  While file in the default
configuration, when running with multiple netisrs or direct ithread
dispatch, this can result in races wherein user processes using
recvmsg() get back the wrong source IP/port.  To correct this and
related races:

- Eliminate udp_ip6, which is believed to be generated but then never
  used.  Eliminate ip_2_ip6_hdr() as it is now unneeded.

- Eliminate setting, testing, and existence of 'init' status fields
  for the IPv6 structures.  While with multiple UDP delivery this
  could lead to amortization of IPv4 -> IPv6 conversion when
  delivering an IPv4 UDP packet to an IPv6 socket, it added
  substantial complexity and side effects.

- Move global structures into the stack, declaring udp_in in
  udp_input(), and udp_in6 in udp_append() to be used if a conversion
  is required.  Pass &udp_in into udp_append().

- Re-annotate comments to reflect updates.

With this change, UDP appears to operate correctly in the presence of
substantial inbound processing parallelism.  This solution avoids
introducing additional synchronization, but does increase the
potential stack depth.

Discovered by:	kris (Bug Magnet)
MFC after:	3 weeks
2004-11-04 01:25:23 +00:00
Andre Oppermann
c94c54e4df Remove RFC1644 T/TCP support from the TCP side of the network stack.
A complete rationale and discussion is given in this message
and the resulting discussion:

 http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706

Note that this commit removes only the functional part of T/TCP
from the tcp_* related functions in the kernel.  Other features
introduced with RFC1644 are left intact (socket layer changes,
sendmsg(2) on connection oriented protocols)  and are meant to
be reused by a simpler and less intrusive reimplemention of the
previous T/TCP functionality.

Discussed on:	-arch
2004-11-02 22:22:22 +00:00
Robert Watson
ab5c14d828 Correct a bug in TCP SACK that could result in wedging of the TCP stack
under high load: only set function state to loop and continuing sending
if there is no data left to send.

RELENG_5_3 candidate.

Feet provided:	Peter Losher <Peter underscore Losher at isc dot org>
Diagnosed by:	Aniel Hartmeier <daniel at benzedrine dot cx>
Submitted by:	mohan <mohans at yahoo-inc dot com>
2004-10-30 12:02:50 +00:00
Robert Watson
c427483381 Add a matching tunable for net.inet.tcp.sack.enable sysctl. 2004-10-26 08:59:09 +00:00
Bruce M Simpson
d6fa5d2806 Check that rt_mask(rt) is non-NULL before dereferencing it, in the
RTM_ADD case, thus avoiding a panic.

Submitted by:	Iasen Kostov
2004-10-26 03:31:58 +00:00
Andre Oppermann
84bb6a2e75 IPDIVERT is a module now and tell the other parts of the kernel about it.
IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.
2004-10-25 20:02:34 +00:00
Ruslan Ermilov
a35d88931c For variables that are only checked with defined(), don't provide
any fake value.
2004-10-24 15:33:08 +00:00