Commit Graph

2322 Commits

Author SHA1 Message Date
Paul Saab
64b5fbaa04 Rewrite of tcp_sack_option(). Kentaro Kurahone (NetBSD) pointed out
that if we sort the incoming SACK blocks, we can update the scoreboard
in one pass of the scoreboard. The added overhead of sorting upto 4
sack blocks is much lower than traversing (potentially) large
scoreboards multiple times. The code was updating the scoreboard with
multiple passes over it (once for each sack option). The rewrite fixes
that, reducing the complexity of the main loop from O(n^2) to O(n).

Submitted by:   Mohan Srinivasan, Noritoshi Demizu.
Reviewed by:    Raja Mukerji.
2005-05-23 19:22:48 +00:00
Paul Saab
2cdbfa66ee Replace t_force with a t_flag (TF_FORCEDATA).
Submitted by:   Raja Mukerji.
Reviewed by:    Mohan, Silby, Andre Opperman.
2005-05-21 00:38:29 +00:00
Paul Saab
4fc5324557 Introduce routines to alloc/free sack holes. This cleans up the code
considerably.

Submitted by:   Noritoshi Demizu.
Reviewed by:    Raja Mukerji, Mohan Srinivasan.
2005-05-16 19:26:46 +00:00
Gleb Smirnoff
32247f8629 - When carp interface is destroyed, and it affects global preemption
suppresion counter, decrease the latter. [1]
- Add sysctl to monitor preemption suppression.

PR:		kern/80972 [1]
Submitted by:	Frank Volf [1]
MFC after:	1 week
2005-05-15 01:44:26 +00:00
Paul Saab
fdace17f81 Fix for a bug where the "nexthole" sack hint is out of sync with the
real next hole to retransmit from the scoreboard, caused by a bug
which did not update the "nexthole" hint in one case in
tcp_sack_option().

Reported by:    Daniel Eriksson
Submitted by:   Mohan Srinivasan
2005-05-13 18:02:02 +00:00
Gleb Smirnoff
b3cf6808ce In div_output() explicitly set m->m_nextpkt to NULL. If divert socket
is not userland, but ng_ksocket, then m->m_nextpkt may be non-NULL. In
this case we would panic in sbappend.
2005-05-13 11:44:37 +00:00
Paul Saab
0077b0163f When looking for the next hole to retransmit from the scoreboard,
or to compute the total retransmitted bytes in this sack recovery
episode, the scoreboard is traversed. While in sack recovery, this
traversal occurs on every call to tcp_output(), every dupack and
every partial ack. The scoreboard could potentially get quite large,
making this traversal expensive.

This change optimizes this by storing hints (for the next hole to
retransmit and the total retransmitted bytes in this sack recovery
episode) reducing the complexity to find these values from O(n) to
constant time.

The debug code that sanity checks the hints against the computed
value will be removed eventually.

Submitted by:   Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.
2005-05-11 21:37:42 +00:00
Colin Percival
fe2eee8231 Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by:	Uwe Doering
2005-05-07 00:41:36 +00:00
Gleb Smirnoff
cbfbc555e0 Add a workaround for 64-bit archs: store unsigned long return value in
temporary variable, check it and then cast to in_addr_t.
2005-05-06 13:01:31 +00:00
Gleb Smirnoff
6293e003c9 s/DEBUG/LIBALIAS_DEBUG/, since DEBUG is defined in LINT and
not supported for kernel build.
2005-05-06 11:07:49 +00:00
Colin Percival
fd94099ec2 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
Gleb Smirnoff
e9d5db2888 More bits for kernel version:
- copy inet_aton() from libc
- disable getservbyname() lookup and accept only numeric port
2005-05-05 22:00:32 +00:00
Gleb Smirnoff
75bc262006 Always include alias.h before alias_local.h 2005-05-05 21:55:17 +00:00
Gleb Smirnoff
f87fe393ce When used in kernel define NO_FW_PUNCH, NO_LOGGING, NO_USE_SOCKETS. 2005-05-05 21:53:17 +00:00
Gleb Smirnoff
c8d3ca728f Fix argument order for bcopy() in last commit.
Noticed by:	njl
Pointy hat to:	glebius
2005-05-05 21:40:49 +00:00
Gleb Smirnoff
efdc8fbf79 Use bcopy() instead of memmove(). 2005-05-05 21:10:51 +00:00
Gleb Smirnoff
ae0440572f Hide fflush(3) under ifdef DEBUG. 2005-05-05 21:07:34 +00:00
Gleb Smirnoff
c8564bffd2 Things required to build libalias as kernel module:
- kernel module declarations and handler.
- macros to map malloc(3) calls to malloc(9) ones.
- malloc(9) declarations.
- call finishoff() from module handler MOD_UNLOAD case
  instead of atexit(3).
- use panic(9) instead of abort(3)
- take time from time_second instead of gettimeofday(2)
- define INADDR_NONE
2005-05-05 21:05:38 +00:00
Gleb Smirnoff
00fc9a5bb9 Add NO_USE_SOCKETS knob, which cuts off functionality socket binding. 2005-05-05 20:25:12 +00:00
Gleb Smirnoff
40106c140f Add NO_LOGGING knob, which cuts off functionality of debug logging to a file. 2005-05-05 20:22:09 +00:00
Gleb Smirnoff
c649a2e033 Play with includes so that libalias can be compiled both as userland
library and kernel module.
2005-05-05 19:27:32 +00:00
Andre Oppermann
9e4ca6315d If we don't get a suggested MTU during path MTU discovery
look up the packet size of the packet that generated the
response, step down the MTU by one step through ip_next_mtu()
and try again.

Suggested by:	dwmalone
2005-05-04 13:48:44 +00:00
Gleb Smirnoff
1f8f08e1c9 Cleanup IPFW2 ifdefs. 2005-05-04 13:24:37 +00:00
Gleb Smirnoff
c3c2f9a9ba Makefile is not needed here. 2005-05-04 13:24:12 +00:00
Andre Oppermann
4c037f8d6e Add another step of 1280 (gif(4) tunnels) to ip_next_mtu(). 2005-05-04 13:23:54 +00:00
Gleb Smirnoff
a1429ad928 IPFW version 2 is the only option in HEAD and RELENG_5.
Thus, cleanup unnecessary now ifdefs.
2005-05-04 13:12:52 +00:00
Andre Oppermann
c773494edd Pass icmp_error() the MTU argument directly instead of
an interface pointer.  This simplifies a couple of uses
and removes some XXX workarounds.
2005-05-04 13:09:19 +00:00
Robert Watson
b60d26c9b9 Remove now unused inirw variable from previous use of COMMON_END().
Reported by:	csjp
2005-05-01 14:01:38 +00:00
Peter Grehan
73fddedac8 Fix typo in last commit.
Approved by:	rwatson
2005-05-01 13:06:05 +00:00
Robert Watson
d1401c9000 Slide unlocking of the tcbinfo lock earlier in tcp_usr_send(), as it's
needed only for implicit connect cases.  Under load, especially on SMP,
this can greatly reduce contention on the tcbinfo lock.

NB: Ambiguities about the state of so_pcb need to be resolved so that
all use of the tcbinfo lock in non-implicit connection cases can be
eliminated.

Submited by:	Kazuaki Oda <kaakun at highway dot ne dot jp>
2005-05-01 11:11:38 +00:00
Brooks Davis
31519b13c8 Introduce a struct icmphdr which contains the type, code, and cksum
fields of an ICMP packet.

Use this to allow ipfw to pullup only these values since it does not use
the rest of the packet and it was failed on ICMP packets because they
were not long enough.

struct icmp should probably be modified to use these at some point, but
that will break a fair bit of code so it can wait for another day.

On the off chance that adding this struct breaks something in ports,
bump __FreeBSD_version.

Reported by:	Randy Bush <randy at psg dot com>
Tested by:	Randy Bush <randy at psg dot com>
2005-04-26 18:10:21 +00:00
Paul Saab
91232d6ccc Remove some code that snuck in by accident.
Submitted by:	Mohan Srinivasan
2005-04-21 20:29:40 +00:00
Paul Saab
be3f3b5ead Fix for interaction problems between TCP SACK and TCP Signature.
If TCP Signatures are enabled, the maximum allowed sack blocks aren't
going to fit. The fix is to compute how many sack blocks fit and tack
these on last. Also on SYNs, defer padding until after the SACK
PERMITTED option has been added.

Found by:	Mohan Srinivasan.
Submitted by:	Mohan Srinivasan, Noritoshi Demizu.
Reviewed by:	Raja Mukerji.
2005-04-21 20:26:07 +00:00
Paul Saab
97b76190eb Undo rev 1.71 as it is the wrong change. 2005-04-21 20:24:43 +00:00
Paul Saab
a6235da61e - Make the sack scoreboard logic use the TAILQ macros. This improves
code readability and facilitates some anticipated optimizations in
  tcp_sack_option().
- Remove tcp_print_holes() and TCP_SACK_DEBUG.

Submitted by:	Raja Mukerji.
Reviewed by:	Mohan Srinivasan, Noritoshi Demizu.
2005-04-21 20:11:01 +00:00
Paul Saab
a3047bc036 Fix for 2 bugs related to TCP Signatures :
- If the peer sends the Signature option in the SYN, use of Timestamps
  and Window Scaling were disabled (even if the peer supports them).
- The sender must not disable signatures if the option is absent in
  the received SYN. (See comment in syncache_add()).

Found, Submitted by:	Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>.
Reviewed by:		Mohan Srinivasan <mohans at yahoo-inc dot com>.
2005-04-21 20:09:09 +00:00
Andre Oppermann
1aedbd9c80 Move Path MTU discovery ICMP processing from icmp_input() to
tcp_ctlinput() and subject it to active tcpcb and sequence
number checking.  Previously any ICMP unreachable/needfrag
message would cause an update to the TCP hostcache.  Now only
ICMP PMTU messages belonging to an active TCP session with
the correct src/dst/port and sequence number will update the
hostcache and complete the path MTU discovery process.

Note that we don't entirely implement the recommended counter
measures of Section 7.2 of the paper.  However we close down
the possible degradation vector from trivially easy to really
complex and resource intensive.  In addition we have limited
the smallest acceptable MTU with net.inet.tcp.minmss sysctl
for some time already, further reducing the effect of any
degradation due to an attack.

Security:	draft-gont-tcpm-icmp-attacks-03.txt Section 7.2
MFC after:	3 days
2005-04-21 14:29:34 +00:00
Andre Oppermann
1600372b6b Ignore ICMP Source Quench messages for TCP sessions. Source Quench is
ineffective, depreciated and can be abused to degrade the performance
of active TCP sessions if spoofed.

Replace a bogus call to tcp_quench() in tcp_output() with the direct
equivalent tcpcb variable assignment.

Security:	draft-gont-tcpm-icmp-attacks-03.txt Section 7.1
MFC after:	3 days
2005-04-21 12:37:12 +00:00
Gleb Smirnoff
9dc1f8e41e Remove anti-LOR bandaid, it is not needed now.
Sponsored by:	Rambler
2005-04-20 09:32:05 +00:00
Poul-Henning Kamp
6196e2db3e Make DUMMYNET compile without INET6 2005-04-19 10:12:21 +00:00
Poul-Henning Kamp
d137deac11 typo 2005-04-19 10:04:38 +00:00
Poul-Henning Kamp
7292e12676 Make IPFIREWALL compile without INET6 2005-04-19 09:56:14 +00:00
Brooks Davis
8195404bed Add IPv6 support to IPFW and Dummynet.
Submitted by:	Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
2005-04-18 18:35:05 +00:00
Paul Saab
b7c755717c Rewrite of tcp_update_sack_list() to make it simpler and more readable
than our original OpenBSD derived version.

Submitted by:	Noritoshi Demizu
Reviewed by:	Mohan Srinivasan, Raja Mukerji
2005-04-18 18:10:56 +00:00
Brooks Davis
27a2f39bcf Centralized finding the protocol header in IP packets in preperation for
IPv6 support.  The header in IPv6 is more complex then in IPv4 so we
want to handle skipping over it in one location.

Submitted by:	Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
2005-04-15 00:47:44 +00:00
Paul Saab
25e6f9ed4b Fix for a TCP SACK bug where more than (win/2) bytes could have been
in flight in SACK recovery.

Found by:	Noritoshi Demizu
Submitted by:	Mohan Srinivasan <mohans at yahoo-inc dot com>
		Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>
		Raja Mukerji <raja at moselle dot com>
2005-04-14 20:09:52 +00:00
Paul Saab
cf09195ba5 - Tighten up the Timestamp checks to prevent a spoofed segment from
setting ts_recent to an arbitrary value, stopping further
  communication between the two hosts.
- If the Echoed Timestamp is greater than the current time,
  fall back to the non RFC 1323 RTT calculation.

Submitted by:	Raja Mukerji (raja at moselle dot com)
Reviewed by:	Noritoshi Demizu, Mohan Srinivasan
2005-04-10 05:24:59 +00:00
Paul Saab
e346eeff65 - If the reassembly queue limit was reached or if we couldn't allocate
a reassembly queue state structure, don't update (receiver) sack
  report.
- Similarly, if tcp_drain() is called, freeing up all items on the
  reassembly queue, clean the sack report.

Found, Submitted by:	Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp>
Reviewed by:	Mohan Srinivasan (mohans at yahoo-inc dot com),
		Raja Mukerji (raja at moselle dot com).
2005-04-10 05:21:29 +00:00
Paul Saab
b962fa74b5 When the rightmost SACK block expands, rcv_lastsack should be updated.
(Fix for kern/78226).

Submitted by : Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp>
Reviewed by  : Mohan Srinivasan (mohans at yahoo-inc dot com),
               Raja Mukerji (raja at moselle dot com).
2005-04-10 05:20:10 +00:00
Paul Saab
da39f5b963 Remove some unused sack fields.
Submitted by : Noritoshi Demizu, Mohan Srinivasan.
2005-04-10 05:19:22 +00:00
Maxim Konovalov
800af1fb81 o Nano optimize ip_reass() code path for the first fragment: do not
try to reasseble the packet from the fragments queue with the only
fragment, finish with the first fragment as soon as we create a queue.

Spotted by:	Vijay Singh

o Drop the fragment if maxfragsperpacket == 0, no chances we
will be able to reassemble the packet in future.

Reviewed by:	silby
2005-04-08 10:25:13 +00:00
Maxim Konovalov
29f2a6ec18 o Tweak the comment a bit. 2005-04-08 08:43:21 +00:00
Maxim Konovalov
e99971bf2f o Disable random port allocation when ip.portrange.first ==
ip.portrange.last and there is the only port for that because:
a) it is not wise; b) it leads to a panic in the random ip port
allocation code.  In general we need to disable ip port allocation
randomization if the last - first delta is ridiculous small.

PR:		kern/79342
Spotted by:	Anjali Kulkarni
Glanced at by:	silby
MFC after:	2 weeks
2005-04-08 08:42:10 +00:00
Gleb Smirnoff
8351d04f34 When a packet has been reinjected into ipfw(4) after dummynet(4) processing
we have a non-NULL args.rule. If the same packet later is subject to "tee"
rule, its original is sent again into ipfw_chk() and it reenters at the same
rule. This leads to infinite loop and frozen router.

Assign args.rule to NULL, any time we are going to send packet back to
ipfw_chk() after a tee rule. This is a temporary workaround, which we
will leave for RELENG_5. In HEAD we are going to make divert(4) save
next rule the same way as dummynet(4) does.

PR:		kern/79546
Submitted by:	Oleg Bulyzhin
Reviewed by:	maxim, andre
MFC after:	3 days
2005-04-06 14:00:33 +00:00
Brooks Davis
a0d17f7e98 Use ACTION_PTR(r) instead of (r->cmd + r->act_ofs).
Reviewed by:	md5
2005-04-06 00:26:08 +00:00
Brooks Davis
f4ff11976d Make dummynet_flush() match its prototype. 2005-04-05 23:38:16 +00:00
Poul-Henning Kamp
a8bc22b47a natd core dumps when -reverse switch is used because of a bug in
libalias.

In /usr/src/lib/libalias/alias.c, the functions LibAliasIn and
LibAliasOutTry call the legacy PacketAliasIn/PacketAliasOut instead
of LibAliasIn/LibAliasOut when the PKT_ALIAS_REVERSE option is set.
In this case, the context variable "la" gets lost because the legacy
compatibility routines expect "la" to be global.  This was obviously
an oversight when rewriting the PacketAlias* functions to the
LibAlias* functions.

The fix (as shown in the patch below) is to remove the legacy
subroutine calls and replace with the new ones using the "la" struct
as the first arg.

Submitted by:	Gil Kloepfer <fgil@kloepfer.org>
Confirmed by:	<nicolai@catpipe.net>
PR:		76839
MFC after:	3 days
2005-04-05 13:04:35 +00:00
Gleb Smirnoff
4cb39345c0 When several carp interfaces are attached to Ethernet interface,
carp_carpdev_state_locked() is called every time carp interface is attached.
The first call backs up flags of the first interface, and the second
call backs up them again, erasing correct values.
  To solve this, a carp_sc_state_locked() function is introduced. It is
called when interface is attached to parent, instead of calling
carp_carpdev_state_locked. carp_carpdev_state_locked() calls
carp_sc_state_locked() for each sc in chain.

Reported by:	Yuriy N. Shkandybin, sem
2005-03-30 11:44:43 +00:00
Gleb Smirnoff
d1a4742962 - Don't free mbuf, passed to interface output method if the latter
returns error. In this case mbuf has already been freed. [1]
- Remove redundant declaration.

PR:		kern/78893 [1]
Submitted by:	Liang Yi [1]
Reviewed by:	sam
MFC after:	1 day
2005-03-29 13:43:09 +00:00
Sam Leffler
812d865346 eliminate extraneous null ptr checks
Noticed by:	Coverity Prevent analysis tool
2005-03-29 01:10:46 +00:00
Sam Leffler
5309f84168 deal with malloc failures
Noticed by:	Coverity Prevent analysis tool
Together with:	mdodd
2005-03-26 22:20:22 +00:00
Maxim Konovalov
6ee79c59d2 o Document net.inet.ip.portrange.random* sysctls.
o Correct a comment about random port allocation threshold
implementation.

Reviewed by:	silby, ru
MFC after:	3 days
2005-03-23 09:26:38 +00:00
Gleb Smirnoff
d4d2297060 ifma_protospec is a pointer. Use NULL when assigning or compating it. 2005-03-20 14:31:45 +00:00
Gleb Smirnoff
50bb170471 Remove a workaround from previos revision. It proved to be incorrect.
Add two another workarounds for carp(4) interfaces:
- do not add connected route when address is assigned to carp(4) interface
- do not add connected route when other interface goes down

Embrace workarounds with #ifdef DEV_CARP
2005-03-20 10:27:17 +00:00
Gleb Smirnoff
ee6f227017 If vhid exists return more informative EEXIST instead of EINVAL. While here
remove redundant brackets.
2005-03-18 13:41:38 +00:00
Gleb Smirnoff
9860bab349 Fix a potential crash that could occur when CARP_LOG is being used.
Obtained from:	OpenBSD (pat)
2005-03-18 13:18:34 +00:00
Sam Leffler
6a9909b5e6 plug resource leak
Noticed by:	Coverity Prevent analysis tool
2005-03-16 05:27:19 +00:00
Robert Watson
d2bc35ab29 In tcp_usr_send(), broaden coverage of the socket buffer lock in the
non-OOB case so that the sbspace() check is performed under the same
lock instance as the append to the send socket buffer.

MFC after:	1 week
2005-03-14 22:15:14 +00:00
Gleb Smirnoff
422a115a4a Embrace with #ifdef DEV_CARP carp-related code. 2005-03-13 11:23:22 +00:00
Gleb Smirnoff
0504a89fdd Add antifootshooting workaround, which will make all routes "connected"
to carp(4) interfaces host routes. This prevents a problem, when connected
network is routed to carp(4) interface.
2005-03-10 15:26:45 +00:00
Paul Saab
e891d82b56 Add limits on the number of elements in the sack scoreboard both
per-connection and globally. This eliminates potential DoS attacks
where SACK scoreboard elements tie up too much memory.

Submitted by:	Raja Mukerji (raja at moselle dot com).
Reviewed by:	Mohan Srinivasan (mohans at yahoo-inc dot com).
2005-03-09 23:14:10 +00:00
Gleb Smirnoff
2ef4a436e0 Make ARP do not complain about wrong interface if correct interface
is a carp one and address matched it.

Reviewed by:	brooks
2005-03-09 10:00:01 +00:00
Joe Marcus Clarke
70037e98c4 Fix a problem in the Skinny ALG where a specially crafted packet could cause
a libalias application (e.g.  natd, ppp, etc.) to crash.  Note: Skinny support
is not enabled in natd or ppp by default.

Approved by:	secteam (nectar)
MFC after:	1 day
Secuiryt:	This fixes a remote DoS exploit
2005-03-03 03:06:37 +00:00
Gleb Smirnoff
b82936c5d4 Fix typo. Unbreak build. Take pointy hat. 2005-03-02 09:11:18 +00:00
Gleb Smirnoff
d220759b41 Add more locking when reading/writing to carp softc. When carp softc is
attached to a parent interface we use its mutex to lock the softc. This
means that in several places like carp_ioctl() we lock softc conditionaly.
This should be redesigned.

To avoid LORs when MII announces us a link state change, we schedule
a quick callout and call carp_carpdev_state_locked() from it.

Initialize callouts using NET_CALLOUT_MPSAFE.

Sponsored by:	Rambler
Reviewed by:	mlaier
2005-03-01 13:14:33 +00:00
Gleb Smirnoff
d92d54d54d - Add carp_mtx. Use it to protect list of all carp interfaces.
- In carp_send_ad_all() walk through list of all carp interfaces
  instead of walking through list of all interfaces.

Sponsored by:	Rambler
Reviewed by:	mlaier
2005-03-01 12:36:07 +00:00
Gleb Smirnoff
31199c8463 Use NET_CALLOUT_MPSAFE macro. 2005-03-01 12:01:17 +00:00
Gleb Smirnoff
3a84d72a78 Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into smth will soon be removed.

Requested by:	brooks
2005-03-01 10:59:14 +00:00
Gleb Smirnoff
4358dfc32f Remove debugging printf.
Reviewed by:	mlaier
2005-03-01 09:31:36 +00:00
Yaroslav Tykhiy
630481bb92 Support running carp(4) over a vlan(4) parent interface.
Encouraged by:	glebius
2005-02-28 16:19:11 +00:00
Gleb Smirnoff
1d0a237660 Remove unused field from carp softc.
OK'ed by:	mcbride@OpenBSD
2005-02-28 11:57:03 +00:00
Gleb Smirnoff
3e07def4cd Fix tcpdump(8) on carp(4) interface:
- Use our loop DLT type, not OpenBSD. [1]
- The fields that are converted to network byte order are not 32-bit
  fields but 16-bit fields, so htons should be used in htonl. [1]
- Secondly, ip_input changes ip->ip_len into its value without
  the ip-header length. So, restore the length to make bpf happy. [1]
- Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't
  understand uint8_t af identifier.

Submitted by:	Frank Volf [1]
2005-02-28 11:54:36 +00:00
Paul Saab
8291294024 If the receiver sends an ack that is out of [snd_una, snd_max],
ignore the sack options in that segment. Else we'd end up
corrupting the scoreboard.

Found by:	Raja Mukerji (raja at moselle dot com)
Submitted by:	Mohan Srinivasan
2005-02-27 20:39:04 +00:00
Max Laier
a4e5390551 Unbreak the build. carp_iamatch6 and carp_macmatch6 are not supposed to be
static as they are used elsewhere.
2005-02-27 11:32:26 +00:00
Gleb Smirnoff
e8c34a71eb Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.
Obtained from:	OpenBSD
2005-02-26 13:55:07 +00:00
Gleb Smirnoff
5c1f0f6de5 Staticize local functions. 2005-02-26 10:33:14 +00:00
Gleb Smirnoff
88bf82a62e New lines when logging. 2005-02-25 11:26:39 +00:00
Gleb Smirnoff
947b7cf3c6 Embrace macros with do {} while (0)
Submitted by:	maxim
2005-02-25 10:49:47 +00:00
Gleb Smirnoff
39aeaa0eb5 Call carp_carpdev_state() from carp_set_addr6(). See log for rev 1.4.
Sponsored by:	Rambler
2005-02-25 10:12:11 +00:00
Gleb Smirnoff
1e9e65729b Improve logging:
- Simplify CARP_LOG() and making it working (we don't have addlog in FreeBSD).
 - Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when
   net.inet.carp.log > 1
 - Use CARP_DEBUG to log state changes of carp interfaces.

After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc
argument. Remove it.

Sponsored by:	Rambler
2005-02-25 10:09:44 +00:00
Gleb Smirnoff
6fba4c0bae Fix problem when master comes up with one interface down, and preempts
mastering on all other interfaces:

- call carp_carpdev_state() on initialize instead of just setting to INIT
- in carp_carpdev_state() check that interface is UP, instead of checking
  that it is not DOWN, because a rebooted machine may have interface in
  UNKNOWN state.

Sponsored by:	Rambler
Obtained from:	OpenBSD (partially)
2005-02-24 09:05:28 +00:00
Sam Leffler
db77984c5b fix potential invalid index into ip_protox array
Noticed by:	Coverity Prevent analysis tool
2005-02-23 00:38:12 +00:00
Maxime Henrion
2368737719 Unbreak CARP build on 64-bit architectures.
Tested on:	sparc64
2005-02-23 00:20:33 +00:00
Andre Oppermann
099dd0430b Bring back the full packet destination manipulation for 'ipfw fwd'
with the kernel compile time option:

 options IPFIREWALL_FORWARD_EXTENDED

This option has to be specified in addition to IPFIRWALL_FORWARD.

With this option even packets targeted for an IP address local
to the host can be redirected.  All restrictions to ensure proper
behaviour for locally generated packets are turned off.  Firewall
rules have to be carefully crafted to make sure that things like
PMTU discovery do not break.

Document the two kernel options.

PR:		kern/71910
PR:		kern/73129
MFC after:	1 week
2005-02-22 17:40:40 +00:00
Gleb Smirnoff
67df421496 Remove promisc counter from parent interface in carp_clone_destroy(),
so that parent interface is not left in promiscous mode after carp
interface is destroyed.

This is not perfect, since promisc counter is added when carp
interface is assigned an IP address. However, when address is removed
parent interface is still in promiscuous mode. Only removal of
carp interface removes promisc from parent. Same way in OpenBSD.

Sponsored by:	Rambler
2005-02-22 16:24:55 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00
Gleb Smirnoff
797127a9bf We can make code simplier after last change.
Noticed by:	Andrew Thompson
2005-02-22 08:35:24 +00:00
Gleb Smirnoff
3a1757b9c0 In in_pcbconnect_setup() jailed sockets are treated specially: if local
address is not supplied, then jail IP is choosed and in_pcbbind() is called.
Since udp_output() does not save local addr after call to in_pcbconnect_setup(),
in_pcbbind() is called for each packet, and this is incorrect.

So, we shall treat jailed sockets specially in udp_output(), we will save
their local address.

This fixes a long standing bug with broken sendto() system call in jails.

PR:		kern/26506
Reviewed by:	rwatson
MFC after:	2 weeks
2005-02-22 07:50:02 +00:00
Gleb Smirnoff
914d092f5d In in_pcbconnect_setup() remove a check that route points at
loopback interface. Nobody have explained me sense of this check.
It breaks connect() system call to a destination address which is
loopback routed (e.g. blackholed).

Reviewed by:	silence on net@
MFC after:	2 weeks
2005-02-22 07:39:15 +00:00
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00