Commit Graph

2560 Commits

Author SHA1 Message Date
sam
2350e92037 Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by:	arch@
2006-07-09 06:04:01 +00:00
mlaier
f7e47bf374 Make in-kernel multicast protocols for pfsync and carp work after enabling
dynamic resizing of multicast membership array.

Reported and testing by:	Maxim Konovalov, Scott Ullrich
Reminded by:			thompsa
MFC after:			2 weeks
2006-07-08 00:01:01 +00:00
rwatson
ef5c0fe5ce Remove unneeded mac.h include.
MFC after:	3 days
2006-07-06 13:25:01 +00:00
oleg
e6e6ae67a2 Complete timebase (time_second -> time_uptime) conversion.
PR:		kern/94249
Reviewed by:	andre (few months ago)
Approved by:	glebius (mentor)
2006-07-05 23:37:21 +00:00
maxim
3463a7b118 o Kill BUGS section as it is not valid since rev. 1.4 alias_pptp.c.
Spotted by:	ru.unix.bsd activists
MFC after:	1 week
2006-07-04 20:39:38 +00:00
yar
ba19b1ecd4 There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
yar
cd07978e96 Use TAILQ_FOREACH consistently. 2006-06-29 17:09:47 +00:00
glebius
b014c19488 Fix URL to Bellovin's paper.
Submitted by:	Anton Yuzhaninov <citrin rambler-co.ru>
2006-06-29 13:38:36 +00:00
bz
d1b46f3dc8 Eliminate the offset argument from send_reject. It's not been
used since FreeBSD-SA-06:04.ipfw.
Adopt send_reject6 to what had been done for legacy IP: no longer
send or permit sending rejects for any but the first fragment.

Discussed with: oleg, csjp (some weeks ago)
2006-06-29 11:17:16 +00:00
bz
ed6ddd5a31 Use INPLOOKUP_WILDCARD instead of just 1 more consistently.
OKed by: rwatson (some weeks ago)
2006-06-29 10:49:49 +00:00
pjd
7f09680f0c - Use suser_cred(9) instead of directly checking cr_uid.
- Change the order of conditions to first verify that we actually need
  to check for privileges and then eventually check them.

Reviewed by:	rwatson
2006-06-27 11:35:53 +00:00
andre
823f6f19d1 In syncache_respond() do not reply with a MSS that is larger than what
the peer announced to us but make it at least tcp_minmss in size.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 17:54:53 +00:00
andre
ac13cc218d Some cleanups and janitorial work to tcp_syncache:
o don't assign remote/local host/port information manually between provided
   struct in_conninfo and struct syncache, bcopy() it instead
 o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture
   the purpose of this field
 o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons
 o fix IPSEC error case printf's to report correct function name
 o in syncache_socket() only transpose enhanced tcp options parameters to
   struct tcpcb when the inpcb doesn't has TF_NOOPT set
 o in syncache_respond() reorder stack variables
 o in syncache_respond() remove bogus KASSERT()

No functional changes.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 16:14:19 +00:00
andre
fa9880c218 Some cleanups and janitorial work to tcp_dooptions():
o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its
   usage accordingly
 o update the comments to the tcp_dooptions() invocation in
   tcp_input():after_listen to reflect reality
 o move the logic checking the echoed timestamp out of tcp_dooptions() to the
   only place that uses it next to the invocation described in the previous
   item
 o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others
 o add comments in to struct tcpopt.to_flags #defines

No functional changes.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 15:35:25 +00:00
andre
03888640de Reverse the source/destination parameters to in[6]_pcblookup_hash() in
syncache_respond() for the #ifdef MAC case.

Submitted by:	Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>
2006-06-26 09:43:55 +00:00
rwatson
a46d883eae In tcp6_usr_attach(), return immediately if SS_ISDISCONNECTED, to
avoid dereferencing an uninitialized inp variable.

Submitted by:	Michiel Boland <michiel at boland dot org>
MFC after:	1 month
2006-06-26 09:38:08 +00:00
andre
2c00a4721f Decrement the global syncache counter in syncache_expand() when the entry
is removed from the bucket.  This fixes the syncache statistics.
2006-06-25 11:11:33 +00:00
andre
b23e67a0d5 Move the syncookie MD5 context from globals to the stack to make it MP safe. 2006-06-22 15:07:45 +00:00
ume
bddec6f4bb - Pullup even when the extention header is unknown, to prevent
infinite loop with net.inet6.ip6.fw.deny_unknown_exthdrs=0.
- Teach ipv6 and ipencap as they appear in an IPv4/IPv6 over IPv6
  tunnel.
- Test the next extention header even when the routing header type
  is unknown with net.inet6.ip6.fw.deny_unknown_exthdrs=0.

Found by:	xcast-fan-club
MFC after:	1 week
2006-06-22 13:22:54 +00:00
andre
58082047f9 Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied
memory location for already existing/initialized mutexes.  With random
data in the memory location this fails (ie. after a soft reboot).

Reported by:	brueffer, YAMAMOTO Shigeru
Submitted by:	YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>
2006-06-20 08:11:30 +00:00
dwmalone
10047ad619 When we receive an out-of-window SYN for an "ESTABLISHED" connection,
ACK the SYN as required by RFC793, rather than ignoring it. NetBSD
have had a similar change since 1999.

PR:		93236
Submitted by:	Grant Edwards <grante@visi.com>
MFC after:	1 month
2006-06-19 12:33:52 +00:00
andre
2ca2d64333 Remove T/TCP RFC1644 Connection Count comparison macros. They are no longer
used and needed.

Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-18 14:24:12 +00:00
andre
2e570176ba Do not access syncache entry before it was allocated for the TF_NOOPT case
in syncache_add().

Found by:	Coverity Prevent
CID:		1473
2006-06-18 13:03:42 +00:00
andre
f41df61805 Move all syncache related structures to tcp_syncache.c. They are only used
there.

This unbreaks userland programs that include tcp_var.h.

Discussed with:	rwatson
2006-06-18 12:26:11 +00:00
andre
9d4650295f Remove double lock acquisition in syncookie_lookup() which came from last
minute conversions to macros.

Pointy hat to:	andre
2006-06-18 11:48:03 +00:00
andre
fd73883e66 Fix the !INET6 compile.
Reported by:	alc
2006-06-17 18:42:07 +00:00
andre
5de64f9bb7 Rearrange fields in struct syncache and syncache_head to make them more
cache line friendly.

Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-17 17:57:36 +00:00
andre
aa591c6cf8 ANSIfy and tidy up comments.
Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-17 17:49:11 +00:00
andre
ac46e67460 Add locking to TCP syncache and drop the global tcpinfo lock as early
as possible for the syncache_add() case.  The syncache timer no longer
aquires the tcpinfo lock and timeout/retransmit runs can happen in
parallel with bucket granularity.

On a P4 the additional locks cause a slight degression of 0.7% in tcp
connections per second.  When IP and TCP input are deserialized and
can run in parallel this little overhead can be neglected. The syncookie
handling still leaves room for improvement and its random salts may be
moved to the syncache bucket head structures to remove the second lock
operation currently required for it.  However this would be a more
involved change from the way syncookies work at the moment.

Reviewed by:	rwatson
Tested by:	rwatson, ps (earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-17 17:32:38 +00:00
oleg
7a65db868d Add support of 'tablearg' feature for:
- 'tag' & 'untag' action parameters.
- 'tagged' & 'limit' rule options.
Rule examples:
	pipe 1 tag tablearg ip from table(1) to any
	allow ip from any to table(2) tagged tablearg
	allow tcp from table(3) to any 25 setup limit src-addr tablearg

sbin/ipfw/ipfw2.c:
1) new macros
   GET_UINT_ARG - support of 'tablearg' keyword, argument range checking.
   PRINT_UINT_ARG - support of 'tablearg' keyword.
2) strtoport(): do not silently truncate/accept invalid port list expressions
   like: '1,2-abc' or '1,2-3-4' or '1,2-3x4'. style(9) cleanup.

Approved by:	glebius (mentor)
MFC after:	1 month
2006-06-15 09:39:22 +00:00
oleg
45f57ec2f1 install_state(): style(9) cleanup
Approved by:	glebius (mentor)
MFC after:	1 month
2006-06-15 08:54:29 +00:00
thompsa
e3103df967 Enable proxy ARP answers on any of the bridged interfaces if proxy record
belongs to another interface within the bridge group.

PR:		kern/94408
Submitted by:	Eygene A. Ryabinkin
MFC after:	1 month
2006-06-09 00:33:30 +00:00
oleg
4dd0249169 install_state() should properly initialize 'addr_type' field of newly created
flows for O_LIMIT rules.  Otherwise 'ipfw -d show' is unable to display
PARENT rules properly.
(This bug was exposed by ipfw2.c rev.1.90)

Approved by:	glebius (mentor)
MFC after:	2 weeks
2006-06-08 11:27:45 +00:00
oleg
a2786b9de3 Fix following rules: pipe X (tag|altq) Y ...
Approved by:	glebius (mentor)
MFC after:	2 weeks
2006-06-08 11:13:23 +00:00
rwatson
f0edd09c23 Push acquisition of pcbinfo lock out of tcp_usr_attach() into
tcp_attach() after the call to soreserve(), as it doesn't require
the global lock.  Rearrange inpcb locking here also.

MFC after:	1 month
2006-06-04 09:31:34 +00:00
rwatson
8d3568ae0b When entering a timer on a tcpcb, don't continue processing if it has been
dropped.  This prevents a bug introduced during the socket/pcb refcounting
work from occuring, in which occasionally the retransmit timer may fire
after a connection has been reset, resulting in the resulting R|A TCP
packet having a source port of 0, as the port reservation has been
released.

While here, fixing up some RUNLOCK->WUNLOCK bugs.

MFC after:	1 month
2006-06-03 19:37:08 +00:00
rwatson
133fd236d1 Acquire udbinfo lock after call to soreserve() rather than before, as it
is not required.  This simplifies error-handling, and reduces the time
that this lock is held.

MFC after:	1 month
2006-06-03 19:29:26 +00:00
csjp
2c4f67981e Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
	(2) Packet is received by netif0
	(3) ifp->if_bpf pointer is checked and handed off to bpf
	(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
	    initialized to NULL.
	(5) ifp->if_bpf is dereferenced by bpf machinery
	(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
  bpf interface structure. Once this is done, ifp->if_bpf should never be
  NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
  a lockless read bpf peer list associated with the interface. It should
  be noted that the bpf code will pickup the bpf_interface lock before adding
  or removing bpf peers. This should serialize the access to the bpf descriptor
  list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
  can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

	(1) Packet is received by netif0
	(2) Check to see if bpf descriptor list is empty
	(3) Pickup the bpf interface lock
	(4) Hand packet off to process

From the attach/detach side:

	(1) Pickup the bpf interface lock
	(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
    not we have any bpf peers that might be interested in receiving packets.

In collaboration with:	sam@
MFC after:	1 month
2006-06-02 19:59:33 +00:00
rwatson
88f1a971b9 Minor restyling and cleanup around ipport_tick().
MFC after:	1 month
2006-06-02 08:18:27 +00:00
oleg
499297c74c Implement internal (i.e. inside kernel) packet tagging using mbuf_tags(9).
Since tags are kept while packet resides in kernelspace, it's possible to
use other kernel facilities (like netgraph nodes) for altering those tags.

Submitted by:	Andrey Elsukov <bu7cher at yandex dot ru>
Submitted by:	Vadim Goncharov <vadimnuclight at tpu dot ru>
Approved by:	glebius (mentor)
Idea from:	OpenBSD PF
MFC after:	1 month
2006-05-24 13:09:55 +00:00
maxim
fa6b08051c o In udp|rip_disconnect() acquire a socket lock before the socket
state modification.  To prevent races do that while holding inpcb
lock.

Reviewed by:	rwatson
2006-05-21 19:28:46 +00:00
maxim
2be20d920e o Add missed error check: in ip_ctloutput() sooptcopyin() returns a
result but we never examine it.

Reviewed by:	rwatson
MFC after:	2 weeks
2006-05-21 17:52:08 +00:00
bms
c18ccfca54 Initialize the new members of struct ip_moptions as
a defensive programming measure.

Note that whilst these members are not used by the ip_output()
path, we are passing an instance of struct ip_moptions here
which is declared on the stack (which could be considered a
bad thing).

ip_output() does not consume struct ip_moptions, but in case it
does in future, declare an in_multi vector on the stack too to
behave more like ip_findmoptions() does.
2006-05-18 19:51:08 +00:00
glebius
7a37254134 Since m_pullup() can return a new mbuf, change gre_input2() to
return mbuf back to gre_input(). If the former returns mbuf back
to the latter, then pass it to raw_input().

Coverity ID:	829
2006-05-16 11:15:22 +00:00
glebius
7840c3c3e0 - Backout one line from 1.78. The tp can be freed by tcp_drop().
- Style next line.

Coverity ID:	912
2006-05-16 10:51:26 +00:00
maxim
ecfee6862b o In rip_disconnect() do not call rip_abort(), just mark a socket
as not connected.  In soclose() case rip_detach() will kill inpcb for
us later.

It makes rawconnect regression test do not panic a system.

Reviewed by:	rwatson
X-MFC after:	with all 1th April inpcb changes
2006-05-15 09:28:57 +00:00
mlaier
d94d020eed Use only lower 64bit of src/dest (and src/dest port) for hashing of IPv6
connections and get rid of the flow_id as it is not guaranteed to be stable
some (most?) current implementations seem to just zero it out.

PR:		kern/88664
Reported by:	jylefort
Submitted by:	Joost Bekkers (w/ changes)
Tested by	"regisr" <regisrApoboxDcom>
2006-05-14 23:42:24 +00:00
bms
18895e4f42 Fix a long-standing limitation in IPv4 multicast group membership.
By making the imo_membership array a dynamically allocated vector,
this minimizes disruption to existing IPv4 multicast code. This
change breaks the ABI for the kernel module ip_mroute.ko, and may
cause a small amount of churn for folks working on the IGMPv3 merge.

Previously, sockets were subject to a compile-time limitation on
the number of IPv4 group memberships, which was hard-coded to 20.
The imo_membership relationship, however, is 1:1 with regards to
a tuple of multicast group address and interface address. Users who
ran routing protocols such as OSPF ran into this limitation on machines
with a large system interface tree.
2006-05-14 14:22:49 +00:00
mlaier
123e91766b Remove ip6fw. Since ipfw has full functional IPv6 support now and - in
contrast to ip6fw - is properly lockes, it is time to retire ip6fw.
2006-05-12 20:39:23 +00:00
mlaier
95826ec6b4 Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing
seperately.  Also use pfil hook/unhook instead of keeping the check
functions in pfil just to return there based on the sysctl.  While here fix
some whitespace on a nearby SYSCTL_ macro.
2006-05-12 04:41:27 +00:00