Commit Graph

3071 Commits

Author SHA1 Message Date
Robert Watson
baa45840d7 In ip_output(), allow a read lock as well as a write lock when asserting
a lock on the passed inpcb.

MFC after:	3 months
2008-04-19 14:35:17 +00:00
Robert Watson
a69042a5be When querying the local or foreign address from an IP socket, acquire
only a read lock on the inpcb.

When an external module requests a read lock, acquire only a read lock.

MFC after:	3 months
2008-04-19 14:34:38 +00:00
Kip Macy
73a0d5896e move tcbinfo lock acquisition in to syncache 2008-04-19 03:39:17 +00:00
Kip Macy
46b0a854cc move cxgb_lt2.[ch] from NIC to TOE
move most offload functionality from NIC to TOE
factor out all socket and inpcb direct access
factor out access to locking in incpb, pcbinfo, and sockbuf
2008-04-19 03:22:43 +00:00
George V. Neville-Neil
0327aeb9e3 Add in check for loopback as well, which was missing from the original patch.
PR: 120958
Submitted by: James Snow <snow at teardrop.org>
MFC after: 2 weeks
2008-04-17 23:24:58 +00:00
Robert Watson
8501a69cc9 Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock main exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after:	3 months
Tested by:	kris (superset of committered patch)
2008-04-17 21:38:18 +00:00
George V. Neville-Neil
6b9ff6b7a7 Clean up the code that checks the types of address so that it is
done by understandable macros.

Fix the bug that prevented the system from responding on interfaces with
link local addresses assigned.

PR: 120958
Submitted by: James Snow <snow at teardrop.org>
MFC after: 2 weeks
2008-04-17 12:50:42 +00:00
Randall Stewart
5e2c2d872b Allow SCTP to compile without INET6.
PR:		116816
Obtained from	tuexen@fh-muenster.de:
MFC after:	2 weeks
2008-04-16 17:24:18 +00:00
Randall Stewart
eadccaccf0 Use the pru_flush infrastructure to avoid a panic
PR:		122710
MFC after:	1 week
2008-04-14 18:13:33 +00:00
Randall Stewart
c40e9cf2c1 Protection against errant sender sending a stream
seq number out of order with no missing TSN's (a
cisco box has this problem which will make a ssn
be held forever).
MFC after:	1 week
2008-04-14 14:34:29 +00:00
Randall Stewart
2a3eb019db New logging values. 2008-04-14 14:33:07 +00:00
Randall Stewart
45ccc1a635 1) adds some additional logging
2) changes to use a inqueue_bytes calculated value in max_len calc's.
MFC after:	1 week
2008-04-14 14:32:32 +00:00
Qing Li
e440aed958 This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

	route add -net 192.103.54.0/24 10.9.44.1
	route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default		10.2.5.1	UGS	0	3074	bge0 =>
default		10.2.5.2	UGS	0	0	bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

	route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

	route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options  RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by:	robert, sam, gnn, julian, kmacy
2008-04-13 05:45:14 +00:00
Bjoern A. Zeeb
b835b6fe2b Take the route mtu into account, if available, when sending an
ICMP unreach, frag needed.  Up to now we only looked at the
interface MTU. Make sure to only use the minimum of the two.

In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu()
to avoid any further conditional maths.

Without this, PMTU was broken in those cases when there was a
route with a lower MTU than the MTU of the outgoing interface.

PR:		kern/122338
Tested by:	Mark Cammidge  mark peralex.com
Reviewed by:	silence on net@
MFC after:	2 weeks
2008-04-09 05:17:18 +00:00
Andre Oppermann
3a4018c4e8 Remove TCP options ordering assumptions in tcp_addoptions(). Ordering
was changed in rev. 1.161 of tcp_var.h.  All option now test for sufficient
space in TCP header before getting added.

Reported by:	Mark Atkinson <atkin901-at-yahoo.com>
Tested by:	Mark Atkinson <atkin901-at-yahoo.com>
MFC after:	1 week
2008-04-07 19:09:23 +00:00
Andre Oppermann
5b2e33eab5 Remove now unnecessary comment. 2008-04-07 18:50:05 +00:00
Andre Oppermann
c343c524e1 Use #defines for TCP options padding after EOL to be consistent.
Reviewed by:	bz
2008-04-07 18:43:59 +00:00
Robert Watson
7a3244ccb7 Add further TCP inpcb locking assertions to some TCP input code paths.
MFC after:	1 month
2008-04-07 12:41:45 +00:00
Robert Watson
f457d58098 In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and
eliminate unnecessary local variable caching of the list head pointer,
making the code a bit easier to read.

MFC after:	3 weeks
2008-04-06 21:20:56 +00:00
Ruslan Ermilov
ea26d58729 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
Kip Macy
e79dd20dd5 change inp_wlock_assert to inp_lock_assert 2008-03-24 20:24:04 +00:00
Kip Macy
8815ab518a Label inp as unused in the non-INVARIANTS case 2008-03-24 00:29:01 +00:00
Kip Macy
3d5853271e Insulate inpcb consumers outside the stack from the lock type and offset within the pcb by adding accessor functions.
Reviewed by: rwatson
MFC after: 3 weeks
2008-03-23 22:34:16 +00:00
Paolo Pisati
63bea44682 Explicitate the newpacket size.
Bug pointed out by: many
Pointy hat to: me :(
2008-03-19 11:28:13 +00:00
Paolo Pisati
8368edc123 Don't cache ptr to nat rule in case of tablearg argument.
Bug spotted by: Dyadchenko Mihail
2008-03-17 23:02:56 +00:00
Paolo Pisati
f6efbc8842 Don't abuse stack space while in kernel land, use heap instead. 2008-03-17 22:08:31 +00:00
Robert Watson
c2877015a1 Fix indentation for a closing brace in in_pcballoc().
MFC after:	3 days
2008-03-17 13:04:56 +00:00
Bjoern A. Zeeb
9e3bdede0f Correct IPsec behaviour with a 'use' level in SP but no SA available.
In that case return an continue processing the packet without IPsec.

PR:		121384
MFC after:	5 days
Reported by:	Cyrus Rahman (crahman gmail.com)
Tested by:	Cyrus Rahman (crahman gmail.com) [slightly older version]
2008-03-14 16:38:11 +00:00
Paolo Pisati
ab0fcfd00a -Don't pass down the entire pkt to ProtoAliasIn, ProtoAliasOut, FragmentIn
and FragmentOut.
-Axe the old PacketAlias API: it has been deprecated since 5.x.
2008-03-12 11:58:29 +00:00
Bjoern A. Zeeb
413deb1262 Padding after EOL option must be zeros according to RFC793 but
the NOPs used are 0x01.
While we could simply pad with EOLs (which are 0x00), rather use an
explicit 0x00 constant there to not confuse poeple with 'EOL padding'.
Put in a comment saying just that.

Problem discussed on:	src-committers with andre, silby, dwhite as
			follow up to the rev. 1.161 commit of tcp_var.h.
MFC after:		11 days
2008-03-09 13:26:50 +00:00
Paolo Pisati
4741f3a109 MFP4:
restrict the utilization of direct pointers to the content of
	ip packet. These modifications are functionally nop()s thus
	can be merged with no side effects.
2008-03-06 21:50:41 +00:00
Rui Paulo
1cf6e4f5ff Change the default port range for outgoing connections by introducing
IPPORT_EPHEMERALFIRST and IPPORT_EPHEMERALLAST with values
10000 and 65535 respectively.
The rationale behind is that it makes the attacker's life more
difficult if he/she wants to guess the ephemeral port range and
also lowers the probability of a port colision (described in
draft-ietf-tsvwg-port-randomization-01.txt).

While there, remove code duplication in in_pcbbind_setup().

Submitted by:	Fernando Gont <fernando at gont.com.ar>
Approved by:	njl (mentor)
Reviewed by:	silby, bms
Discussed on:	freebsd-net
2008-03-04 19:16:21 +00:00
Paolo Pisati
31937d2fb0 When unloading kld, don't forget to flush the nat pointers. 2008-03-03 22:32:01 +00:00
Paolo Pisati
2b40ce00a5 Raise a bit ipfw kld priority.
Discussed on: net-, ipfw-.
2008-03-03 10:12:46 +00:00
Bjoern A. Zeeb
c3b02504bc Some "cleanup" of tcp_mss():
- Move the assigment of the socket down before we first need it.
  No need to do it at the beginning and then drop out the function
  by one of the returns before using it 100 lines further down.
- Use t_maxopd which was assigned the "tcp_mssdflt" for the corrrect
  AF already instead of another #ifdef ? : #endif block doing the same.
- Remove an unneeded (duplicate) assignment of mss to t_maxseg just before
  we possibly change mss and re-do the assignment without using t_maxseg
  in between.

Reviewed by:	silby
No objections:	net@ (silence)
MFC after:	5 days
2008-03-02 08:40:47 +00:00
Bjoern A. Zeeb
af92e6cf95 Fix indentation (whitespace changes only).
MFC after:	6 days
2008-03-01 22:27:15 +00:00
Paolo Pisati
531c890b8a Move ipfw's nat code into its own kld: ipfw_nat. 2008-02-29 22:27:19 +00:00
David Malone
2b2c3b23d1 Dummynet has a limit of 100 slots queue size (or 1MB, if you give
the limit in bytes) hard coded into both the kernel and userland.
Make both these limits a sysctl, so it is easy to change the limit.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.

(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)

Note these sysctls in the man page and warn against increasing them
without thinking first.

MFC after:      3 weeks
2008-02-27 13:52:33 +00:00
Paolo Pisati
f94a7fc0b5 Add table/tablearg support to ipfw's nat.
MFC After: 1 week
2008-02-24 15:37:45 +00:00
Mike Silbersack
ea346b19cc Change FreeBSD 7 so that it returns TCP options in
the same order that FreeBSD 6 and before did.  Doug
White and the other bloodhounds at ISC discovered that
while FreeBSD 7's ordering of options was more efficient,
it caused some cable modem routers to ignore the
SYN-ACKs ordered in this fashion.

The placement of sackOK after the timestamp option seems
to be the critical difference:

FreeBSD 6:
<mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>

FreeBSD 7.0:
<mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>

FreeBSD 7.0 + this change:
<mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol>

MFC after: 1 week
2008-02-24 05:13:20 +00:00
Randall Stewart
7a846e9ad8 Fixes a memory leak when VRF's are in play.
Submitted by:	Prasad Narasimha (snprasad@cisco.com)
Reviewed by:	rrs
2008-02-22 15:08:10 +00:00
Randall Stewart
69d5ee4f23 - Takes out stray ifdef code that should not have been present. 2008-02-22 15:06:25 +00:00
Gleb Smirnoff
e60a0104f8 If the vhid already present, return EEXIST instead of
non-informative EINVAL.
2008-02-07 13:18:59 +00:00
Gleb Smirnoff
3a2f50140c Remove unused structure member from struct in_ifadown_arg. 2008-02-07 11:26:52 +00:00
Mike Silbersack
361021cc6e Replace the random IP ID generation code we
obtained from OpenBSD with an algorithm suggested
by Amit Klein.  The OpenBSD algorithm has a few
flaws; see Amit's paper for more information.

For a description of how this algorithm works,
please see the comments within the code.

Note that this commit does not yet enable random IP ID
generation by default.  There are still some concerns
that doing so will adversely affect performance.

Reviewed by:  rwatson
MFC After: 2 weeks
2008-02-06 15:40:30 +00:00
Bjoern A. Zeeb
c26fe973a3 Rather than passing around a cached 'priv', pass in an ucred to
ipsec*_set_policy and do the privilege check only if needed.

Try to assimilate both ip*_ctloutput code blocks calling ipsec*_set_policy.

Reviewed by:	rwatson
2008-02-02 14:11:31 +00:00
Robert Watson
265de5bb62 Correct two problems relating to sorflush(), which is called to flush
read socket buffers in shutdown() and close():

- Call socantrcvmore() before sblock() to dislodge any threads that
  might be sleeping (potentially indefinitely) while holding sblock(),
  such as a thread blocked in recv().

- Flag the sblock() call as non-interruptible so that a signal
  delivered to the thread calling sorflush() doesn't cause sblock() to
  fail.  The sblock() is required to ensure that all other socket
  consumer threads have, in fact, left, and do not enter, the socket
  buffer until we're done flushin it.

To implement the latter, change the 'flags' argument to sblock() to
accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
flag.  When SBL_NOINTR is set, it forces a non-interruptible sx
acquisition, regardless of the setting of the disposition of SB_NOINTR
on the socket buffer; without this change it would be possible for
another thread to clear SB_NOINTR between when the socket buffer mutex
is released and sblock() is invoked.

Reviewed by:	bz, kmacy
Reported by:	Jos Backus <jos at catnook dot com>
2008-01-31 08:22:24 +00:00
Randall Stewart
3ca1bceea5 - Fix a comment about prison.
- Fix it so the VRF is captured while locks are held.
MFC after:	1 week
2008-01-28 10:34:38 +00:00
Randall Stewart
bf949ea2d4 - Change back to using prioity 0. Which means don't change the
prioity when running the thread. (this is for the sctp_interator thread).

MFC after:	1 week
2008-01-28 10:33:41 +00:00
Randall Stewart
257438fb6c - Fix a bug where the socket may have been closed which
could cause a crash in the auth code.
Obtained from:	Michael Tuexen
MFC after:	1 week
2008-01-28 10:31:12 +00:00