Commit Graph

215 Commits

Author SHA1 Message Date
Robert Watson
fe6bfc3730 Commit correct version of previous commit (in_pcb.c:1.164). Use the
local variables as currently named.

MFC after:	7 days
2005-06-01 11:43:39 +00:00
Robert Watson
6b348152be Assert pcbinfo lock in in_pcbdisconnect() and in_pcbdetach(), as the
global pcb lists are modified.

MFC after:	7 days
2005-06-01 11:39:42 +00:00
Maxim Konovalov
29f2a6ec18 o Tweak the comment a bit. 2005-04-08 08:43:21 +00:00
Maxim Konovalov
e99971bf2f o Disable random port allocation when ip.portrange.first ==
ip.portrange.last and there is the only port for that because:
a) it is not wise; b) it leads to a panic in the random ip port
allocation code.  In general we need to disable ip port allocation
randomization if the last - first delta is ridiculous small.

PR:		kern/79342
Spotted by:	Anjali Kulkarni
Glanced at by:	silby
MFC after:	2 weeks
2005-04-08 08:42:10 +00:00
Maxim Konovalov
6ee79c59d2 o Document net.inet.ip.portrange.random* sysctls.
o Correct a comment about random port allocation threshold
implementation.

Reviewed by:	silby, ru
MFC after:	3 days
2005-03-23 09:26:38 +00:00
Gleb Smirnoff
797127a9bf We can make code simplier after last change.
Noticed by:	Andrew Thompson
2005-02-22 08:35:24 +00:00
Gleb Smirnoff
914d092f5d In in_pcbconnect_setup() remove a check that route points at
loopback interface. Nobody have explained me sense of this check.
It breaks connect() system call to a destination address which is
loopback routed (e.g. blackholed).

Reviewed by:	silence on net@
MFC after:	2 weeks
2005-02-22 07:39:15 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Mike Silbersack
5f311da2cc Port randomization leads to extremely fast port reuse at high
connection rates, which is causing problems for some users.

To retain the security advantage of random ports and ensure
correct operation for high connection rate users, disable
port randomization during periods of high connection rates.

Whenever the connection rate exceeds randomcps (10 by default),
randomization will be disabled for randomtime (45 by default)
seconds.  These thresholds may be tuned via sysctl.

Many thanks to Igor Sysoev, who proved the necessity of this
change and tested many preliminary versions of the patch.

MFC After:	20 seconds
2005-01-02 01:50:57 +00:00
Robert Watson
81158452be Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
Robert Watson
48ac555d83 Assign so_pcb to NULL rather than 0 as it's a pointer.
Spotted by:	dwhite
2004-09-29 04:01:13 +00:00
Robert Watson
4c2bb15a89 In in_pcbrehash(), do assert the inpcb lock as well as the pcbinfo lock. 2004-08-19 01:11:17 +00:00
Robert Watson
27f74fd0ed Assert the locks of inpcbinfo's and inpcb's passed into in_pcbconnect()
and in_pcbconnect_setup(), since these functions frob the port and
address state of inpcbs.
2004-08-11 04:35:20 +00:00
Yaroslav Tykhiy
a4eb4405e3 Disallow a particular kind of port theft described by the following scenario:
Alice is too lazy to write a server application in PF-independent
	manner.  Therefore she knocks up the server using PF_INET6 only
	and allows the IPv6 socket to accept mapped IPv4 as well.  An evil
	hacker known on IRC as cheshire_cat has an account in the same
	system.  He starts a process listening on the same port as used
	by Alice's server, but in PF_INET.  As a consequence, cheshire_cat
	will distract all IPv4 traffic supposed to go to Alice's server.

Such sort of port theft was initially enabled by copying the code that
implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for
the implied case of the same owner for both connections.  After this
change, the above scenario will be impossible.  In the same setting,
the user who attempts to start his server last will get EADDRINUSE.

Of course, using IPv4 mapped to IPv6 leads to security complications
in the first place, but there is no reason to make it even more unsafe.

This change doesn't apply to KAME since it affects a FreeBSD-specific
part of the code.  It doesn't modify the out-of-box behaviour of the
TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.

MFC after:	1 month
2004-07-28 13:03:07 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Maxim Konovalov
ef14c36965 o connect(2): if there is no a route to the destination
do not pick up the first local ip address for the source
ip address, return ENETUNREACH instead.

Submitted by:	Gleb Smirnoff
Reviewed by:	-current (silence)
2004-06-16 10:02:36 +00:00
Robert Watson
310e7ceb94 Socket MAC labels so_label and so_peerlabel are now protected by
SOCK_LOCK(so):

- Hold socket lock over calls to MAC entry points reading or
  manipulating socket labels.

- Assert socket lock in MAC entry point implementations.

- When externalizing the socket label, first make a thread-local
  copy while holding the socket lock, then release the socket lock
  to externalize to userspace.
2004-06-13 02:50:07 +00:00
Robert Watson
395a08c904 Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
Yaroslav Tykhiy
4658dc8325 When checking for possible port theft, skip over a TCP inpcb
unless it's in the closed or listening state (remote address
== INADDR_ANY).

If a TCP inpcb is in any other state, it's impossible to steal
its local port or use it for port theft.  And if there are
both closed/listening and connected TCP inpcbs on the same
localIP:port couple, the call to in_pcblookup_local() will
find the former due to the design of that function.

No objections raised in:	-net, -arch
MFC after:			1 month
2004-05-20 06:35:02 +00:00
Mike Silbersack
6b2fc10b64 Wrap two long lines in the previous commit. 2004-04-23 23:29:49 +00:00
Mike Silbersack
174624e01d Take out an unneeded variable I forgot to remove in the last commit,
and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.
2004-04-22 08:34:55 +00:00
Mike Silbersack
6ac48b7409 Simplify random port allocation, and add net.inet.ip.portrange.randomized,
which can be used to turn off randomized port allocation if so desired.

Requested by:	alfred
2004-04-22 08:32:14 +00:00
Mike Silbersack
6dd946b3f7 Switch from using sequential to random ephemeral port allocation,
implementation taken directly from OpenBSD.

I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
2004-04-20 06:45:10 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Bruce Evans
30a4ab088a Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h>
to implement this mistake.

Fixed some nearby style bugs (initialization in declaration, misformatting
of this initialization, missing blank line after the declaration, and
comparision of the non-boolean result of the initialization with 0 using
"!".  In KNF, "!" is not even used to compare booleans with 0).
2004-04-06 10:59:11 +00:00
Pawel Jakub Dawidek
b0330ed929 Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
	- in_pcbbind_setup(),
	- in_pcbconnect(),
	- in_pcbconnect_setup(),
	- in6_pcbbind(),
	- in6_pcbconnect(),
	- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by:	rwatson
Reviewed by:	rwatson, ume
2004-03-27 21:05:46 +00:00
Pawel Jakub Dawidek
6823b82399 Remove unused argument.
Reviewed by:	ume
2004-03-27 20:41:32 +00:00
Pawel Jakub Dawidek
8da601dfb7 Remove unused function.
It was used in FreeBSD 4.x, but now we're using cr_canseesocket().
2004-03-25 15:12:12 +00:00
Robert Watson
846840ba95 Scrub unused variable zeroin_addr. 2004-03-10 01:01:04 +00:00
Hajimu UMEMOTO
548c676b32 do not deref freed pointer
Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
Reviewed by:	itojun
2004-01-13 09:51:47 +00:00
Andre Oppermann
0cfbbe3bde Make sure all uses of stack allocated struct route's are properly
zeroed.  Doing a bzero on the entire struct route is not more
expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst.

Reviewed by:	sam (mentor)
Approved by:	re  (scottl)
2003-11-26 20:31:13 +00:00
Sam Leffler
5bd311a566 Split the "inp" mutex class into separate classes for each of divert,
raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness
complaints.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2003-11-26 01:40:44 +00:00
Thomas Moestl
1f831750b5 bzero() the the sockaddr used for the destination address for
rtalloc_ign() in in_pcbconnect_setup() before it is filled out.
Otherwise, stack junk would be left in sin_zero, which could
cause host routes to be ignored because they failed the comparison
in rn_match().
This should fix the wrong source address selection for connect() to
127.0.0.1, among other things.

Reviewed by:	sam
Approved by:	re (rwatson)
2003-11-23 03:02:00 +00:00
Andre Oppermann
97d8d152c2 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
Robert Watson
a557af222b Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
Sam Leffler
f7bbe2c0f1 add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb)
Supported by:	FreeBSD Foundation
2003-11-13 05:18:23 +00:00
Sam Leffler
1b73ca0bf1 o reorder some locking asserts to reflect the order of the locks
o correct a read-lock assert in in_pcblookup_local that should be
  a write-lock assert (since time wait close cleanups may alter state)

Supported by:	FreeBSD Foundation
2003-11-13 05:16:56 +00:00
Ian Dowse
3ab2096b80 In in_pcbconnect_setup(), don't use the cached inp->inp_route unless
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.

Reviewed by:	sam
2003-11-10 22:45:37 +00:00
Sam Leffler
59daba27d9 add locking assertions
Supported by:	FreeBSD Foundation
2003-11-08 23:02:36 +00:00
Hajimu UMEMOTO
0f9ade718d - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
Mike Silbersack
96af9ea52b - Add a new function tcp_twrecycleable, which tells us if the ISN which
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.

- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.

This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
2003-11-01 07:30:08 +00:00
Sam Leffler
9c63e9dbd7 Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE).  This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by:	FreeBSD Foundation
2003-10-30 23:02:51 +00:00
Sam Leffler
d1dd20be6e Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do-no know about mutex's.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
Crist J. Clark
b0d226932e The ancient and outdated concept of "privileged ports" in UNIX-type
OSes has probably caused more problems than it ever solved. Allow the
user to retire the old behavior by specifying their own privileged
range with,

  net.inet.ip.portrange.reservedhigh  default = IPPORT_RESERVED - 1
  net.inet.ip.portrange.reservedlo    default = 0

Now you can run that webserver without ever needing root at all. Or
just imagine, an ftpd that can really drop privileges, rather than
just set the euid, and still do PORT data transfers from 20/tcp.

Two edge cases to note,

  # sysctl net.inet.ip.portrange.reservedhigh=0

Opens all ports to everyone, and,

  # sysctl net.inet.ip.portrange.reservedhigh=65535

Locks all network activity to root only (which could actually have
been achieved before with ipfw(8), but is somewhat more
complicated).

For those who stick to the old religion that 0-1023 belong to root and
root alone, don't touch the knobs (or even lock them by raising
securelevel(8)), and nothing changes.
2003-02-21 05:28:27 +00:00
Jonathan Lemon
340c35de6a Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Jeffrey Hsu
3dc7ebf9ff in_pcbnotifyall() requires an exclusive protocol lock for notify functions
which modify the connection list, namely, tcp_notify().
2003-02-12 23:55:07 +00:00
Sam Leffler
28a34902c4 remove the restriction on build a kernel with FAST_IPSEC and INET6;
you still don't want to use the two together, but it's ok to have
them in the same kernel (the problem that initiated this bandaid
has long since been fixed)
2003-01-30 05:43:08 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Sam Leffler
9c0a8ace11 temporarily disallow FAST_IPSEC and INET6 to avoid potential panics;
will correct this before 5.0 release
2002-11-08 23:50:32 +00:00
Ian Dowse
5200e00e72 Replace in_pcbladdr() with a more generic inner subroutine for
in_pcbconnect() called in_pcbconnect_setup(). This version performs
all of the functions of in_pcbconnect() except for the final
committing of changes to the PCB. In the case of an EADDRINUSE error
it can also provide to the caller the PCB of the duplicate connection,
avoiding an extra in_pcblookup_hash() lookup in tcp_connect().

This change will allow the "temporary connect" hack in udp_output()
to be removed and is part of the preparation for adding the
IP_SENDSRCADDR control message.

Discussed on:	-net
Approved by:	re
2002-10-21 13:55:50 +00:00
Ian Dowse
4b932371f4 Split out most of the logic from in_pcbbind() into a new function
called in_pcbbind_setup() that does everything except commit the
changes to the PCB. There should be no functional change here, but
in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR
control message implementation to check or allocate the source
address and port.

Discussed on:	-net
Approved by:	re
2002-10-20 21:44:31 +00:00
Sam Leffler
b9234fafa0 Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
Don Lewis
26ef6ac4df Create new functions in_sockaddr(), in6_sockaddr(), and
in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in*
structure and initialize it with the address and port information passed
as arguments.  Use calls to these new functions to replace code that is
replicated multiple times in in_setsockaddr(), in_setpeeraddr(),
in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and
in6_mapped_peeraddr().  Inline COMMON_END in tcp_usr_accept() so that
we can call in_sockaddr() with temporary copies of the address and port
after the PCB is unlocked.

Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC()
inside in6_mapped_peeraddr() while the PCB is locked) by changing
the implementation of tcp6_usr_accept() to match tcp_usr_accept().

Reviewed by:	suz
2002-08-21 11:57:12 +00:00
Hajimu UMEMOTO
eccb7001ee cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,
ip6_mapped_addr_on is unified into ip6_v6only.

MFC after:	1 week
2002-07-25 17:40:45 +00:00
Jeffrey Hsu
3ce144ea88 Notify functions can destroy the pcb, so they have to return an
indication of whether this happenned so the calling function
knows whether or not to unlock the pcb.

Submitted by:	Jennifer Yang (yangjihui@yahoo.com)
Bug reported by:  Sid Carter (sidcarter@symonds.net)
2002-06-14 08:35:21 +00:00
Jeffrey Hsu
3cfcc388ea Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK.
Submitted by: tegge, jlemon

Prefer LIST_FOREACH macro.
  Submitted by: jlemon
2002-06-12 03:08:08 +00:00
Jeffrey Hsu
f76fcf6d4c Lock up inpcb.
Submitted by:	Jennifer Yang <yangjihui@yahoo.com>
2002-06-10 20:05:46 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
John Baldwin
ad278afdf0 Change the first argument of prison_xinpcb() to be a thread pointer instead
of a proc pointer so that prison_xinpcb() can use td_ucred.
2002-04-09 20:04:10 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Mike Silbersack
9e5a5ed4c5 Change the ephemeral port range from 1024-5000 to 49152-65535.
This increases the number of concurrent outgoing connections from ~4000
to ~16000.  Other OSes (Solaris, OS X, NetBSD) and many other NAT
products have already made this change without ill effects, so we
should not run into any problems.

MFC after:	1 week
2002-03-22 03:28:11 +00:00
Jeff Roberson
69c2d429c1 Switch vm_zone.h with uma.h. Change over to uma interfaces. 2002-03-20 05:48:55 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Hajimu UMEMOTO
a4a6e77341 - Check the address family of the destination cached in a PCB.
- Clear the cached destination before getting another cached route.
  Otherwise, garbage in the padding space (which might be filled in if it was
  used for IPv4) could annoy rtalloc.

Obtained from:	KAME
2002-01-21 20:04:22 +00:00
Jonathan Lemon
eaa6d8efe5 Minor style fixes. 2001-12-13 04:01:23 +00:00
Robert Watson
011376308f o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
  pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
  so as to enforce these protections, in particular, in kern_mib.c
  protection sysctl access to the hostname and securelevel, as well as
  kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
  mib entries (osname, osrelease, osversion) so that they don't return
  a pointer to the text in the struct linux_prison, rather, a copy
  to an array passed into the calls.  Likewise, update linprocfs to
  use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
  accessing struct prison.

Reviewed by:	jhb
2001-12-03 16:12:27 +00:00
Jonathan Lemon
be2ac88c59 Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby  (in a previous iteration)
Sponsored by: DARPA, NAI Labs
2001-11-22 04:50:44 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
Andrew R. Reiter
83103a7397 - Fixes non-zero'd out sin_zero field problem so that the padding
is used as it is supposed to be.

Inspired by: PR #31704
Approved by: jdp
Reviewed by: jhb, -net@
2001-11-06 00:48:01 +00:00
Ruslan Ermilov
8071913df2 Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument.  Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest.  3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works.  Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''.  It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from:	BSD/OS, NetBSD
MFC after:	1 month
PR:		kern/28360
2001-10-17 18:07:05 +00:00
Jonathan Lemon
9a10980e2a Centralize satosin(), sintosa() and ifatoia() macros in <netinet/in.h>
Remove local definitions.
2001-09-29 03:23:44 +00:00
Brooks Davis
9494d5968f Make faith loadable, unloadable, and clonable. 2001-09-25 18:40:52 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Hajimu UMEMOTO
e43cc4ae36 When running aplication joined multicast address,
removing network card, and kill aplication.
imo_membership[].inm_ifp refer interface pointer
after removing interface.
When kill aplication, release socket,and imo_membership.
imo_membership use already not exist interface pointer.
Then, kernel panic.

PR:		29345
Submitted by:	Inoue Yuichi <inoue@nd.net.fujitsu.co.jp>
Obtained from:	KAME
MFC after:	3 days
2001-08-04 17:10:14 +00:00
Hajimu UMEMOTO
13cf67f317 move ipsec security policy allocation into in_pcballoc, before
making pcbs available to the outside world.  otherwise, we will see
inpcb without ipsec security policy attached (-> panic() in ipsec.c).

Obtained from:	KAME
MFC after:	3 days
2001-07-26 19:19:49 +00:00
Ruslan Ermilov
8bf82a92d5 Backout CSRG revision 7.22 to this file (if in_losing notices an
RTF_DYNAMIC route, it got freed twice).  I am not sure what was
the actual problem in 1992, but the current behavior is memory
leak if PCB holds a reference to a dynamically created/modified
routing table entry.  (rt_refcnt>0 and we don't call rtfree().)

My test bed was:

1.  Set net.inet.tcp.msl to a low value (for test purposes), e.g.,
    5 seconds, to speed up the transition of TCP connection to a
    "closed" state.
2.  Add a network route which causes ICMP redirect from the gateway.
3.  ping(8) host H that matches this route; this creates RTF_DYNAMIC
    RTF_HOST route to H.  (I was forced to use ICMP to cause gateway
    to generate ICMP host redirect, because gateway in question is a
    4.2-STABLE system vulnerable to a problem that was fixed later in
    ip_icmp.c,v 1.39.2.6, and TCP packets with DF bit set were
    triggering this bug.)
4.  telnet(1) to H
5.  Block access to H with ipfw(8)
6.  Send something in telnet(1) session; this causes EPERM, followed
    by an in_losing() call in a few seconds.
7.  Delete ipfw(8) rule blocking access to H, and wait for TCP
    connection moving to a CLOSED state; PCB is freed.
8.  Delete host route to H.
9.  Watch with netstat(1) that `rttrash' increased.
10. Repeat steps 3-9, and watch `rttrash' increases.

PR:		kern/25421
MFC after:	2 weeks
2001-06-29 12:07:29 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
Poul-Henning Kamp
ccd6f42dc9 Fix a style(9) nit. 2001-03-16 19:36:23 +00:00
Poul-Henning Kamp
503d3c0277 Correctly cleanup in case of failure to bind a pcb.
PR:		25751
Submitted by:	<unicorn@Forest.Od.UA>
2001-03-12 21:53:23 +00:00
Bosko Milekic
234ff7c46f During a flood, we don't call rtfree(), but we remove the entry ourselves.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.

Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).

This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)

Submitted by: Mike Silbersack <silby@silby.com>
2001-03-04 21:28:40 +00:00
Poul-Henning Kamp
970680fad8 Fix jails. 2001-02-28 09:38:48 +00:00
Jonathan Lemon
c693a045de Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly.
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.

Clean up some layering violations (access to tcp internals from in_pcb)
2001-02-26 21:19:47 +00:00
Jesper Skriver
d1c54148b7 Redo the security update done in rev 1.54 of src/sys/netinet/tcp_subr.c
and 1.84 of src/sys/netinet/udp_usrreq.c

The changes broken down:

- remove 0 as a wildcard for addresses and port numbers in
  src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify
  all sessions with the specific remote address.
- change
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  to use in_pcbnotifyall() to notify multiple sessions, instead of
  using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to
  udp_ctlinput() and tcp_ctlinput(), so they will call
  in_pcbnotifyall() to notify all sessions with the specific
  remote address.

Approved by:	jlemon
Inspired by:    NetBSD
2001-02-22 21:23:45 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
Jesper Skriver
c2221099a9 Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Forgotten by phk, when committing fix in kern/23986

PR:		kern/23986
Reviewed by:	phk
Approved by:	phk
2001-02-20 21:11:29 +00:00
Poul-Henning Kamp
37d4006626 Another round of the <sys/queue.h> FOREACH transmogriffer.
Created with:   sed(1)
Reviewed by:    md5(1)
2001-02-04 16:08:18 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
Wes Peters
550b151850 When attempting to bind to an ephemeral port, if no such port is
available, the error return should be EADDRNOTAVAIL rather than
EAGAIN.

PR:		14181
Submitted by:	Dima Dorfman <dima@unixfreak.org>
Reviewed by:	Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
2001-01-23 07:27:56 +00:00
Dag-Erling Smørgrav
a3ea6d41b9 First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
 - remove zalloci() and zfreei(), which are now redundant.

Reviewed by:	bmilekic, jasone
2001-01-21 22:23:11 +00:00
Assar Westerlund
598ce68dbd include tcp header files to get the prototype for tcp_seq_vs_sess 2000-12-27 03:02:29 +00:00
Poul-Henning Kamp
442fad6798 Update the "icmp_admin_prohib_like_rst" code to check the tcp-window and
to be configurable with respect to acting only in SYN or in all TCP states.

PR:		23665
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2000-12-24 10:57:21 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
e4bdf25dc8 Properly jail UDP sockets. This is quite a bit more tricky than TCP.
This fixes a !root userland panic, and some cases where the wrong
interface was chosen for a jailed UDP socket.

PR:		20167, 19839, 20946
2000-09-17 13:35:42 +00:00
Jayanth Vijayaraghavan
e7f3269307 When a connection is being dropped due to a listen queue overflow,
delete the cloned route that is associated with the connection.
This does not exhaust the routing table memory when the system
is under a SYN flood attack. The route entry is not deleted if there
is any prior information cached in it.

Reviewed by: Peter Wemm,asmodai
2000-07-21 23:26:37 +00:00
Jun-ichiro itojun Hagino
686cdd19b1 sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
  (also syntax change)
2000-07-04 16:35:15 +00:00