Commit Graph

55 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb
078b704233 Pass the ucred along into in{,6}_pcblookup_local for upcoming
prison checks.

Reviewed by:	rwatson
2008-07-10 13:31:11 +00:00
Bjoern A. Zeeb
a55b8b2068 Document required locking in in6_sleectsrc() in case an inp is
passed in by adding an assert.

Requested by:	rwatson
Reviewed by:	rwatson
2008-07-09 16:33:21 +00:00
Bjoern A. Zeeb
f2f877d38c Change the parameters to in6_selectsrc():
- pass in the inp instead of both in6p_moptions and laddr.
 - pass in cred for upcoming prison checks.

Reviewed by:	rwatson
2008-07-08 18:41:36 +00:00
Robert Watson
8501a69cc9 Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock main exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after:	3 months
Tested by:	kris (superset of committered patch)
2008-04-17 21:38:18 +00:00
Qing Li
e440aed958 This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

	route add -net 192.103.54.0/24 10.9.44.1
	route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default		10.2.5.1	UGS	0	3074	bge0 =>
default		10.2.5.2	UGS	0	0	bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

	route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

	route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options  RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by:	robert, sam, gnn, julian, kmacy
2008-04-13 05:45:14 +00:00
Bjoern A. Zeeb
ab569b9c05 Correct the commented out debugging printf()s in REPLACE and NEXT macros.
ip6_sprintf() needs a buffer as first argument these days.

MFC after:	2 weeks
2008-01-20 10:08:15 +00:00
David E. O'Brien
9233d8f3ad un-__P() 2008-01-08 19:08:58 +00:00
David E. O'Brien
b48287a32a Clean up VCS Ids. 2007-12-10 16:03:40 +00:00
Xin LI
2a463222be Space cleanup
Approved by:	re (rwatson)
2007-07-05 16:29:40 +00:00
Xin LI
1272577e22 ANSIfy[1] plus some style cleanup nearby.
Discussed with:	gnn, rwatson
Submitted by:	Karl Sj?dahl - dunceor <dunceor gmail com> [1]
Approved by:	re (rwatson)
2007-07-05 16:23:49 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Robert Watson
712fc218a0 Rename some fields of struct inpcbinfo to have the ipi_ prefix,
consistent with the naming of other structure field members, and
reducing improper grep matches.  Clean up and comment structure
fields in structure definition.
2007-04-30 23:12:05 +00:00
John Baldwin
4e7f640dfb Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to that rwlocks.  This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
Bjoern A. Zeeb
1d54aa3ba9 MFp4: 92972, 98913 + one more change
In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
2006-12-12 12:17:58 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Brooks Davis
43bc7a9c62 With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.
2006-08-04 21:27:40 +00:00
Seigo Tanimura
f8366b0334 Avoid spurious release of an rtentry. 2006-05-23 00:32:22 +00:00
Robert Watson
8deea4a8f3 Move lock assertions to top of in6_pcbladdr(): we still want them to run
even if we're going to return an argument-based error.

Assert pcbinfo lock in in6_pcblookup_local(), in6_pcblookup_hash(), since
they walk pcbinfo inpcb lists.

Assert inpcb and pcbinfo locks in in6_pcbsetport(), since
port reservations are changing.

MFC after:	3 months
2006-04-25 12:09:58 +00:00
SUZUKI Shinsuke
c1a049ac20 sync with KAME (removed a unnecesary non-standard macro)
Obtained from: KAME
Reviewd by: ume, gnn
2005-10-19 16:53:24 +00:00
Hajimu UMEMOTO
5d52565396 - fix race condition using sx lock.
- use TAILQ_FOREACH() for readability.

Suggested by:	jhb
2005-08-17 16:46:55 +00:00
Hajimu UMEMOTO
1c44678637 avoid exclusive sleep mutex. 2005-08-16 19:49:10 +00:00
Hajimu UMEMOTO
cd0fdcf7a7 - fix typo in comment.
- nuke unused code.

Submitted by:	suz
Obtained from:	KAME
2005-08-12 15:27:25 +00:00
Hajimu UMEMOTO
d6bb0cb7eb nuke duplicate inclusion of scope6_var.h. 2005-07-26 11:46:15 +00:00
Hajimu UMEMOTO
a1f7e5f8ee scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application do:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from:	KAME
2005-07-25 12:31:43 +00:00
George V. Neville-Neil
403cbcf59f Fixes for various nits found by the Coverity tool.
In particular 2 missed return values and an inappropriate bcopy from
a possibly NULL pointer.

Reviewed by:	jake
Approved by:	rwatson
MFC after:	1 week
2005-05-15 02:28:30 +00:00
Warner Losh
caf43b0208 /* -> /*- for license, minor formatting changes, separate for KAME 2005-01-07 02:30:35 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Pawel Jakub Dawidek
b0330ed929 Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
	- in_pcbbind_setup(),
	- in_pcbconnect(),
	- in_pcbconnect_setup(),
	- in6_pcbbind(),
	- in6_pcbconnect(),
	- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by:	rwatson
Reviewed by:	rwatson, ume
2004-03-27 21:05:46 +00:00
Hajimu UMEMOTO
68efda090a KNF
Obtained from:	KAME
2004-02-04 12:55:45 +00:00
Peter Wemm
a89ec05e3e Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.
2003-12-23 02:36:43 +00:00
Andre Oppermann
97d8d152c2 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
Hajimu UMEMOTO
f4dec803c9 reflect ip6_pktopts and ip6_moptions into embeded scope of
destination address.  it makes `ping6 -I <if> <link-local>'
work again.  since we don't merge scope cleanup yet, we need
this for workaround.
2003-11-12 21:39:12 +00:00
Hajimu UMEMOTO
e6a2735045 make sure to treat destrination address as KAME internal form
of embedscope.
2003-11-05 16:09:21 +00:00
Hajimu UMEMOTO
d6385b1c0b source address selection part of RFC3484.
TODO: since there is scope issue to be solved, multicast and
link-local address are treated as special for workaround for
now.

Obtained from:	KAME
2003-11-04 20:22:33 +00:00
Hajimu UMEMOTO
4a201dfc18 - update comments to refrect recent BSDs.
- nuke unused macro PSUEDO_SET().
- I believe our if_xname stuff is nothing strange against other BSDs.

Obtained from:	KAME
2003-11-04 14:08:31 +00:00
Hajimu UMEMOTO
349b668aab - unlock on error.
- don't call malloc with M_WAITOK within lock context.
2003-10-30 18:42:25 +00:00
Hajimu UMEMOTO
7fc91b3f1d add management part of address selection policy described in
RFC3484.

Obtained from:	KAME
2003-10-30 15:29:17 +00:00
Sam Leffler
37bdc2803f check return result from rtalloc1 before invoking RTUNLOCK 2003-10-23 21:41:00 +00:00
Hajimu UMEMOTO
66bb118edd drop the code of HAVE_NRL_INPCB part. our system doesn't
use NRL style INPCB.
2003-10-22 18:52:57 +00:00
Hajimu UMEMOTO
9a4f9608ad - change scope to zone.
- change node-local to interface-local.
- better error handling of address-to-scope mapping.
- use in6_clearscope().

Obtained from:	KAME
2003-10-21 20:05:32 +00:00
Hajimu UMEMOTO
31b1bfe1b0 - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from:	KAME
2003-10-17 15:46:31 +00:00
Hajimu UMEMOTO
7efe5d92ab - fix typo in comments.
- style.
- NULL is not 0.
- some variables were renamed.
- nuke unused logic.
(there is no functional change.)

Obtained from:	KAME
2003-10-08 18:26:08 +00:00
Hajimu UMEMOTO
40e39bbb67 return(code) -> return (code)
(reduce diffs against KAME)
2003-10-06 14:02:09 +00:00
Sam Leffler
d1dd20be6e Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do-no know about mutex's.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Hajimu UMEMOTO
54c1b8821b - Check the address family of a cached destination, in case of
sharing the cache with IPv4.
- Check if the cached route is up in in6_selectsrc().

Obtained from:	KAME
2002-01-21 20:02:36 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00